Generative AI is reshaping industries worldwide, driving demand for secure, high-performance infrastructure that can serve increasingly complex models quickly and cost-effectively. As this demand grows, companies are turning to cutting-edge technology for efficient AI inference.
At its annual re:Invent conference, Amazon Web Services (AWS) announced an expanded collaboration with NVIDIA that extends NVIDIA NIM microservices across key AWS AI services. The goal is faster AI inference and lower latency for generative AI applications.
NVIDIA NIM microservices are now accessible directly from the AWS Marketplace, alongside Amazon Bedrock Marketplace and Amazon SageMaker JumpStart. This integration simplifies the process for developers who want to deploy NVIDIA-optimized inference for widely used models at scale.
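As a minimal sketch of what the SageMaker JumpStart path can look like, the snippet below deploys a JumpStart model with the SageMaker Python SDK and sends one test prompt. The model ID and payload schema are assumptions for illustration, not confirmed identifiers; consult the JumpStart catalog for the actual NIM model IDs, supported instance types, and license requirements.

```python
# Sketch: deploying a NIM-backed model via SageMaker JumpStart.
# Assumes the SageMaker Python SDK (pip install sagemaker) and an AWS role
# with SageMaker permissions. The model_id is a hypothetical placeholder.
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="nvidia-nim-llama-3-1-8b-instruct")  # hypothetical ID

# Deploy to a real-time endpoint; accept_eula acknowledges the model license.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",  # assumption: a GPU instance the listing supports
    accept_eula=True,
)

# Send one test prompt. The OpenAI-style message schema is an assumption;
# check the model's documentation for its actual request format.
response = predictor.predict({
    "messages": [{"role": "user", "content": "Summarize what NIM microservices are."}],
    "max_tokens": 128,
})
print(response)

# Delete the endpoint when done to stop incurring charges.
predictor.delete_endpoint()
```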
NVIDIA NIM is part of the NVIDIA AI Enterprise software platform, also available in the AWS Marketplace. The platform gives developers a set of easy-to-use microservices for the secure, reliable deployment of high-performance, enterprise-grade AI model inference across clouds, data centers, and workstations.
These prebuilt containers leverage robust inference engines, including the NVIDIA Triton Inference Server, NVIDIA TensorRT, NVIDIA TensorRT-LLM, and PyTorch. They support a wide array of AI models, from those developed by the open-source community to the proprietary NVIDIA AI Foundation models and custom models.
NIM microservices can be deployed across several AWS services, such as Amazon Elastic Compute Cloud (EC2), Amazon Elastic Kubernetes Service (EKS), and Amazon SageMaker. This flexibility allows developers to choose the most suitable platform for their specific needs.
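For the AWS Marketplace path, a subscribed model package can be deployed to a SageMaker real-time endpoint with a few lines of the SageMaker Python SDK. This is a sketch under assumptions: the model package ARN below is a placeholder you would copy from your Marketplace subscription, and the instance type must be one the listing supports.

```python
# Sketch: deploying a NIM model package subscribed via AWS Marketplace
# to a SageMaker real-time endpoint. The ARN is a placeholder.
import sagemaker
from sagemaker import ModelPackage

session = sagemaker.Session()
role = sagemaker.get_execution_role()  # assumes a SageMaker execution role is available

model = ModelPackage(
    role=role,
    model_package_arn=(
        "arn:aws:sagemaker:us-east-1:123456789012:model-package/nim-llama-3-1-8b"  # placeholder
    ),
    sagemaker_session=session,
)

# Marketplace listings specify which instance types they support.
model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",  # assumption: choose from the listing's supported types
    endpoint_name="nim-llama31-8b-endpoint",
)
```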
Developers can explore a wide variety of NIM microservices built from popular models and model families, including Meta’s Llama 3, Mistral AI’s Mistral and Mixtral, NVIDIA’s Nemotron, and Stability AI’s SDXL, all featured in the NVIDIA API catalog. The most commonly used microservices are available for self-hosting on AWS services, optimized to run on NVIDIA accelerated computing instances.
The NIM microservices newly available directly on AWS include the following (a usage sketch follows the list):
NVIDIA Nemotron-4: This cutting-edge large language model (LLM) is available in the Amazon Bedrock Marketplace, Amazon SageMaker JumpStart, and AWS Marketplace. It is designed to generate diverse synthetic data that closely resembles real-world data, enhancing the performance and robustness of custom LLMs across various domains.
Llama 3.1 8B-Instruct: Available on AWS Marketplace, this 8-billion-parameter multilingual LLM is pretrained and instruction-tuned for language understanding, reasoning, and text-generation tasks.
Llama 3.1 70B-Instruct: Also accessible on AWS Marketplace, this 70-billion-parameter model is optimized for multilingual dialogue.
Mixtral 8x7B Instruct v0.1: Available on AWS Marketplace, this high-quality sparse mixture-of-experts model with open weights can follow instructions, complete requests, and generate creative text formats.
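Once one of these microservices is running, for example a self-hosted Llama 3.1 8B-Instruct NIM container on a GPU-backed EC2 instance, it exposes an OpenAI-compatible HTTP API. The sketch below assumes the container is already up and listening on localhost:8000, the default port in NVIDIA’s NIM documentation.

```python
# Sketch: querying a self-hosted NIM container through its OpenAI-compatible
# API. Assumes the Llama 3.1 8B-Instruct NIM container is already running
# and listening on localhost:8000.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # NIM's OpenAI-compatible endpoint
    api_key="not-used",  # local NIM deployments don't require a real key
)

completion = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Explain mixture-of-experts models in two sentences."}],
    max_tokens=128,
    temperature=0.2,
)
print(completion.choices[0].message.content)
```

Because the interface is OpenAI-compatible, existing applications built against the OpenAI client can typically be pointed at a NIM endpoint by changing only the base URL and model name.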
Customers and partners across various industries are leveraging NIM on AWS to accelerate time-to-market, maintain the security and control of their generative AI applications and data, and reduce costs. For instance, SoftServe, an IT consulting and digital services provider, has developed six generative AI solutions fully deployed on AWS, accelerated by NVIDIA NIM and AWS services. These solutions, available on the AWS Marketplace, include SoftServe Gen AI Drug Discovery, SoftServe Gen AI Industrial Assistant, Digital Concierge, Multimodal RAG System, Content Creator, and Speech Recognition Platform.
These solutions are rooted in NVIDIA AI Blueprints, which are comprehensive reference workflows that expedite AI application development and deployment. They feature NVIDIA acceleration libraries, software development kits, and NIM microservices for AI agents, digital twins, and more.
Developers can start deploying NVIDIA NIM microservices on AWS according to their specific requirements, bringing NVIDIA-optimized inference containers and high-performance AI to applications across AWS services.
To explore the possibilities, developers can visit the NVIDIA API catalog to try out over 100 different NIM-optimized models. They can also request either a developer license or a 90-day NVIDIA AI Enterprise trial license to begin deploying the microservices on AWS services. Additionally, developers can explore NIM microservices available in the AWS Marketplace, Amazon Bedrock Marketplace, or Amazon SageMaker JumpStart.
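Trying a model from the API catalog requires no deployment at all: NVIDIA hosts preview endpoints behind the same OpenAI-compatible interface. The sketch below assumes an API key from the NVIDIA API catalog exported as the environment variable NVIDIA_API_KEY.

```python
# Sketch: calling a hosted preview model on the NVIDIA API catalog.
# Assumes an NVIDIA API key exported as NVIDIA_API_KEY.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],
)

stream = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "What is NVIDIA NIM?"}],
    max_tokens=256,
    stream=True,  # stream tokens back as they are generated
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```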
In conclusion, the collaboration between AWS and NVIDIA is a significant step forward in the realm of generative AI. By integrating NVIDIA NIM microservices into AWS services, developers now have access to powerful tools that can enhance AI application performance and efficiency. This partnership not only benefits developers but also opens up new possibilities for industries seeking to leverage AI for competitive advantage. Whether it’s improving language models or creating innovative AI solutions, the AWS and NVIDIA collaboration brings immense potential for the future of AI technology.