Amazon FSx for Lustre Now Supports Elastic Fabric Adapter and NVIDIA GPUDirect Storage
Amazon Web Services (AWS) has announced support for Elastic Fabric Adapter (EFA) and NVIDIA GPUDirect Storage (GDS) in Amazon FSx for Lustre. Together, these features raise per-client throughput for data-intensive high-performance computing (HPC) and machine learning workloads.
The Significance of EFA and GDS
To see why these enhancements matter, it helps to look at what each technology does:
- Elastic Fabric Adapter (EFA): This is a network interface designed for Amazon EC2 instances, facilitating high-performance networking capabilities. EFA reduces the latency and increases the bandwidth of inter-node communications, making it ideal for applications that demand fast data exchanges, such as simulations and modeling.
- NVIDIA GPUDirect Storage (GDS): GDS creates a direct data path between storage and GPU memory. By allowing data to bypass the CPU, GDS minimizes data transfer delays and reduces the need for redundant memory copies, thereby optimizing GPU performance.
Performance Improvements
With EFA and GDS, Amazon FSx for Lustre now delivers up to 1,200 Gbps of throughput per client instance. That is a twelve-fold improvement over earlier FSx for Lustre file systems, which were limited to 100 Gbps per client over traditional TCP networking. The gain matters most on powerful GPU, accelerator, and HPC instances such as Amazon EC2 P5, Trn1, and Hpc7a.
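To put a link rate in storage terms, a rough conversion helps. The sketch below is plain arithmetic, not an AWS API: it converts gigabits per second to gigabytes per second and estimates the best-case time to read a dataset at full line rate. The 10 TB dataset size is a hypothetical value chosen for illustration, and the 100 Gbps figure is the earlier TCP limit mentioned above. Real transfers will be slower than this ideal.

```python
def gbps_to_gigabytes_per_second(gbps: float) -> float:
    """Convert a link rate in gigabits/s to gigabytes/s (8 bits per byte)."""
    return gbps / 8.0


def ideal_read_seconds(dataset_gigabytes: float, gbps: float) -> float:
    """Best-case time to read a dataset at line rate, ignoring all overhead."""
    return dataset_gigabytes / gbps_to_gigabytes_per_second(gbps)


def speedup(new_gbps: float, old_gbps: float) -> float:
    """Ratio between two link rates."""
    return new_gbps / old_gbps


if __name__ == "__main__":
    dataset_gb = 10_000.0    # hypothetical 10 TB dataset (assumption)
    tcp_limit_gbps = 100.0   # earlier per-client limit over TCP
    print(gbps_to_gigabytes_per_second(tcp_limit_gbps), "GB/s")
    print(ideal_read_seconds(dataset_gb, tcp_limit_gbps), "seconds at line rate")
```

Passing the new per-client figure from the paragraph above as `gbps` gives the corresponding best-case time for comparison.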
Applications and Use Cases
This upgrade allows users to handle more demanding workloads with enhanced efficiency. Key applications that stand to benefit include:
- Deep Learning Training: Training jobs can load data into accelerators faster, which reduces the time GPUs spend waiting on storage.
- Drug Discovery: High throughput can accelerate simulations and modeling, crucial for developing new pharmaceuticals.
- Financial Modeling: Financial institutions can run large-scale simulations and risk assessments in less time.
- Autonomous Vehicle Development: The processing of massive datasets required for autonomous vehicle testing and development can be expedited.
How It Works
Creating an EFA-Enabled FSx for Lustre File System
- Set Up: Begin by accessing the Amazon FSx console. Choose to create a new file system and select Amazon FSx for Lustre.
- Configuration: Assign a name to your file system. In the deployment section, choose Persistent SSD storage with EFA enabled. Set the throughput per unit of storage to 1000 MB/s/TiB, and define the storage capacity, ensuring a minimum of 4.8 TiB.
- Networking: Select the default virtual private cloud (VPC) and an EFA-enabled security group, meaning one that allows all inbound and outbound traffic to and from the security group itself.
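The console steps above could also be scripted. The following is a minimal sketch, assuming the boto3 FSx client and its `create_file_system` call. The parameter names reflect my understanding of that API (in particular `EfaEnabled` inside `LustreConfiguration`) and should be checked against the current boto3 documentation. The subnet ID, security group ID, and file system name are hypothetical placeholders, and the 4800 GiB value assumes the API expresses the 4.8 TiB console option that way. The function only builds the request, so nothing is created until the commented-out call is enabled.

```python
def build_create_file_system_request(
    subnet_id: str,
    security_group_id: str,
    name: str,
    storage_capacity_gib: int = 4800,
    per_unit_throughput: int = 1000,
) -> dict:
    """Assemble keyword arguments for an FSx for Lustre create request."""
    return {
        "FileSystemType": "LUSTRE",
        "StorageType": "SSD",
        # Capacity in GiB; assumed equivalent of the 4.8 TiB console option.
        "StorageCapacity": storage_capacity_gib,
        "SubnetIds": [subnet_id],
        "SecurityGroupIds": [security_group_id],
        "Tags": [{"Key": "Name", "Value": name}],
        "LustreConfiguration": {
            "DeploymentType": "PERSISTENT_2",
            # Throughput per unit of storage, in MB/s per TiB.
            "PerUnitStorageThroughput": per_unit_throughput,
            # Assumed name of the flag that turns on EFA support.
            "EfaEnabled": True,
        },
    }


if __name__ == "__main__":
    request = build_create_file_system_request(
        subnet_id="subnet-EXAMPLE",        # hypothetical placeholder
        security_group_id="sg-EXAMPLE",    # hypothetical placeholder
        name="efa-lustre-demo",            # hypothetical name
    )
    print(request)
    # To actually create the file system (requires AWS credentials):
    # import boto3
    # response = boto3.client("fsx").create_file_system(**request)
```

Keeping the request builder separate from the API call makes it easy to review or test the configuration before any billable resource is created.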
Mounting the File System from an EC2 Instance
- Launch an Instance: In the Amazon EC2 console, launch a new instance with an EFA-capable instance type, such as trn1.32xlarge, and place it in the same subnet as the FSx for Lustre file system.
- Network Configuration: Under advanced network settings, ensure that ENA and EFA are selected to enable full network capabilities.
- Connect and Configure: Once the instance is running, connect to it, install the Lustre client, and configure the EFA client. Then mount the FSx for Lustre file system using its DNS name and mount name.
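The launch and mount steps above can be sketched in code as well. This is an illustration under assumptions, not the documented procedure: the network interface dictionary follows the shape I understand boto3's EC2 `run_instances` to accept for its `NetworkInterfaces` parameter, with `InterfaceType` set to `efa`, and the mount command uses the standard FSx for Lustre form with the `relatime,flock` options. Whether an EFA-configured client needs different options should be confirmed in the FSx documentation. The DNS name, mount name, subnet ID, and security group ID are hypothetical placeholders.

```python
import shlex
from typing import List


def build_efa_network_interface(subnet_id: str, security_group_id: str) -> dict:
    """Describe a primary network interface that requests EFA (with ENA)."""
    return {
        "DeviceIndex": 0,
        "SubnetId": subnet_id,           # same subnet as the file system
        "Groups": [security_group_id],   # the EFA-enabled security group
        "InterfaceType": "efa",
    }


def build_mount_command(
    dns_name: str, mount_name: str, mount_point: str = "/fsx"
) -> List[str]:
    """Build the argument list for mounting the file system with the Lustre client."""
    source = f"{dns_name}@tcp:/{mount_name}"
    return ["sudo", "mount", "-t", "lustre", "-o", "relatime,flock", source, mount_point]


if __name__ == "__main__":
    print(build_efa_network_interface("subnet-EXAMPLE", "sg-EXAMPLE"))
    command = build_mount_command(
        "fs-EXAMPLE.fsx.us-east-1.amazonaws.com",  # hypothetical DNS name
        "abcdefgh",                                # hypothetical mount name
    )
    # Print the shell-quoted command instead of running it.
    print(" ".join(shlex.quote(part) for part in command))
```

The command is printed rather than executed so it can be inspected first. On a prepared client it could be passed to `subprocess.run(command, check=True)`.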
Things to Know
- Availability: EFA and GDS support is available at no additional cost on new FSx for Lustre file systems in all AWS Regions where the Persistent 2 deployment type is offered.
- Compatibility: FSx for Lustre maintains compatibility with both EFA and non-EFA workloads. Non-EFA client instances can access EFA-enabled file systems using standard TCP/IP networking.
- System Requirements: To leverage these features, ensure your client instances run Lustre 2.15 on Ubuntu 22.04 with kernel 6.8 or higher.
- GDS Usage: For GPUDirect Storage, the NVIDIA CUDA package, NVIDIA driver, and the GPUDirect Storage Driver must be installed, all of which are available in the AWS Deep Learning AMI.
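The kernel requirement listed above is easy to check before installing anything. The sketch below is a hypothetical helper, not an AWS tool: it parses a Linux kernel release string, such as the one returned by `platform.release()`, and compares its major and minor version against the 6.8 minimum. It does not verify the Ubuntu release or the Lustre client version.

```python
import platform
import re
from typing import Tuple


def parse_kernel_version(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string such as '6.8.0-1015-aws'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release string: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_meets_minimum(release: str, minimum: Tuple[int, int] = (6, 8)) -> bool:
    """Return True if the kernel is at least the required major.minor version."""
    return parse_kernel_version(release) >= minimum


if __name__ == "__main__":
    release = platform.release()
    try:
        print(release, "meets minimum:", kernel_meets_minimum(release))
    except ValueError as error:
        # Non-Linux platforms may report a release string in another format.
        print(error)
```

Comparing tuples rather than strings avoids the trap where "6.10" sorts before "6.8" alphabetically.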
Conclusion
EFA and GDS support in Amazon FSx for Lustre targets the growing storage demands of HPC and AI workloads. Higher per-client throughput lets compute instances spend less time waiting on data, which shortens training runs and simulations.
For detailed setup instructions and additional information, refer to the Amazon FSx for Lustre documentation.