AWS Unveils AWS Parallel Computing Service: Streamlining High Performance Computing in the Cloud

Amazon Web Services (AWS) has announced the launch of the AWS Parallel Computing Service (AWS PCS), a new managed service designed to simplify the setup and management of high-performance computing (HPC) clusters. This service allows customers to run their simulations at virtually any scale on AWS, leveraging a familiar HPC environment powered by the Slurm scheduler.

Background on AWS HPC Solutions

In November 2018, AWS introduced AWS ParallelCluster, an open-source cluster management tool that helps users deploy and manage HPC clusters in the AWS Cloud. AWS ParallelCluster allows for quick proof-of-concept and production HPC compute environments using command-line interfaces, APIs, Python libraries, and user interfaces installed from open-source packages. However, customers have expressed the need for a fully managed service to eliminate the operational burden of building and managing HPC environments. AWS PCS addresses this demand by providing a comprehensive, managed solution.

Features of AWS Parallel Computing Service

Managed HPC Environments

AWS PCS provides a simplified HPC environment managed by AWS, accessible through the AWS Management Console, AWS SDKs, and the AWS Command Line Interface (AWS CLI). System administrators can create managed Slurm clusters that use their own compute and storage configurations, identity, and job allocation preferences. Slurm, a highly scalable and fault-tolerant job scheduler, is used by a wide range of HPC customers to schedule and orchestrate simulations.

User Access and Integration

End users, including scientists, researchers, and engineers, can log in to AWS PCS clusters to run and manage HPC jobs, use interactive software on virtual desktops, and access data. This service allows users to quickly port their workloads to AWS PCS with minimal effort.

Remote Visualization and Job Management

AWS PCS clusters can be connected to fully managed NICE DCV remote desktops for remote visualization, and users can also access job telemetry and application logs. Together, these capabilities let specialists manage HPC workflows in one place.

Broad Applicability

AWS PCS is designed for a wide range of traditional and emerging compute- or data-intensive engineering and scientific workloads. These include computational fluid dynamics, weather modeling, finite element analysis, electronic design automation, and reservoir simulations.

Getting Started with AWS Parallel Computing Service

To get started with AWS PCS, follow the tutorial for creating a simple cluster in the AWS documentation. Before creating the cluster, users need two prerequisites in their account, both in the AWS Region where AWS PCS will be used: a virtual private cloud (VPC), which can be created from an AWS CloudFormation template, and shared storage in Amazon Elastic File System (Amazon EFS).

Step-by-Step Guide

Create a Cluster:
- In the AWS PCS console, choose "Create cluster" to create the cluster that will manage resources and run workloads.
- Enter the cluster name and choose the controller size for the Slurm scheduler, which determines the workload limits of the cluster.
- Configure networking settings by selecting the VPC, subnet, and security group created earlier.

Create Compute Node Groups:
- After creating the cluster, define compute node groups, which are virtual collections of Amazon Elastic Compute Cloud (Amazon EC2) instances.
- Specify common traits such as EC2 instance types, instance count, VPC subnets, Amazon Machine Image (AMI), purchase options, and custom launch configurations.
- Create an instance profile and EC2 launch template to configure EC2 instances.

Create and Run HPC Jobs:
- Create a queue and associate it with one or more compute node groups, which provide the EC2 instances needed to process jobs.
- Submit a job to the queue. The job remains in the queue until AWS PCS schedules it to run, based on available capacity.
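
The three steps above can also be pictured as three request payloads. The following is a minimal sketch in Python that only builds plain dictionaries and makes no AWS calls. The field names are assumptions modeled on the AWS PCS API as exposed through the AWS SDKs, and every identifier (names, subnet, security group, launch template, and instance profile values) is a hypothetical placeholder. Check the AWS PCS API reference before relying on any of them.

```python
# Sketch only: builds request parameters as plain dictionaries.
# Field names are ASSUMED to match the AWS PCS API; verify them before use.
# All IDs and names below are hypothetical placeholders.


def cluster_request(name, size, subnet_id, security_group_id):
    """Step 1: cluster name, Slurm controller size, and networking."""
    return {
        "clusterName": name,
        "scheduler": {"type": "SLURM", "version": "23.11"},
        "size": size,  # assumed enum value, for example "SMALL"
        "networking": {
            "subnetIds": [subnet_id],
            "securityGroupIds": [security_group_id],
        },
    }


def node_group_request(cluster_id, name, instance_type, max_instances,
                       subnet_id, launch_template_id, instance_profile_arn):
    """Step 2: a compute node group sharing common EC2 traits."""
    return {
        "clusterIdentifier": cluster_id,
        "computeNodeGroupName": name,
        "subnetIds": [subnet_id],
        "instanceConfigs": [{"instanceType": instance_type}],
        "scalingConfiguration": {
            "minInstanceCount": 0,
            "maxInstanceCount": max_instances,
        },
        "customLaunchTemplate": {"id": launch_template_id, "version": "1"},
        "iamInstanceProfileArn": instance_profile_arn,
    }


def queue_request(cluster_id, name, node_group_ids):
    """Step 3: a queue associated with one or more compute node groups."""
    return {
        "clusterIdentifier": cluster_id,
        "queueName": name,
        "computeNodeGroupConfigurations": [
            {"computeNodeGroupId": group_id} for group_id in node_group_ids
        ],
    }


cluster = cluster_request("demo-cluster", "SMALL", "subnet-EXAMPLE", "sg-EXAMPLE")
queue = queue_request("demo-cluster", "demo-queue", ["cng-EXAMPLE"])
print(cluster["networking"])
print(queue["computeNodeGroupConfigurations"])
```

Keeping the payloads as plain data makes the order of the steps explicit: the queue refers to compute node groups, and both refer to the cluster, so the cluster has to exist first.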

Running Jobs with Slurm

To run a job using Slurm, prepare a submission script specifying the job requirements and submit it to a queue using the sbatch command. This is typically done from a shared directory accessible by login and compute nodes.
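
As an illustration of what such a submission script contains, the sketch below renders one from a few parameters and builds the matching sbatch command line. It uses standard sbatch directives (--job-name, --partition, --nodes, --ntasks-per-node, --output). The job name, queue name, and script path are hypothetical, and treating an AWS PCS queue as the Slurm partition name is an assumption rather than something the article states.

```python
def render_batch_script(job_name, queue, nodes, tasks_per_node, command):
    """Return the text of a Slurm submission script for one job."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        # Assumption: the AWS PCS queue name is used as the Slurm partition.
        f"#SBATCH --partition={queue}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={tasks_per_node}",
        # %x expands to the job name and %j to the job ID.
        "#SBATCH --output=%x_%j.out",
        "",
        f"srun {command}",
    ]
    return "\n".join(lines) + "\n"


def sbatch_command(script_path):
    """Return the argument list that submits the script with sbatch."""
    return ["sbatch", script_path]


script = render_batch_script("hello", "demo-queue", 2, 4, "hostname")
print(script)
# Hypothetical path on the shared file system:
print(" ".join(sbatch_command("/shared/hello.sh")))
```

On a login node, the rendered text would be written to a file in the shared directory and the argument list passed to something like subprocess.run, so that the compute nodes can read the same script.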

Visualization with NICE DCV

Users can connect to a fully managed NICE DCV remote desktop for visualization. For example, the OpenFOAM motorBike simulation can be visualized in a ParaView session after logging into the web interface of the DCV instance.

Clean Up Resources

After completing HPC jobs, it is essential to delete the created resources to avoid unnecessary charges.

Additional Information

Slurm Versions and Upgrades

AWS PCS initially supports Slurm 23.11 and offers mechanisms to upgrade Slurm major versions as new versions become available. The service is designed to automatically update the Slurm controller with patch versions.

Capacity Reservations

Users can reserve EC2 capacity in a specific Availability Zone for a specific duration using On-Demand Capacity Reservations. This ensures the necessary compute capacity is available when needed.
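
To make "a specific Availability Zone for a specific duration" concrete, here is a small sketch that assembles the parameters of an EC2 CreateCapacityReservation request with a fixed end time. The parameter names follow the EC2 API. The instance type, Availability Zone, instance count, and dates are placeholder values chosen for illustration, and the sketch builds the request only, without calling AWS.

```python
from datetime import datetime, timedelta, timezone


def capacity_reservation_request(instance_type, availability_zone, count,
                                 start, hours):
    """Build parameters for a reservation that ends `hours` after `start`."""
    return {
        "InstanceType": instance_type,
        "InstancePlatform": "Linux/UNIX",
        "AvailabilityZone": availability_zone,
        "InstanceCount": count,
        # "limited" means the reservation expires at EndDate.
        "EndDateType": "limited",
        "EndDate": start + timedelta(hours=hours),
    }


# Placeholder values for illustration only.
start = datetime(2024, 9, 1, 8, 0, tzinfo=timezone.utc)
request = capacity_reservation_request("hpc6a.48xlarge", "us-east-1a", 4, start, 12)
print(request["EndDate"].isoformat())
```

With an SDK such as boto3, a dictionary like this could be passed as keyword arguments to the EC2 client's create_capacity_reservation call. How the reservation is then targeted from a compute node group's launch template is not covered here.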

Network File Systems

AWS PCS supports attaching network storage volumes, including Amazon FSx for NetApp ONTAP, Amazon FSx for OpenZFS, Amazon File Cache, Amazon EFS, and Amazon FSx for Lustre. Users can also use self-managed volumes such as NFS servers.

Availability and Feedback

AWS Parallel Computing Service is now available in several AWS Regions, including US East (N. Virginia), US East (Ohio), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Europe (Frankfurt), Europe (Ireland), and Europe (Stockholm).

Users are encouraged to try AWS PCS and provide feedback through AWS re:Post or their usual AWS Support contacts.

Conclusion

AWS Parallel Computing Service is a significant advancement for organizations that require high-performance computing capabilities. By providing a fully managed service, AWS PCS simplifies the process of setting up and managing HPC clusters, allowing users to focus on their core simulations and computations. With broad applicability and robust features, AWS PCS is poised to become an essential tool for scientists, researchers, and engineers across various industries.

For more information and to get started, visit the AWS Parallel Computing Service page.

Special thanks to Matthew Vaughn, a principal developer advocate at AWS, for his contribution to creating an HPC testing environment.

For more information, refer to this article:

Introducing AWS Parallel Computing Service for scalable HPC workloads | AWS
