Introducing AWS Parallel Computing Service for scalable HPC workloads | AWS


AWS Unveils AWS Parallel Computing Service: Streamlining High Performance Computing in the Cloud

Amazon Web Services (AWS) has announced the launch of the AWS Parallel Computing Service (AWS PCS), a new managed service designed to simplify the setup and management of high-performance computing (HPC) clusters. This service allows customers to run their simulations at virtually any scale on AWS, leveraging a familiar HPC environment powered by the Slurm scheduler.

Background on AWS HPC Solutions

In November 2018, AWS introduced AWS ParallelCluster, an open-source cluster management tool that helps users deploy and manage HPC clusters in the AWS Cloud. With AWS ParallelCluster, users can quickly build proof-of-concept and production HPC environments using command-line interfaces, APIs, Python libraries, and user interfaces installed from open-source packages. However, customers have asked for a fully managed service that removes the operational burden of building and managing HPC environments. AWS PCS addresses this demand with a managed service.

Features of AWS Parallel Computing Service

Managed HPC Environments

AWS PCS provides a simplified HPC environment managed by AWS, accessible through the AWS Management Console, AWS SDK, and AWS Command Line Interface (AWS CLI). System administrators can create managed Slurm clusters that use their own compute and storage configurations, identity, and job allocation preferences. Slurm, a highly scalable and fault-tolerant job scheduler, is used by a wide range of HPC customers to schedule and orchestrate simulations.

User Access and Integration

End users, including scientists, researchers, and engineers, can log in to AWS PCS clusters to run and manage HPC jobs, use interactive software on virtual desktops, and access data. This service allows users to quickly port their workloads to AWS PCS with minimal effort.

Remote Visualization and Job Management

AWS PCS supports fully managed NICE DCV remote desktops for remote visualization and gives access to job telemetry and application logs, so specialists can manage HPC workflows in one place.

Broad Applicability

AWS PCS is designed for a wide range of traditional and emerging compute or data-intensive engineering and scientific workloads. These include computational fluid dynamics, weather modeling, finite element analysis, electronic design automation, and reservoir simulations.

Getting Started with AWS Parallel Computing Service

To get started with AWS PCS, follow the tutorial for creating a simple cluster in the AWS documentation. Before creating the cluster, users need two things in their account, both in the AWS Region where AWS PCS will be used: a virtual private cloud (VPC), created from an AWS CloudFormation template, and shared storage in Amazon Elastic File System (Amazon EFS).

Step-by-Step Guide

  1. Create a Cluster:
    • In the AWS PCS console, choose "Create cluster" to create the cluster that will manage resources and run workloads.
    • Enter the cluster name and select the controller size for the Slurm scheduler based on workload limits.
    • Configure networking settings by selecting the created VPC, subnet, and security group.
  2. Create Compute Node Groups:
    • After creating the cluster, define compute node groups, which are virtual collections of Amazon Elastic Compute Cloud (Amazon EC2) instances.
    • Specify common traits such as EC2 instance types, instance count, VPC subnets, Amazon Machine Image (AMI), purchase options, and custom launch configurations.
    • Create an instance profile and EC2 launch template to configure EC2 instances.
  3. Create and Run HPC Jobs:
    • Create a queue and associate it with compute node groups, which provide the EC2 instances that process its jobs.
    • Submit a job to the queue. The job waits there until AWS PCS schedules it onto a compute node group, based on available capacity.
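For readers who prefer the AWS SDK to the console, the three steps above can be sketched as request payloads in Python. This is a minimal sketch, not the tutorial's own code: the helper names are hypothetical, the parameter names follow the boto3 `pcs` client operations `create_cluster`, `create_compute_node_group`, and `create_queue` as understood at the time of writing and should be checked against the current API reference, and every ID is a placeholder.

```python
from typing import Any


def cluster_request(name: str, size: str, subnet_id: str,
                    security_group_id: str) -> dict[str, Any]:
    """Step 1: keyword arguments for a create_cluster call."""
    return {
        "clusterName": name,
        "scheduler": {"type": "SLURM", "version": "23.11"},
        "size": size,  # controller size, assumed to be "SMALL", "MEDIUM", or "LARGE"
        "networking": {
            "subnetIds": [subnet_id],
            "securityGroupIds": [security_group_id],
        },
    }


def node_group_request(cluster_id: str, name: str, instance_type: str,
                       max_instances: int, subnet_id: str,
                       instance_profile_arn: str,
                       launch_template_id: str,
                       launch_template_version: str = "1") -> dict[str, Any]:
    """Step 2: keyword arguments for a create_compute_node_group call."""
    return {
        "clusterIdentifier": cluster_id,
        "computeNodeGroupName": name,
        "subnetIds": [subnet_id],
        "purchaseOption": "ONDEMAND",
        "customLaunchTemplate": {
            "id": launch_template_id,
            "version": launch_template_version,
        },
        "iamInstanceProfileArn": instance_profile_arn,
        "scalingConfiguration": {
            "minInstanceCount": 0,
            "maxInstanceCount": max_instances,
        },
        "instanceConfigs": [{"instanceType": instance_type}],
    }


def queue_request(cluster_id: str, name: str,
                  node_group_ids: list[str]) -> dict[str, Any]:
    """Step 3: keyword arguments for a create_queue call."""
    return {
        "clusterIdentifier": cluster_id,
        "queueName": name,
        "computeNodeGroupConfigurations": [
            {"computeNodeGroupId": group_id} for group_id in node_group_ids
        ],
    }


# Assumed usage with boto3 (not executed here; needs credentials and real IDs):
#   import boto3
#   pcs = boto3.client("pcs")
#   pcs.create_cluster(**cluster_request("demo", "SMALL", "subnet-...", "sg-..."))
```

Keeping the payloads as plain dictionaries makes them easy to review or store alongside the cluster definition before any call is made. Each create call is asynchronous, so a real script would wait for the cluster to become active before creating node groups and queues.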

Running Jobs with Slurm

To run a job using Slurm, prepare a submission script that specifies the job requirements and submit it to a queue with the sbatch command. This is typically done from a shared directory that both login and compute nodes can access.
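As an illustration of what such a submission script contains, the following Python sketch renders one as text. The function name and the example values are hypothetical, and passing the queue name through `--partition` assumes that an AWS PCS queue appears to Slurm as a partition; the `#SBATCH` options themselves are standard Slurm directives.

```python
def render_sbatch_script(job_name: str, queue: str, nodes: int,
                         tasks_per_node: int, command: str) -> str:
    """Return the text of a Slurm batch script for the given job requirements."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={queue}",   # assumed: PCS queue name used as partition
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={tasks_per_node}",
        "#SBATCH --output=%x_%j.out",     # %x = job name, %j = job ID
        "",
        f"srun {command}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print a two-node example; save the output as job.sh in the shared
    # directory and submit it with: sbatch job.sh
    print(render_sbatch_script("hello", "demo", 2, 4, "hostname"), end="")
```

Writing the script into the shared directory matters because the compute nodes read the script and write the output file there.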

Visualization with NICE DCV

Users can connect to a fully managed NICE DCV remote desktop for visualization. For example, the OpenFOAM motorBike simulation can be visualized in a ParaView session after logging in to the web interface of the DCV instance.

Clean Up Resources

After completing HPC jobs, it is essential to delete the created resources to avoid unnecessary charges.
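One way to keep the cleanup orderly is to build the list of delete calls before running any of them. The sketch below is an assumption-laden outline rather than an official procedure: it assumes that queues must be deleted before the compute node groups they reference, and node groups before the cluster, and it uses the boto3 `pcs` operation names `delete_queue`, `delete_compute_node_group`, and `delete_cluster` as understood at the time of writing. The function name and IDs are hypothetical.

```python
from typing import Any


def deletion_plan(cluster_id: str, queue_ids: list[str],
                  node_group_ids: list[str]) -> list[tuple[str, dict[str, Any]]]:
    """Return (operation name, keyword arguments) pairs in dependency order."""
    plan: list[tuple[str, dict[str, Any]]] = []
    # Queues reference node groups, so they go first.
    for queue_id in queue_ids:
        plan.append(("delete_queue", {
            "clusterIdentifier": cluster_id,
            "queueIdentifier": queue_id,
        }))
    # Node groups belong to the cluster, so they go before it.
    for group_id in node_group_ids:
        plan.append(("delete_compute_node_group", {
            "clusterIdentifier": cluster_id,
            "computeNodeGroupIdentifier": group_id,
        }))
    plan.append(("delete_cluster", {"clusterIdentifier": cluster_id}))
    return plan


# Assumed usage with boto3 (not executed here):
#   pcs = boto3.client("pcs")
#   for operation, kwargs in deletion_plan("cluster-id", ["queue-id"], ["group-id"]):
#       getattr(pcs, operation)(**kwargs)
#       # Deletions are asynchronous: wait for each resource to disappear
#       # before moving on to the next stage.
```

The shared storage in Amazon EFS and the CloudFormation stack that created the VPC are separate resources and need to be deleted as well.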

Additional Information

Slurm Versions and Upgrades

AWS PCS initially supports Slurm 23.11 and offers mechanisms to upgrade Slurm major versions as new versions become available. The service is designed to automatically update the Slurm controller with patch versions.

Capacity Reservations

Users can reserve EC2 capacity in a specific Availability Zone for a specific duration using On-Demand Capacity Reservations. This ensures the necessary compute capacity is available when needed.
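To make "a specific Availability Zone for a specific duration" concrete, here is a hedged Python sketch of the parameters for an EC2 `create_capacity_reservation` call. The helper name and example values are hypothetical, and how the reservation is then tied to a compute node group is outside the scope of this sketch.

```python
from datetime import datetime, timedelta, timezone
from typing import Any


def capacity_reservation_request(instance_type: str, availability_zone: str,
                                 instance_count: int, start: datetime,
                                 hours: int) -> dict[str, Any]:
    """Keyword arguments for a reservation that ends `hours` after `start`."""
    return {
        "InstanceType": instance_type,
        "InstancePlatform": "Linux/UNIX",
        "AvailabilityZone": availability_zone,
        "InstanceCount": instance_count,
        "EndDateType": "limited",  # the reservation expires at EndDate
        "EndDate": start + timedelta(hours=hours),
    }


if __name__ == "__main__":
    request = capacity_reservation_request(
        "c6i.xlarge", "us-east-1a", 4, datetime.now(timezone.utc), 48)
    print(request["EndDate"].isoformat())

# Assumed usage with boto3 (not executed here):
#   import boto3
#   boto3.client("ec2").create_capacity_reservation(**request)
```

A reservation is billed whether or not instances run in it, so setting an end date keeps the charge bounded to the planned simulation window.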

Network File Systems

AWS PCS supports attaching network storage volumes, including Amazon FSx for NetApp ONTAP, Amazon FSx for OpenZFS, Amazon File Cache, Amazon EFS, and Amazon FSx for Lustre. Users can also use self-managed volumes like NFS servers.

Availability and Feedback

AWS Parallel Computing Service is now available in several AWS Regions, including US East (N. Virginia), US East (Ohio), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Europe (Frankfurt), Europe (Ireland), and Europe (Stockholm).

Users are encouraged to try AWS PCS and provide feedback through AWS re:Post or their usual AWS Support contacts.

Conclusion

AWS Parallel Computing Service is a significant advancement for organizations that require high-performance computing capabilities. By providing a fully managed service, AWS PCS simplifies the process of setting up and managing HPC clusters, allowing users to focus on their core simulations and computations. With broad applicability and robust features, AWS PCS is poised to become an essential tool for scientists, researchers, and engineers across various industries.

For more information and to get started, visit the AWS Parallel Computing Service page.

Special thanks to Matthew Vaughn, a principal developer advocate at AWS, for his contribution to creating an HPC testing environment.

For more information, refer to this article.

Neil S
Neil is a highly qualified Technical Writer with an M.Sc(IT) degree and an impressive range of IT and Support certifications including MCSE, CCNA, ACA(Adobe Certified Associates), and PG Dip (IT). With over 10 years of hands-on experience as an IT support engineer across Windows, Mac, iOS, and Linux Server platforms, Neil possesses the expertise to create comprehensive and user-friendly documentation that simplifies complex technical concepts for a wide audience.