Following strong customer demand, AWS has expanded the availability of Amazon EC2 Inf1 instances to five new Regions: US East (Ohio), Asia Pacific (Sydney, Tokyo), and Europe (Frankfurt, Ireland). Inf1 instances are powered by AWS Inferentia chips, which Amazon custom-designed to provide you with the lowest cost per inference in the cloud and lower barriers for everyday developers to use machine learning (ML) at scale.
As you scale your use of deep learning across new applications, you may be constrained by the high cost of running trained ML models in production. In many cases, up to 90% of the infrastructure spend on developing and running an ML application is on inference, making the need for high-performance, cost-effective ML inference infrastructure critical. Inf1 instances are built from the ground up to support ML inference applications and deliver up to 30% higher throughput and up to 45% lower cost per inference than comparable GPU-based instances. This gives you the performance and cost structure you need to confidently deploy your deep learning models across a broad set of applications.
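To make the relationship between throughput and cost per inference concrete, here is a minimal sketch of the arithmetic. The hourly prices and throughput numbers are hypothetical placeholders chosen for illustration, not actual AWS prices or benchmark results, and the function names are our own. It also assumes the instance is fully utilized, which real workloads rarely achieve.

```python
def cost_per_million_inferences(hourly_price_usd: float, throughput_per_sec: float) -> float:
    """Return the USD cost of one million inferences at full utilization."""
    if hourly_price_usd < 0 or throughput_per_sec <= 0:
        raise ValueError("price must be >= 0 and throughput must be > 0")
    inferences_per_hour = throughput_per_sec * 3600
    return hourly_price_usd / inferences_per_hour * 1_000_000


def relative_saving(baseline_cost: float, candidate_cost: float) -> float:
    """Return the fractional cost reduction of the candidate versus the baseline."""
    return 1 - candidate_cost / baseline_cost


# Hypothetical figures for illustration only; not actual AWS prices or benchmarks.
gpu_cost = cost_per_million_inferences(hourly_price_usd=1.00, throughput_per_sec=1000)
inf1_cost = cost_per_million_inferences(hourly_price_usd=0.715, throughput_per_sec=1300)

print(f"GPU:    ${gpu_cost:.4f} per million inferences")
print(f"Inf1:   ${inf1_cost:.4f} per million inferences")
print(f"Saving: {relative_saving(gpu_cost, inf1_cost):.0%}")
```

With these placeholder inputs, the candidate has 30% higher throughput and a lower hourly price, and the two effects compound into the per-inference saving. Substitute your own measured throughput and the current prices for your Region to estimate what you might see in practice.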
Customers and Amazon services adopting Inf1 instances
Since the launch of Inf1 instances, a broad spectrum of customers, from large enterprises to startups, as well as Amazon services, have begun using them to run production workloads. Amazon’s Alexa team is in the process of migrating its text-to-speech workload from GPUs to Inf1 instances. INGA Technologies, a startup focused on advanced text summarization, got started with Inf1 instances quickly and saw immediate gains.
“We quickly ramped up on AWS Inferentia-based Amazon EC2 Inf1 instances and integrated them in our development pipeline,” says Yaroslav Shakula, Chief Business Development Officer at INGA Technologies. “The impact was immediate and significant. The Inf1 instances provide high performance, which enables us to improve the efficiency and effectiveness of our inference model pipelines. Out of the box, we have experienced four times higher throughput, and 30% lower overall pipeline costs compared to our previous GPU-based pipeline.”
SkyWatch provides you with the tools you need to cost-effectively add Earth observation data into your applications. They use deep learning to process hundreds of trillions of pixels of Earth observation data captured from space every day.
“Adopting the new AWS Inferentia-based Inf1 instances using Amazon SageMaker for real-time cloud detection and image quality scoring was quick and easy,” says Adler Santos, Engineering Manager at SkyWatch. “It was all a matter of switching the instance type in our deployment configuration. By switching instance types to AWS Inferentia-based Inf1, we improved performance by 40% and decreased overall costs by 23%. This is a big win. It has enabled us to lower our overall operational costs while continuing to deliver high-quality satellite imagery to our customers, with minimal engineering overhead.”
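As a rough illustration of what switching the instance type in a deployment configuration can look like, the sketch below edits a dictionary shaped like an Amazon SageMaker endpoint configuration request (the kind of payload you might pass to `create_endpoint_config` in boto3). It does not call AWS. The configuration names, the model name, and the starting GPU instance type are hypothetical and are not SkyWatch's actual setup. The sketch also assumes the model artifact has already been compiled for AWS Inferentia.

```python
from copy import deepcopy


def switch_instance_type(endpoint_config: dict, new_instance_type: str) -> dict:
    """Return a copy of the config with every production variant moved to new_instance_type."""
    updated = deepcopy(endpoint_config)
    for variant in updated["ProductionVariants"]:
        variant["InstanceType"] = new_instance_type
    return updated


# Hypothetical GPU-based configuration; all names are placeholders.
gpu_config = {
    "EndpointConfigName": "cloud-detection-gpu",
    "ProductionVariants": [
        {
            "VariantName": "AllTraffic",
            "ModelName": "cloud-detection-model",
            "InitialInstanceCount": 1,
            "InstanceType": "ml.g4dn.xlarge",
        }
    ],
}

# Move the same deployment to an Inferentia-based instance type.
inf1_config = switch_instance_type(gpu_config, "ml.inf1.xlarge")
inf1_config["EndpointConfigName"] = "cloud-detection-inf1"

print(inf1_config["ProductionVariants"][0]["InstanceType"])
```

Returning a modified copy rather than mutating the original keeps the GPU configuration available, which can be handy if you want to run both endpoints side by side while comparing cost and latency.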
AWS Neuron SDK performance and support for new ML models
You can deploy your ML models to Inf1 instances using the AWS Neuron SDK, which is integrated with popular ML frameworks such as TensorFlow, PyTorch, and MXNet. Because Neuron is integrated with ML frameworks, you can deploy your existing models to Amazon EC2 Inf1 instances with minimal code changes. This gives you the freedom to maintain hardware portability and take advantage of the latest technologies without being tied to vendor-specific software libraries.
Since its launch, the Neuron SDK has seen dramatic improvement in performance, delivering throughput up to two times higher for image classification models and up to 60% improvement for natural language processing models. The most recent launch of Neuron added support for OpenPose, a model for multi-person keypoint detection, providing 72% lower cost per inference than GPU instances.
The easiest and quickest way to get started with Inf1 instances is via Amazon SageMaker, a fully managed service for building, training, and deploying ML models. If you prefer to manage your own ML application development platforms, you can get started either by launching Inf1 instances with AWS Deep Learning AMIs, which include the Neuron SDK, or by using Inf1 instances via Amazon Elastic Kubernetes Service (Amazon EKS) or Amazon Elastic Container Service (Amazon ECS) for containerized ML applications.
For more information, see Amazon EC2 Inf1 Instances.
About the Author
Michal Skiba is a Senior Product Manager at AWS and is passionate about enabling developers to leverage innovative hardware. Over the past ten years, he has managed various cloud computing infrastructure products at Silicon Valley companies, large and small.