This is a guest post by Dustin Potter at EagleDream Technologies. In their own words, “EagleDream Technologies educates, enables, and empowers the world’s greatest companies to use cloud-native technology to transform their business. With extensive experience architecting workloads on the cloud, as well as a full suite of skills in application modernization, data engineering, data lake design, and analytics, EagleDream has built a growing practice in helping businesses redefine what’s possible with technology.”
EagleDream Technologies is a trusted cloud-native transformation company and APN Premier Consulting Partner for businesses using AWS. EagleDream is unique in using its cloud-native software engineering and application modernization expertise to guide you through your journey to the cloud, optimize your operations, and transform how you do business using AWS. Our team of highly trained professionals helps accelerate projects at every stage of the cloud journey. This post shares our experience using Amazon CodeGuru Profiler to help one of our customers optimize their application under tight deadlines.
Our team received a unique opportunity to work with one of the industry’s most disruptive airline technology leaders, who uses their expertise to build custom integrated airline booking, loyalty management, and ecommerce platforms. This customer reached out to our team to help optimize their new application. They already had a few clients using the system, but they had recently signed a deal with a major airline that would increase the load on their platform fivefold. It was critical that they prepare for this significant increase in activity. The customer was running a traditional three-tier application written in Java that used Amazon Aurora for the data layer. They had already implemented autoscaling for the web servers and database but realized something was wrong when they started running load tests. During the first load test, the web tier expanded to over 80 servers and Aurora reached the maximum number of read replicas.
Our team knew we had to dive deep and investigate the application code. We had previously used other application profiling tools and knew how valuable they can be when diagnosing these types of issues. Also, AWS had recently announced Amazon CodeGuru and we were eager to try it out. On top of that, the price and ease of setup were driving factors for us. We had looked at an existing commercial application performance monitoring tool, but it required more invasive changes to use. To automate the installation of that tool, we would have needed to make changes to the customer’s deployment and infrastructure setup. We had to move quickly with as little disruption to their ongoing feature development as possible, which contributed to our final decision to use CodeGuru.
After we decided on CodeGuru, it was easy to get CodeGuru Profiler installed and start capturing metrics. There are two ways to profile an application. The first is to reference the profiler agent when the application starts by using the standard -javaagent parameter. This is useful if the group performing the profiling isn’t the development team, for example in an organization with more traditional development and operation silos. This is easy to set up because all that’s needed is to download the .jar published in the documentation and alter any startup scripts to include the agent and the name of the profiling group to use.
The second way to profile the application is to include the profiler code via a dependency in your build system and instantiate a profiling thread somewhere at the entry point of the program. This option is great if the development team is handling the profiling. For this particular use case, we fell into the second group, so including it in the code was the quickest and easiest approach. We added the library as a Maven dependency and added a single line of application code. After the code was committed, we used the customer’s existing Jenkins setup to deploy the latest build to an integration environment. The final step of the pipeline was to run load tests against the new build. After the tests completed, we had a flame graph that we used to start identifying any issues.
The workflow includes the following steps:
- Developers check in code.
- The check-in triggers a Jenkins job.
- Maven compiles the code.
- Jenkins deploys the artifact to the development environment.
- Load tests run against the newly deployed code.
- CodeGuru Profiler monitors the environment and generates a flame graph and a recommendation report.
The following diagram illustrates the workflow.
Flame graphs group together stack traces and highlight which part of the code consumes the most resources. The following screenshot is a sample flame graph from an AWS demo application for reference.
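To make the idea of grouping stack traces concrete, here is a minimal sketch of how sampled stacks can be folded into per-path counts, which is roughly what the width of each frame in a flame graph represents. This is an illustration only, not how CodeGuru Profiler is implemented; the class name, the `foldSamples` helper, and the sample method names are all invented for this example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FlameGraphSketch {

    /**
     * Counts how many sampled stacks pass through each call-path prefix.
     * Each sample lists its frames from the outermost call to the innermost.
     * (Hypothetical helper for illustration; not a CodeGuru API.)
     */
    public static Map<String, Integer> foldSamples(List<List<String>> samples) {
        Map<String, Integer> widths = new LinkedHashMap<>();
        for (List<String> stack : samples) {
            StringBuilder path = new StringBuilder();
            for (String frame : stack) {
                if (path.length() > 0) {
                    path.append(';');
                }
                path.append(frame);
                // Every sample that shares this prefix makes that frame one unit wider.
                widths.merge(path.toString(), 1, Integer::sum);
            }
        }
        return widths;
    }

    public static void main(String[] args) {
        // Invented samples; a real profiler collects these from running threads.
        List<List<String>> samples = List.of(
            List.of("main", "handleRequest", "loadFares"),
            List.of("main", "handleRequest", "loadFares"),
            List.of("main", "handleRequest", "renderPage"),
            List.of("main", "refreshCache"));

        // Print each call path with the number of samples that passed through it.
        foldSamples(samples).forEach((path, count) ->
            System.out.println(path + " " + count));
    }
}
```

In this sketch, the paths with the highest counts correspond to the widest frames, which is where we would start looking.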
After CodeGuru generated the flame graphs and recommendations report, we took an iterative approach and tackled the biggest offenders first. The flame graphs, together with the actionable recommendations that CodeGuru Profiler surfaced, made it easy to identify which execution paths were taking the longest to complete. By looking at the longest frames first, we identified that the customer faced challenges around thread safety, which was leading to locking issues. To resolve issues collaboratively with the client, we created a Slack channel to review the latest graphs and provide recommendations directly to the developers. After the developers implemented the suggested changes, we deployed a new build and had a corresponding graph in a few minutes.
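As an illustration of the kind of locking issue a profiler can surface, the following sketch contrasts a counter that serializes every caller on a single monitor with one built on the JDK's concurrent collections. It is a hypothetical example, not the customer's code or the specific fix we recommended; the class names, the `hammer` helper, and the "search" key are invented here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

public class LockContentionSketch {

    /** Minimal hit-counter contract shared by both variants (hypothetical). */
    public interface HitCounter {
        void record(String key);
        long get(String key);
    }

    /** Coarse-grained: correct, but every caller waits on the same monitor. */
    public static class SynchronizedCounts implements HitCounter {
        private final Map<String, Long> counts = new HashMap<>();

        public synchronized void record(String key) {
            counts.merge(key, 1L, Long::sum);
        }

        public synchronized long get(String key) {
            return counts.getOrDefault(key, 0L);
        }
    }

    /** Finer-grained: also correct, without one lock guarding the whole map. */
    public static class ConcurrentCounts implements HitCounter {
        private final ConcurrentHashMap<String, LongAdder> counts = new ConcurrentHashMap<>();

        public void record(String key) {
            counts.computeIfAbsent(key, k -> new LongAdder()).increment();
        }

        public long get(String key) {
            LongAdder adder = counts.get(key);
            return adder == null ? 0L : adder.sum();
        }
    }

    /** Runs several workers that all record hits for the same key, then returns the total. */
    public static long hammer(HitCounter target, int threads, int perThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    target.record("search");
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return target.get("search");
    }

    public static void main(String[] args) {
        // Both variants produce the same total; they differ in how much threads block each other.
        System.out.println(hammer(new SynchronizedCounts(), 8, 10_000));
        System.out.println(hammer(new ConcurrentCounts(), 8, 10_000));
    }
}
```

Both variants are thread-safe and report the same totals. Under heavy load, the time threads spend waiting on the single monitor in the first variant is the kind of cost that can show up as a hot path in a profile, although how much it matters depends on the workload.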
After just one week, our team successfully alleviated their scaling challenges at the web service layer. When we reran the load tests, we saw the expected result: the web tier scaled to a few servers instead of more than 80. Additionally, because we optimized the code, we reduced the existing application footprint, which cut our customer’s compute load by 30%.
Cost savings aside, one of the most notable benefits of this project was developer education. With CodeGuru Profiler pinpointing where the bottlenecks were, the developers could recognize inefficient patterns in the code that might lead to severe performance hits down the road. This helped them better understand the features of the language they’re using and armed them with increased efficiency in future development and debugging.
With the web service layer better optimized, our next step is to use CodeGuru and other AWS tools like Performance Insights to tackle the database layer. Even if you aren’t experiencing extreme performance challenges, CodeGuru Profiler can provide valuable insights into the health of your application in any environment, from development all the way to production, with minimal CPU utilization. Integrating these results as part of the SDLC or DevOps process leads to better efficiency and gives you and your developers the tools you need to be successful. To learn more about how to get started with CodeGuru Profiler and CodeGuru Reviewer, see the Amazon CodeGuru documentation.
About the Author
Dustin Potter is a Principal Cloud Solutions Architect at EagleDream Technologies.