PEARC20 has ended
Welcome to PEARC20!
PEARC20’s theme is “Catch the Wave.” This year’s theme embodies the community’s drive to keep pace with, and stay ahead of, new waves in technology, analytics, and a globally connected and diverse workforce. We look forward to this year’s PEARC20 virtual meeting, where we can share scientific discoveries and craft the future of research infrastructure.

The conference runs on Pacific Time (PT), and all times listed below are in PT.

Connection information for all PEARC20 workshops, tutorials, plenaries, track presentations, BOFs, posters, the Visualization Showcase, and other affiliated events is available in Brella, the PEARC20 virtual conference platform. If you have trouble joining Brella, please email pearcinfo@googlegroups.com.
Wednesday, July 29 • 1:35pm - 3:35pm
Monitoring and Analysis of Power Consumption on HPC clusters using XDMoD 🏆


🏆 Phil Andrews Most Transformative Contribution award winner 
🏆 Best Paper in “Advanced research computing environments – systems and system software” Track

As part of the NSF-funded XMS project, we are developing tools and techniques for auditing and analyzing HPC infrastructure. This includes a suite of tools for analyzing HPC jobs based on performance metrics collected from compute nodes. Although it may not be salient to users, the energy consumption of an HPC system is an important part of its maintenance cost and accounts for a substantial fraction of the cost of the calculations run on it. We added support for energy usage analysis to the open-source XDMoD tool chain. This allows HPC centers to provide power consumption information directly to HPC stakeholders: end users receive energy usage information about their jobs, and HPC center staff receive data for analyzing how the system's energy usage relates to other system parameters. We explain how energy metrics were added to XDMoD and describe the issues we overcame in instrumenting a 1,400-node academic HPC cluster. We present an analysis of 14 months of data collected from real jobs on the cluster. We perform a machine learning analysis of the data and show how energy usage relates to other system performance metrics.
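The abstract does not say how per-job energy is derived from node-level measurements, so the following is only a minimal sketch under stated assumptions, not the XDMoD implementation. It assumes each compute node allocated to a job yields a time-ordered series of (timestamp in seconds, power in watts) samples, integrates each series with the trapezoidal rule to get joules, sums across nodes, and converts to kilowatt-hours. The function names `node_energy_joules` and `job_energy_kwh` and the input layout are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical input format (assumption, not XDMoD's schema): for each node
# allocated to a job, a time-ordered list of (unix_timestamp_s, power_watts).
Samples = Dict[str, List[Tuple[float, float]]]


def node_energy_joules(series: List[Tuple[float, float]]) -> float:
    """Integrate power over time with the trapezoidal rule (W * s = J)."""
    energy = 0.0
    for (t0, p0), (t1, p1) in zip(series, series[1:]):
        if t1 < t0:
            raise ValueError("samples must be in time order")
        # Area of the trapezoid between two consecutive power readings.
        energy += 0.5 * (p0 + p1) * (t1 - t0)
    return energy


def job_energy_kwh(samples: Samples) -> float:
    """Sum per-node energy for one job and convert joules to kilowatt-hours."""
    total_joules = sum(node_energy_joules(s) for s in samples.values())
    return total_joules / 3.6e6  # 1 kWh = 3.6 MJ


if __name__ == "__main__":
    # Two nodes sampled every 30 s for one hour at a constant 300 W each.
    times = range(0, 3601, 30)
    job = {
        "node-a": [(float(t), 300.0) for t in times],
        "node-b": [(float(t), 300.0) for t in times],
    }
    print(f"{job_energy_kwh(job):.3f} kWh")  # 2 nodes * 0.3 kW * 1 h = 0.600 kWh
```

The trapezoidal rule is a simple choice when readings arrive at fixed but coarse intervals; a production pipeline would also have to handle gaps, counter resets, and samples that fall outside the job's start and end times.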

Wednesday July 29, 2020 1:35pm - 3:35pm PDT