Data Journey Towards Insights

Nov 20, 2024
Woong Shin
Michael Ott
Abstract

Operational data analytics (ODA) provides unique opportunities to analyze, understand, and optimize operations of HPC systems. Readily available open-source frameworks make the collection of monitoring data from different domains of the HPC system increasingly easy. However, making the data work for HPC operations is not straightforward, and HPC sites are duplicating efforts to develop methods and tools to analyze and leverage the data. AI-based analysis methods are appealing, but certainly not the only option.

The 2024 session centered on AI and the evolution of ODA over time. Woong Shin (Oak Ridge National Laboratory) presented on the data journey of OLCF across two generations of HPC systems, sharing lessons learned from operating, observing, and analyzing production systems at scale. The BoF brought together HPC operations practitioners to share ODA use cases, discuss open problems, and provide feedback through interactive discussion.

Event
Location

Georgia World Congress Center

285 Andrew Young International Blvd NW, Atlanta, Georgia 30313

Session Overview

The 2024 ODA BoF explored two intertwined themes: the growing role of AI in operational data analytics, and the evolution of ODA practice across successive generations of HPC systems. Presentations were paired with an interactive Mentimeter discussion, inviting operators, researchers, and users to weigh in on how ODA has changed and where it is heading.

Woong Shin (Oak Ridge National Laboratory) anchored the session with a retrospective on the data journey of OLCF across two generations of HPC systems, drawing out patterns in how observability, data access, and analysis workflows evolved alongside the hardware.

Historical Context

SC24 marked the eighth year of the ODA Birds-of-a-Feather series. The series has tracked the maturation of the field through a succession of themes:

  • 2019–2021 — Building the ODA software stack and framework.
  • 2022 — Managing the abundance of collected data (“drowning in data”).
  • 2023 — Standardization and sharing of tools and methods across sites.
  • 2024 — Data journey over time and the role of AI in data analysis.

Organizer

The BoF is organized by the EE HPC WG Operational Data Analytics team, a global community of HPC operators, researchers, and tool developers working to advance operational data analytics as a discipline.