Extermination at Scale - Latest Allinea debugger making waves on Titan

Mon April 15, 2013

This article was written by Scott Jones, Science Writer at the National Institute for Computational Sciences, University of Tennessee.

When in November 2012 the Oak Ridge Leadership Computing Facility (OLCF) stood up its latest and greatest machine, Titan, it seemed to many that the hard work was done.

Not so. While launching a machine on the scale of Titan, now ranked as the world’s most powerful computer, is certainly an achievement in itself, at the end of the day the hardware is only as good as the software that runs on it.

Because Titan is among the first supercomputing systems to use a hybrid architecture—one that combines traditional central processing units (CPUs) with the graphics processing units (GPUs) common in video game systems—getting scientific applications to scale to all of Titan’s nearly 300,000 compute cores is no small feat. And no matter how solid any one application is, that scaling up is certain to introduce bugs that can greatly hamper its (and Titan’s) productivity.

When an application containing hundreds of thousands of lines of code is running across 300,000 cores, spotting such bugs is a tricky business. It’s a case of sheer numbers and one the OLCF anticipated. It knew it would have to create a revolutionary tool to allow applications to run smoothly after scaling to Titan.

For that reason, OLCF staff began working with software developer Allinea on Titan’s predecessor, Jaguar, in preparation for Titan’s launch. The product of this relationship is Allinea’s distributed debugging tool, or Allinea DDT, created precisely for the world’s leadership computing systems. With the assistance of OLCF staff, Allinea was able to customize its large-scale debugger to Titan’s hybrid architecture, enabling the supercomputer’s first users to easily scale to large portions of the machine and assisting the OLCF during Titan’s critical acceptance phase.

“Part of the mission of the Titan project is to provide a comprehensive programming ecosystem that allows researchers to be as productive as possible,” said Joshua S. Ladd, the tools project technical officer during the OLCF3 Project. “A major component of that ecosystem is the debugger.”

Currently Allinea representatives are working with Oak Ridge National Laboratory’s (ORNL’s) Application Performance Tools Group to extend Allinea DDT to a scale 40-plus times greater than previous high-end debugging tools, and they are making serious headway.

“Before we joined this project, tools weren’t capable of getting anywhere near the size of the hardware,” noted Allinea’s COO David Lecomber. “The problem was that a debugging tool might do 5,000 or 10,000 parallel tasks if it was lucky, when the machines and applications wanted to write things that could do 200,000 plus. So the tools just got beaten up by the hardware.”

With Allinea DDT, however, the times, they are a-changin’.

What’s in a debugger?

A supercomputer needs a super debugger. Supercomputing applications typically assign each process to a single, separate processor, meaning an application running on 200,000 processors will most likely be executing 200,000 simultaneous processes.

Traditionally a developer will contend with bugs by inserting “print” statements at strategic points in the code. These statements tell the application to display the status of each process at that point in the program’s execution—information such as the value of a variable. By running a test problem, the developer can compare each answer with an expected answer and thereby isolate specific problems in the code.

Each process will respond to the print statement with a one-line answer; thus, an application with thousands of processes will display thousands of lines through which the coder must then sift. This method gets more difficult as the number of processes grows, and it becomes impossible beyond a certain point.

With Allinea DDT, though, developers can quickly pinpoint any failures because it gives them a single view of every process in a parallel job, along with exactly what line of code is being executed. Furthermore, the debugger works with applications written in the most common supercomputing languages: Fortran, C, and C++.
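The difference between the two approaches can be illustrated with a small sketch. It is not how Allinea DDT is implemented; it is a plain Python simulation under stated assumptions: the function names, the "energy" variable, and the choice of one misbehaving rank are all hypothetical. It contrasts one printed line per process with a view that groups processes holding the same value.

```python
from collections import defaultdict

# Hypothetical: the rank that computes a wrong value in this simulation.
BAD_RANK = 137_042


def simulate_rank(rank: int) -> float:
    """Stand-in for one process's result; every rank should report 1.0."""
    return 0.5 if rank == BAD_RANK else 1.0


def print_style_report(num_ranks: int) -> list[str]:
    """One line per process, as a print statement in every rank would produce."""
    return [f"rank {r}: energy = {simulate_rank(r)}" for r in range(num_ranks)]


def grouped_report(num_ranks: int) -> dict[float, list[int]]:
    """Group ranks by the value they hold, so identical processes
    collapse into a single entry."""
    groups: dict[float, list[int]] = defaultdict(list)
    for r in range(num_ranks):
        groups[simulate_rank(r)].append(r)
    return dict(groups)


if __name__ == "__main__":
    n = 200_000
    lines = print_style_report(n)
    print(f"print-style output: {len(lines)} lines to read")
    for value, ranks in grouped_report(n).items():
        print(f"value {value}: {len(ranks)} ranks (first: rank {ranks[0]})")
```

In this simulation the print-style report grows by one line for every process, while the grouped report stays at two entries no matter how many processes run, and the odd one out is visible at a glance.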

“Allinea DDT is tightly integrated into the Cray programming environment. We worked with Allinea Software to ensure that,” said Ladd. “All you really need to do is load the Allinea DDT module and type ‘ddt’ on the command line to fire up the GUI, and you’re ready to go. And the GUI is just point and click with a mouse.”

With this revolutionary tool, researchers can now more easily focus on their scientific goals without worrying about locating bugs across hundreds of thousands of lines of code, and the results are beginning to show.

Accelerating science one bug at a time

With both Titan and Allinea DDT, supercomputing is in uncharted territory.

“The combination of Titan’s size and hybrid architecture with GPUs provides Allinea with a formidable testing ground to grow, develop, and refine the Allinea DDT tool,” said the OLCF’s Hai Ah Nam, a staff scientist who works with researchers to help them get the most out of their time on Titan.

It’s a symbiotic relationship: “During [Titan’s] acceptance, we used Allinea DDT to help us find bugs in our codes. But as early users of Allinea DDT on Titan, we in turn found bugs in Allinea DDT that could be fixed before release to the larger user community,” said Nam. Allinea DDT is really paying off because it represents the only tool with its scaling capacity for Titan’s hybrid CPU/GPU architecture, she said. And Nam should know. She frequently proves Allinea DDT’s value in her role as a scientific computing liaison for the INCITE program.

Her most recent project involved an application, Bigstick, intended to describe the properties of various atomic nuclei of different substances. This basic science research contributes to many fields, including energy and medical research. Although Nam had a good understanding of the algorithm, she had only a few days to work with the researcher.

“He was having trouble with this ‘Heisenbug’ (Heisenbugs are bugs that mysteriously vanish whenever you try to ‘observe’ them, typically with a ‘printf,’ because you’ve altered latencies between interprocessor communications) that only showed up when he scaled to a certain number of processors,” she said. “And it happened sporadically.”

By the time the researcher got through his first print statement and looked at one part of his code, Nam had figured out the problem with Allinea DDT. “I stunned him by finding his problem so quickly,” she said. “I was able to do it in one sitting, about an hour. I suspect it would have taken him at least a couple of weeks.”

Nam can empathize with scientists who want to focus on their research rather than learning yet another software application. “At small scales I could printf myself out of a problem,” she admits. “But now, with codes running on tens to hundreds of thousands of cores, I have to use Allinea DDT if I want to solve the problem quickly.”

The beginning of a beautiful relationship

“It was a big deal when Allinea Software came here in 2009, and they were able to start Allinea DDT on all of Jaguar’s 225,000 cores,” Ladd said.

Since its inception on Jaguar, the distinguished debugger has helped the OLCF solve some unusual challenges. Ladd and his team used the program to debug an open-source implementation of the Message Passing Interface (MPI) middleware. The work was at a very large scale, a half-million lines of code running on 100,000 to 225,000 cores.

“Even your typical nuclear supernova application is not that size,” said Ladd. “Debugging inside MPI is a vast universe of complexity that touches all aspects of a supercomputer—the network, the CPU, and the memory. All of these factors can conspire to cause problems at scale.

“By having the ability to step through the code, we could identify and resolve issues that I don’t think we would have been able to without Allinea DDT.”

Debugging also gets tricky when code has errors but still runs. To address this problem, Allinea Software is collaborating with VisIt—open-source software used to visualize large scientific data sets. A visual inspection enables researchers to look at a picture of the data, click on different cells, and inspect the process generating the data.

“So let’s say the output is a video of a star exploding,” said Ladd. “As that star explodes, if there are all kinds of weird asymmetries, you probably have some bug in your math. With a visualized debugging tool, if it doesn’t look like you expected, you go through the process to determine if you’ve got a bug in your code, or if you’ve discovered something new.”

Overall, Ladd describes the working relationship with Allinea Software as a gratifying partnership: “I think it has been rewarding for the Allinea Software folks to see their baby running at this scale, and it’s been rewarding for us to have it as a productive contributor in our tools suite . . . Titan is really cutting-edge technology, and it’s even more exciting because it’s not immediately clear what kind of issues users are going to run into when porting their code to the GPUs. To help encourage researchers to use the GPU accelerators, they must have the most powerful and effective tools at their disposal, tools like Allinea DDT. We’re excited for users to run into bugs on the GPUs to see this tool in action.”

By creating the fastest supercomputer with the best “supertools” to support it, the OLCF has created a solid launch pad for breakthrough discoveries.

Just ask ORNL’s Markus Eisenbach. He works with an application known as WL-LSMS, which provides first-principles calculations of properties that are important for the understanding of materials such as steels, iron-nickel alloys, and advanced permanent magnets that will help drive future electric motors and generators. Titan is helping Eisenbach’s team improve the calculations of a material’s magnetic states as they vary by temperature.

However, as with many scientific endeavors, the task is easier said than done, especially when transferring the research from a traditional CPU-based system to Titan’s revolutionary hybrid platform.

Thanks to Allinea DDT, however, the transition has been smoother than previously thought possible. When Eisenbach’s team began its work on Titan, a funny thing happened.

Whenever the team’s application scaled to roughly 14,000 cores, or about two-thirds of the machine, the GPU-based version of WL-LSMS mysteriously crashed. The traditional, CPU-based version, on the other hand, ran fine.

“It was puzzling,” said Eisenbach, noting that the different versions weren’t noticeably different when it came to scaling. Because at such a scale researchers like Eisenbach can’t decipher all of the data in an application’s core file, it was difficult to pinpoint the error in the code.

While Allinea DDT didn’t precisely pinpoint the source of the bug, it did help Eisenbach’s team narrow the range of possibilities to a very small region of the code and “inspired [its] intuition” of what was happening. It turned out the code was running out of OpenMP stack space. “Allinea DDT saved a tremendous amount of time,” said Eisenbach.

While stories like Eisenbach’s and Nam’s are indeed a testament to the OLCF/Allinea partnership, they are surely just the beginning. As more and more researchers begin scaling their codes to Titan’s revolutionary architecture, Allinea DDT will have plenty of work to do.

 

This article was published by Oak Ridge Leadership Computing Facility (OLCF) and can be found in full on their website here: https://www.olcf.ornl.gov/2013/02/27/extermination-at-scale/