Allinea Software blog


Thirty times as much traffic as Wikipedia - How we reached petascale with Allinea DDT

  
  
  

In June 2009, Allinea announced a collaboration with the French organization CEA. We agreed to scale Allinea DDT up to debug 32,000 cores simultaneously - enough, at the time, to cover 98% of the systems in the TOP500.

Up until then Allinea DDT had used a simple, flat architecture similar to many traditional web servers of the time. Our GUI ran on a frontend node and received connections from each of the compute nodes, processed and visualized the data and sent out new commands.

To reach 32,000 processes it was clear that a change in Allinea DDT's architecture was required - after all, even high-traffic websites were using load-balancers to spread requests amongst a network of servers for processing. We had to do this in reverse - instead of having thousands of users sending requests to a single service, we had thousands of services sending data to a single user.

Allinea DDT already performed a lot of data aggregation before displaying this information to the user - clearly, even at moderate scales of hundreds of processes, you can't look at each one in turn. Way back in Allinea DDT 1.10 we started addressing this by introducing a parallel stack view, which shows you the broad picture of where your processes are - without overwhelming you with the specifics.

To implement this, Allinea DDT collects the stacks from every parallel process, but reduces them down into a manageable amount of information by merging common branches and tracking interesting metrics associated with each. This kind of reduce operation is a classic candidate for parallelization and, conveniently enough, Allinea DDT always finds itself running on a supercomputer powerful enough to do it in real time. A plan was formed.
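To make the idea of merging common branches concrete, here is a minimal sketch in Python. It is an illustration only, not Allinea DDT's code: the names `StackNode`, `merge_stacks` and `render` are hypothetical, and the only metric tracked is the set of process ranks that passed through each frame.

```python
class StackNode:
    """One frame in the merged stack tree (hypothetical structure)."""

    def __init__(self, frame):
        self.frame = frame
        self.ranks = set()    # ranks whose stack passes through this frame
        self.children = {}    # frame name -> StackNode


def merge_stacks(stacks):
    """Merge per-process stacks into one tree.

    `stacks` maps a process rank to its list of frames, outermost first.
    Stacks that share a prefix share the corresponding branch.
    """
    root = StackNode("<root>")
    for rank, frames in stacks.items():
        root.ranks.add(rank)
        node = root
        for frame in frames:
            child = node.children.get(frame)
            if child is None:
                child = StackNode(frame)
                node.children[frame] = child
            child.ranks.add(rank)
            node = child
    return root


def render(node, depth=0):
    """Return the merged tree as indented lines with a process count."""
    lines = []
    for frame in sorted(node.children):
        child = node.children[frame]
        lines.append(f"{'  ' * depth}{frame} [{len(child.ranks)} procs]")
        lines.extend(render(child, depth + 1))
    return lines
```

Given four processes, two waiting in `MPI_Recv`, one computing and one writing output, `render(merge_stacks(...))` yields five lines instead of four full stacks, and the saving grows as more processes share the same branches.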

Our solution was to make Allinea DDT's daemons running on the compute nodes assemble themselves into a tree, with the GUI talking only to the root node. All data from the nodes is distributed across the tree and aggregated through reduction operations at each level all the way up to the top. By parallelising data processing in this way, the number of aggregation steps between any process and the GUI grows as O(log n) in the number of processes being debugged, rather than linearly.
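The logarithmic behaviour is easy to see in a small simulation. The sketch below is not how Allinea DDT's daemons are implemented; `tree_reduce` is a hypothetical helper and the fan-out values are assumptions chosen for illustration. It merges partial results level by level and reports how many levels were needed to reach a single root.

```python
from collections import Counter


def tree_reduce(values, merge, fanout=2):
    """Reduce `values` through a tree with the given fan-out.

    Each pass models one level of the tree: every parent merges the
    partial results of up to `fanout` children. Returns the final
    result and the number of levels that were needed.
    """
    if not values:
        raise ValueError("need at least one value to reduce")
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    level = list(values)
    levels = 0
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            acc = group[0]
            for value in group[1:]:
                acc = merge(acc, value)
            next_level.append(acc)
        level = next_level
        levels += 1
    return level[0], levels


def merge_counts(a, b):
    """Combine two partial summaries (stack -> number of processes)."""
    return a + b


# Illustrative input: 32,000 processes, each reporting its current stack.
leaves = [Counter({("main", "solve"): 1}) for _ in range(32000)]
summary, depth = tree_reduce(leaves, merge_counts, fanout=32)
```

With an assumed fan-out of 32, the 32,000 leaves collapse to 1,000, then 32, then 1 partial result, so the root is reached in three levels. With a fan-out of 2 the same reduction takes 15 levels. Either way the GUI only ever receives one already-merged summary.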

That was the theory, but could we make it work in practice?

 

[Figure: Allinea DDT comparison with Wikipedia]

Yes.

The first version with the new architecture was released after just six months, in December 2009, as Allinea DDT 2.5, and it exceeded all expectations.

Two years later, we released Allinea DDT 3.0 - the result of an intense collaboration with the US DOE's Oak Ridge National Laboratory that took Allinea DDT even further - to 225,000 simultaneous processes: the limit of the largest machine in the world at that time.

Now, with scale-by-default baked into our systems and development process, the sky is the limit. By 2013 we expect to see Allinea DDT running on systems with over 1,000,000 simultaneous processes. Now that will be something special!
