Allinea Software Blog

Thu October 20, 2016 by Mark O'Connor

In the previous post we parallelized Andrej Karpathy's policy gradient code to see whether a very simple implementation, coupled with supercomputer speeds, could learn to play Atari Pong faster than the state of the art (DeepMind's A3C at the time of writing), and even train faster in real time than the ultimate in gaming expertise: a 7-year-old child.

DeepMind's paper reports 2 hours of training time to defeat Atari Pong using 16 worker threads on a single server. We calculated that, to be competitive, we'd need to accelerate our simple policy gradient code 100x to reach 44,000 steps/s. After some profiling and tuning we achieved 29,000 steps/s on an EC2 cluster...
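As a rough check on those figures, the arithmetic can be sketched in a few lines of Python. The three constants come straight from the excerpt; the baseline rate is only inferred from "100x to reach 44,000 steps/s" and is not stated in the post, so treat it as an assumption.

```python
# Figures quoted in the excerpt above.
TARGET_STEPS_PER_S = 44_000    # rate judged competitive with A3C's 2-hour result
REQUIRED_SPEEDUP = 100         # the "100x" acceleration needed to hit the target
ACHIEVED_STEPS_PER_S = 29_000  # rate reached on the EC2 cluster

# Implied starting rate (an inference, not a number given in the post).
baseline_steps_per_s = TARGET_STEPS_PER_S / REQUIRED_SPEEDUP

# How far the tuned code got, relative to the baseline and to the target.
achieved_speedup = ACHIEVED_STEPS_PER_S / baseline_steps_per_s
fraction_of_target = ACHIEVED_STEPS_PER_S / TARGET_STEPS_PER_S

print(f"implied baseline: {baseline_steps_per_s:.0f} steps/s")
print(f"achieved speedup: {achieved_speedup:.0f}x")
print(f"fraction of target: {fraction_of_target:.0%}")
```

Under that assumption, the tuned code reached roughly two thirds of the rate needed to match the A3C result.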

Wed October 19, 2016 by Mark O'Connor

I’ve always enjoyed playing games, but the buzz from writing programs that play games has repeatedly claimed months of my conscious thought at a time. I’m not sure that writing programs that write programs that play games is the perfect solution, but I do know that I can never resist it.

Wed August 3, 2016 by Mark O'Connor

In episode one we optimized Torch A3C performance on the new Intel Xeon Phi (Knights Landing) CPU. Allinea MAP and Performance Reports identified bottlenecks in our framework and sped up model training by 7x.

Thu July 21, 2016 by David Lecomber

The manufacturing industry relies on high-performance software to improve the speed and efficiency with which it brings new or improved products to market.

Tue July 5, 2016 by Mark O'Connor

In February, a new paper from Google's DeepMind team appeared on arXiv. This one was interesting: they showed dramatically improved performance and training time for their Atari-playing Deep Q-Learning network. The training speedup was so great that 16 CPU cores outperformed their previous GPU results by an order of magnitude.

Thu April 21, 2016 by Mark O'Connor

Allinea MAP isn't just a lightweight profiler to help you optimize your code. It also lets you add your own metrics with just a couple of lines of code. To show how this works, I'm going to add PAPI's instructions-per-cycle metric to MAP.

Sat March 5, 2016 by Development Team

For Fortran and F90 developers, debugging is inevitable, as it is in every language. We look at debugging tips for Fortran and F90 developers to show why and how to use a debugger for some typical bugs.

Do it the right way, not the write way

For many developers, reaching for the Fortran or F90 write (or print) statement when debugging is wired into the brain, but it just doesn’t do the job. Using write or print is iterative.