Once a program has been parallelized, its performance usually remains far from optimal. Performance optimization is difficult because it must account for the complex interplay between the algorithm and the hardware. Many parallel applications also suffer from latent performance limitations that may prevent them from scaling to larger problem or machine sizes. Often, such scalability bugs manifest themselves only when an attempt to scale the code is actually made, a point at which remediation can already be difficult. Performance models allow such issues to be predicted before they become relevant. A performance model is a formula that expresses a performance metric of interest, such as execution time or energy consumption, as a function of one or more execution parameters, such as the size of the input problem or the number of processors. However, deriving such models analytically from the code is so laborious that too many application developers shy away from the effort.
To let a wider audience of developers profit from performance models, we develop techniques to learn them automatically from a small set of performance measurements. Our performance-modeling tool Extra-P generates such empirical performance models for each function of an application, even for complex codes with hundreds of thousands of lines. In this way, the programmer can easily spot scalability problems or identify execution parameters that guarantee the desired degree of efficiency. Extra-P is available for download under an open-source license.
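
To make the idea of learning a model from a few measurements concrete, here is a minimal Python sketch. It is not Extra-P's implementation or API. It assumes a deliberately simple model shape, t(p) = c0 + c1 · p^i · log2(p)^j, tries a small hand-picked set of exponents (i, j), fits c0 and c1 by least squares for each candidate, and keeps the candidate with the smallest residual. The function names, the candidate exponents, and the timing values (generated synthetically from a known formula, not measured) are all assumptions made for illustration.

```python
import math
from itertools import product


def fit_linear(xs, ys):
    """Least-squares fit of y = c0 + c1 * x; returns (c0, c1, rss) or None."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return None  # all x values identical: slope is undefined
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    c1 = sxy / sxx
    c0 = mean_y - c1 * mean_x
    rss = sum((y - (c0 + c1 * x)) ** 2 for x, y in zip(xs, ys))
    return c0, c1, rss


def fit_model(procs, times, exponents=(0, 0.5, 1, 2), log_exponents=(0, 1, 2)):
    """Pick the (i, j) whose term p**i * log2(p)**j best explains the data.

    The candidate exponent sets are hypothetical defaults chosen for this sketch.
    """
    best = None
    for i, j in product(exponents, log_exponents):
        if i == 0 and j == 0:
            continue  # a constant term is already covered by c0
        xs = [p ** i * math.log2(p) ** j for p in procs]
        fit = fit_linear(xs, times)
        if fit is None:
            continue
        c0, c1, rss = fit
        if best is None or rss < best["rss"]:
            best = {"i": i, "j": j, "c0": c0, "c1": c1, "rss": rss}
    return best


def predict(model, p):
    """Evaluate the fitted model at processor count p."""
    return model["c0"] + model["c1"] * p ** model["i"] * math.log2(p) ** model["j"]


# Synthetic "measurements" generated from t(p) = 5 + 0.25 * p * log2(p).
procs = [2, 4, 8, 16, 32]
times = [5 + 0.25 * p * math.log2(p) for p in procs]

best = fit_model(procs, times)
print("selected exponents:", best["i"], best["j"])
print("extrapolated time at p = 64:", predict(best, 64))
```

Because the synthetic data is noise-free, the search should select the p · log2(p) term that generated it. With real measurements, noise and repeated runs would call for a more careful selection criterion than the raw residual used here, for example cross-validation.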