## For a Few Equations More

On the occasion of the Gilles Kahn prize award ceremony, I was asked to write an article about my PhD thesis for the popular science blog Binaire from the French newspaper Le Monde (the French version of the article is available here). You can read the English translation below.

Can you tell what the following problems have in common: predict tomorrow’s weather, build crash-resisting cars, scan the bottom of the oceans searching for oil? These are all difficult problems that are too costly to be tackled physically. Importantly, they can also be described by a fundamental tool of mathematics: linear equations. Therefore, the solution of these physical problems can be numerically simulated by solving instead systems of linear equations. A system of linear equations (under matrix form).

You probably remember from math class in high school how tedious solving these systems could get, even when they had a small number of equations. In practice, it is actually quite common to face systems with thousands or even millions or equations. While computers can fortunately solve these systems for us, the computational cost of the solution can become very high for such large numbers of equations.

To respond to this need, great quantities of resources and money have been dedicated to the construction of supercomputers of great computational power, equipped with a large number of computing units called “cores”. For example, while your personal computer is likely to have less than a dozen cores, the most powerful supercomputer in the world has several millions of these cores. Nevertheless, the size of the problems that we must tackle today is so great that even these supercomputers are not sufficient. The Inteprid supercomputer, equipped with 164000 cores © Argonne National Laboratory

To take up this challenge, I have worked during my PhD thesis on new algorithms to solve systems of linear equations whose computational cost is greatly reduced. More precisely, a crucial property of these algorithms is that their cost grows slowly with the number of equations: this is referred to as their complexity. Methods of very low complexity (so-called “hierarchical”) have been proposed since the 2000s. However, these hierarchical methods are quite complex and sophisticated, which makes them unable to attain high performance on supercomputers: that is, their high reduction of the theoretical complexity is translated into only very modest gains in terms of actual computing time.

For this reason, my PhD thesis focused on another method (so-called “Block Low Rank”), that is better suited than hierarchical methods for high performance computing. My first achievement was to compute the complexity of this method, which was previously unknown. I proved that, even if its complexity is slightly higher than than of hierarchical methods, it is still low enough to tackle systems of very large dimensions. In the second part of my thesis, I worked on efficiently implementing this method on supercomputers, so as to translate this theoretical reduction into actual time gains.

By significantly reducing the cost of solving systems of linear equations, this work allowed us to solve several physical problems that were previously too large to be tackled. For instance, it took less than an hour to solve a system of 130 million equations arising in a geophysical application, using a supercomputer equipped with 2400 cores.