## Half Precision Arithmetic in Numerical Linear Algebra

In whatever shape or form, solution of a linear system of equation is the workhorse for many applications in scientific computing. Therefore our ability to accurately and efficiently solve these linear systems plays a pivotal role in advancing the boundaries of science and technology. Here, efficiency is measured with respect to time—if we have an algorithm to solve a linear system quickly then we can aspire to solve larger and more difficult problems. In this regard the purpose of this post is to describe one of  the latest developments of numerical linear algebra in which our group is actively involved, specifically in the solution of a system of linear equations.

Metaphorically one can think of the algorithm used for the solution of a linear system as an engine in a motor vehicle and the underlying application as the body of the vehicle built around it. Throughout this post I will use this metaphor to explain the course of development and current trends in the algorithms for the solution of a linear system of equations.

In scientific computing there are two components:  developing an algorithm and implementing it on a computer. If we again draw parallels with the engine of a motor vehicle, algorithm developers do the job of designing the various components of the engine and criteria to check if various parts are working as they should, and people implementing on a computer do the job of manufacturing the parts, assembling them, and making sure the engine is working as expected. Finally the most important component for the engine to work is the fuel, and for mathematical algorithms it is numbers. In a computer these numbers are stored in a specific format, and they are called  floating point numbers.

Until very recently computers were getting faster every year, and computer scientists were devising intelligent ways of using the existing algorithms to solve larger and larger problems in the new computers. One important point to note is that, even though the computers were becoming bigger, the underlying mathematics of the algorithms did not change. Again drawing parallel with the engine analogy, the engine became more powerful and therefore the motor vehicle built around it became bigger, but the basic design of the engine parts and the fuel used remained same. But soon this was about to change!

Traditionally double precision or single precision floating point numbers are used for computation. A double precision number occupies 64 bits of memory and a single precision number occupies 32 bits. Double and single precision numbers carry a lot of informations, but at the same time they create a lot of traffic jam in communication channels! If you think of a communication channel in a computer as a road connecting point A to point B, and since the width of the road is fixed, if we send too many trucks on the road they will cause traffic jam, even though they can carry a lot of goods. Therefore the natural solution is to use smaller vehicles instead of bigger vehicles, but this will drastically reduce the amount goods that one can transport. This exactly was the solution proposed to avoid the jam in communication channels, but at the cost of amount of information that can be transferred. This new floating point format is called half precision where a single half precision number occupies 16 bit of memory. The development of half precision as a floating point format was kick-started by  developments in machine learning, where it was found that, for accurate prediction, machine learning models did not require very accurate representation of the input data. Because of this development in the machine learning community hardware vendors such as NVIDIA and AMD started developing chips that support half precision. The world’s fastest supercomputer, SUMMIT at Oak Ridge National Lab, can perform 3.3 X 1018 operations in one seconds when half precision is used. However when a linear system of equations are solved on the same computer using the High Performance LINPACK benchmark it performs 122.3 X 1015 operations in one second, and note that here double precision floating point format is used. Therefore to capitalise on the recent advances in hardware, we need to exploit half precision in the solution of linear systems of equations.

##### Image credits Oak Ridge National Lab Summit Gallery

Returning back to our engine analogy, using half precision is like changing the fuel. As we all are aware, we cannot use diesel in a petrol engine, so the engine has to be redesigned or, in the context of linear systems, the algorithms have to be redesigned. This has precisely been one of the core research efforts in our group. To conclude we are at a very interesting point in the development of algorithms for the solution of linear systems of equations, where the emerging architectures provide interesting opportunities for numerical analysts to rethink old algorithms.

For recent work relevant in the group see

Further work is in progress and will be reported here soon.