## Stefan Güttel and Nick Higham selected as Turing Fellows

Stefan Güttel and Nick Higham have been selected as Turing Fellows at The Alan Turing Institute. Their fellowships began this autumn.

Turing Fellows are scholars with proven research excellence in data science, artificial intelligence, or a related field, whose research would be significantly enhanced through active involvement with the Turing network of universities and partners.

## Scaling a Matrix to Exploit Half Precision Arithmetic

In my previous post, I gave a general introduction to the importance of using half precision (fp16) in the solution of linear systems of equations. In this post I will focus on one specific obstacle in using fp16 for general scientific computing: handling very large numbers and very small numbers.

To clarify the meaning of very large and very small, it is helpful to draw an analogy with a ruler, which we use on a daily basis to measure length.

The minimum length that one can measure with the ruler shown in the picture (image credit: splashmath) is 1 mm (millimeter), and this is referred to as the least count of the measuring instrument. The maximum length that can be measured is 10.5 cm (centimeters). If we have a pencil whose length falls between 5.6 cm and 5.7 cm, then we decide whether it is 5.6 cm or 5.7 cm based on which of the two it is closer to. A similar process also happens in a computer!

Drawing parallels with the example above, we use a similar ruler to measure in a computer, but we measure numbers rather than lengths, and this ruler is called a “floating point system”. Just as with a ruler, there is a minimum number that the floating point system can measure, and any number smaller than that is treated as zero. This process of numbers becoming zero in a floating point system because they are very small is called “underflow”. Next, any number that is too large to be measured by a floating point system is replaced by infinity, and this process is called “overflow”. Finally, just as with a ruler, any number is represented by the number closest to it in the floating point format, and this process is called “rounding”. For a detailed, rigorous and accessible introduction to floating point arithmetic, I would refer interested readers to this blog post by Cleve Moler.
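The three effects described above can be observed directly with NumPy's `float16` type. This is a minimal sketch, assuming NumPy is available; the variable names and test values are chosen only for illustration.

```python
import numpy as np

# Overflow: 1e9 is larger than any finite fp16 number, so it becomes infinity.
with np.errstate(over="ignore"):
    too_big = np.float16(1e9)

# Underflow: 1e-10 is smaller than any positive fp16 number, so it becomes zero.
too_small = np.float16(1e-10)

# Rounding: 0.1 is not exactly representable, so the nearest fp16 number is stored.
rounded = np.float16(0.1)

print(too_big, too_small, float(rounded))
```

The stored value of `rounded` differs from 0.1 by about $2.4 \times 10^{-5}$, which is the fp16 analogue of reading a pencil's length to the nearest millimeter.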

There are four standard ways of measuring numbers, called half precision, single precision, double precision, and quadruple precision. In this order, the maximum number they can represent increases and the minimum positive number they can represent decreases. For the sake of concreteness, let's consider half precision and double precision. The maximum number that can be represented in double precision is $1.80 \times 10^{308}$, and in half precision it is 65500. The range of double precision is enough for most problems arising in scientific computing. On the other hand, 65500 is extremely small! For example, the modulus of elasticity of many metals is of the order of $10^9$, and this is infinity in fp16! Similarly, the minimum positive (normalized) numbers are $2.23 \times 10^{-308}$ for double precision and $6.10 \times 10^{-5}$ for fp16. This narrow range of numbers poses a serious limitation for using fp16 in scientific computing.
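These limits can be queried rather than memorized. The sketch below uses `np.finfo`, assuming NumPy is available; quadruple precision is left out because NumPy does not provide it portably, and the `limits` dictionary is just an illustrative container.

```python
import numpy as np

limits = {}
for name, dtype in [("half", np.float16), ("single", np.float32), ("double", np.float64)]:
    info = np.finfo(dtype)
    # info.max is the largest finite number; info.tiny is the smallest
    # positive normalized number of the format.
    limits[name] = (float(info.max), float(info.tiny))
    print(f"{name:>6}: max = {limits[name][0]:.3e}, min normalized = {limits[name][1]:.3e}")
```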

Summarising, there is a tension between the computational efficiency of fp16 and its limited ability to represent large and small numbers. To address this, Prof. Nick Higham and I have developed an algorithm that squeezes a matrix into the range of numbers that can be represented by fp16. We use the well-known technique of diagonal scaling to scale all the matrix entries to lie between -1 and +1, and then we multiply the matrix by $\theta \times 65500$ to make full use of the range of numbers in fp16. Here $\theta < 1$ is a parameter used to avoid overflow in subsequent computations. The proposed scaling algorithm is not restricted to any particular application, but we concentrate on the solution of systems of linear equations. We employ GMRES-based iterative refinement, which has generated a lot of interest in the scientific computing community. The main contribution of this scaling algorithm is that it greatly expands the class of problems in which fp16 can be used. All the technical details and results of numerical experiments can be found in the following EPrint of the manuscript.
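The idea of scaling followed by multiplication by $\theta \times x_{\max}$ can be sketched in a few lines. This is only an illustration under stated assumptions: it uses a simple row-then-column equilibration as the diagonal scaling, which is not necessarily the scaling used in the manuscript, it assumes the matrix has no zero rows or columns, and the function name `scale_to_fp16` and the default `theta=0.1` are hypothetical.

```python
import numpy as np

def scale_to_fp16(A, theta=0.1):
    """Two-sided diagonal scaling of A followed by conversion to fp16.

    Returns A16 = fp16(mu * diag(r) @ A @ diag(c)) together with r, c and mu.
    Assumes A has no zero rows or columns.
    """
    A = np.asarray(A, dtype=np.float64)
    xmax = float(np.finfo(np.float16).max)    # largest finite fp16 number
    r = 1.0 / np.max(np.abs(A), axis=1)       # row scaling: largest entry per row is 1
    B = A * r[:, None]
    c = 1.0 / np.max(np.abs(B), axis=0)       # column scaling: largest entry per column is 1
    B = B * c[None, :]
    mu = theta * xmax                         # theta < 1 leaves headroom against overflow
    return (mu * B).astype(np.float16), r, c, mu

# A matrix whose entries overflow or underflow if converted to fp16 directly.
A = np.array([[2e9, 3e-3],
              [5.0, 4e-9]])
with np.errstate(over="ignore"):
    naive = A.astype(np.float16)              # contains inf and 0
A16, r, c, mu = scale_to_fp16(A)
print(naive)
print(A16)
```

Because the scaling factors `r`, `c` and `mu` are kept in double precision, a solution computed with the scaled matrix can be mapped back to the original problem afterwards.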

## A New Approach to Probabilistic Rounding Error Analysis

by Nick Higham and Theo Mary

James Wilkinson developed a systematic way of analyzing rounding errors in numerical algorithms in the 1960s—a time when computers typically used double precision arithmetic and a matrix of dimension $n = 100$ was considered large. As a result, an error bound with a dominant term $p(n)u$, for $p$ a low degree polynomial and $u$ the machine precision, was acceptably small.

Today, the supercomputers heading the TOP500 list solve linear systems of equations of dimension $10^8$ and half precision arithmetic (accurate to about 4 significant decimal digits) is increasingly available in hardware, notably on graphical processing units (GPUs) from AMD and NVIDIA. Traditional rounding error bounds cannot guarantee accurate results when the dimension is very large or the precision is low. Yet computations are often successful in such cases—for example in machine learning algorithms making use of half, or even lower, precision.

This discrepancy between theory and practice stems from the fact that traditional rounding error bounds are worst-case bounds and so tend to be pessimistic. Indeed, while a single floating-point operation incurs a rounding error bounded in modulus by $u$, the composition of $n$ operations leads to a worst-case error bound with dominant term proportional to $nu$. But this worst-case error bound is attained only when each rounding error is of maximal magnitude and identical sign, which is very unlikely. Since the beginning of the digital computer era many researchers have modelled rounding errors as random variables in an attempt to obtain better estimates of how the error behaves on average. This line of thinking has led to the well-known rule of thumb, based on informal arguments and assumptions, that constants in rounding error bounds can be replaced by their square roots.

In our EPrint A New Approach to Probabilistic Rounding Error Analysis we make minimal probabilistic assumptions on the rounding errors and make use of a tool from probability theory called a concentration inequality. We show that in several core linear algebra algorithms, including matrix-vector products and LU factorization, the backward error can be bounded with high probability by a relaxed constant proportional to $\sqrt{n\log n}u$ instead of $nu$. Our analysis provides the first rigorous justification of the rule of thumb.
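The gap between $nu$ and $\sqrt{n}u$ can be seen in a much simpler setting than LU factorization. The sketch below sums $n$ random numbers in single precision and compares the relative error with both quantities. It is an illustrative experiment rather than one taken from the EPrint: it assumes NumPy is available, relies on `np.cumsum` accumulating from left to right in the requested dtype, and the helper name `summation_error` is hypothetical.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
u = 2.0 ** -24  # unit roundoff of single precision (fp32)

def summation_error(n):
    """Relative error of summing n fp32 numbers with fp32 accumulation."""
    x = rng.random(n, dtype=np.float32)                  # entries uniform on [0, 1)
    s_low = float(np.cumsum(x, dtype=np.float32)[-1])    # running sum kept in fp32
    s_ref = math.fsum(float(v) for v in x)               # accurately rounded reference sum
    return abs(s_low - s_ref) / s_ref

for n in (10**3, 10**4, 10**5):
    err = summation_error(n)
    print(f"n = {n:>6}: error = {err:.2e}, "
          f"sqrt(n)*u = {math.sqrt(n) * u:.2e}, n*u = {n * u:.2e}")
```

The worst-case bound $nu$ always holds here (up to a factor very close to 1), whereas how closely the observed error tracks $\sqrt{n}u$ depends on the data and on the rounding errors behaving like independent, mean-zero random variables.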

This new bound is illustrated in the figure above, where we consider the solution of a linear system $Ax = b$ by LU factorization. The matrix $A$ and vector $x$ have entries from the random uniform [0,1] distribution and $b$ is formed as $Ax$. We compare the backward error with its traditional worst-case bound and our relaxed probabilistic bound. The figure shows that the probabilistic bound is in very good agreement with the actual backward error and is much smaller than the traditional bound. Moreover, it successfully captures the asymptotic behavior of the error growth, which follows $\sqrt{n}$ rather than $n$.

The assumptions underlying our analysis—that the rounding errors are independent random variables of mean zero—do not always hold, as we illustrate with examples in the paper. Nevertheless, our experiments show that the bounds do correctly predict the backward error for a selection of real-life matrices from the SuiteSparse collection.

## Fast Solution of Linear Systems via GPU Tensor Cores’ FP16 Arithmetic and Iterative Refinement

Over the last 30 years, hierarchical computer memories, multicore processors and graphical processing units (GPUs) have all necessitated the redesign of numerical linear algebra algorithms, and in doing so have led to algorithmic innovations. Mixed precision arithmetic—a concept going back to the earliest computers, which had the ability to accumulate inner products in extra precision—attracted renewed interest in the late 1990s once Intel chips were able to execute single precision at twice the rate of double precision. Now the increasing availability of low precision arithmetic is offering new opportunities.

In the paper Harnessing GPU Tensor Cores for Fast FP16 Arithmetic to Speed up Mixed-Precision Iterative Refinement Solvers presented at SC18 (the leading supercomputing conference), Azzam Haidar, Stanimire Tomov, Jack Dongarra and Nick Higham show how to exploit the half precision (fp16) arithmetic that is now available in hardware. Whereas fp16 arithmetic can be expected to run at twice the rate of fp32 (single precision) arithmetic, the NVIDIA V100 GPU has tensor cores that can execute half precision at up to eight times the speed of single precision and can deliver the results to single precision accuracy. Developing algorithms that can exploit half precision arithmetic is important both for a workstation connected to a single V100 GPU and the world’s fastest computer (as of November 2018): Summit at Oak Ridge National Laboratory, which contains 27,648 V100 GPUs.

The paper shows that a dense $n$-by-$n$ double precision linear system $Ax = b$ can be solved using mixed precision iterative refinement at a rate up to four times faster than a highly optimized double precision solver and with a reduction in energy consumption by a factor of five.

The key idea is to LU factorize the matrix $A$ in a mix of half precision and single precision then apply iterative refinement. The update equations in the refinement process are solved by an inner GMRES iteration that uses the LU factors as preconditioners. This GMRES-IR algorithm was proposed by Carson and Higham in two (open access) papers in SIAM J. Sci. Comput. (2017 and 2018). In the form used here, the algorithm converges for matrices with condition numbers up to about $10^8$. It provides a backward stable, double precision solution while carrying out almost all the flops at lower precision.
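The structure of mixed precision iterative refinement can be sketched as follows. This is a simplified illustration, not the algorithm of the paper: the correction equations are solved directly with the low precision LU factors rather than by preconditioned GMRES, single precision stands in for the half/single mix, and the function names are hypothetical. It assumes NumPy is available.

```python
import numpy as np

def lu_factor_low(A, dtype=np.float32):
    """LU factorization with partial pivoting, carried out in a low precision."""
    LU = np.array(A, dtype=dtype)
    n = LU.shape[0]
    perm = np.arange(n)
    for k in range(n - 1):
        p = k + int(np.argmax(np.abs(LU[k:, k])))       # pivot row
        if p != k:
            LU[[k, p]] = LU[[p, k]]
            perm[[k, p]] = perm[[p, k]]
        LU[k + 1:, k] /= LU[k, k]                        # multipliers (column of L)
        LU[k + 1:, k + 1:] -= np.outer(LU[k + 1:, k], LU[k, k + 1:])
    return LU, perm

def lu_solve_low(LU, perm, b):
    """Solve A x = b using the low precision factors (forward and back substitution)."""
    n = LU.shape[0]
    y = np.array(b, dtype=LU.dtype)[perm]
    for i in range(1, n):                                # L y = P b, L unit lower triangular
        y[i] -= LU[i, :i] @ y[:i]
    for i in range(n - 1, -1, -1):                       # U x = y
        y[i] = (y[i] - LU[i, i + 1:] @ y[i + 1:]) / LU[i, i]
    return y

def iterative_refinement(A, b, low=np.float32, tol=1e-12, max_iter=20):
    """Factorize once in low precision, then refine the solution in double precision."""
    A = np.asarray(A, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    LU, perm = lu_factor_low(A, low)
    x = lu_solve_low(LU, perm, b).astype(np.float64)
    norm_A, norm_b = np.linalg.norm(A, np.inf), np.linalg.norm(b, np.inf)
    for _ in range(max_iter):
        r = b - A @ x                                    # residual in double precision
        if np.linalg.norm(r, np.inf) <= tol * (norm_A * np.linalg.norm(x, np.inf) + norm_b):
            break                                        # normwise backward error is small
        x += lu_solve_low(LU, perm, r).astype(np.float64)  # low precision correction
    return x
```

The expensive $O(n^3)$ factorization is done once in low precision, while each refinement step costs only $O(n^2)$ operations. Replacing the direct correction solve by GMRES preconditioned with the same factors is what extends the range of condition numbers for which the iteration converges.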

Codes implementing this work will be released through the open-source MAGMA library.

## Lecturer, Senior Lecturer or Reader in Applied Mathematics

The School of Mathematics is seeking mathematical scientists of outstanding ability or potential for appointments at Lecturer, Senior Lecturer or Reader level.

Applicants working in any area of applied mathematics are welcome, particularly those working in areas that complement and enhance the applied mathematics research of the School, which spans numerical linear algebra, uncertainty quantification, dynamical systems, data science, mathematics in the life sciences, industrial mathematics, inverse problems and continuum mechanics.

Applicants with research experience in numerical linear algebra, or at the interfaces with pure mathematics or probability and statistics, are strongly encouraged.

These positions provide an excellent opportunity to join the Numerical Linear Algebra Group.

The closing date is January 11, 2019. For the advert and more details see here.

## Welcome to the NLA Group Website

Welcome to the new website of the Numerical Linear Algebra group at the University of Manchester! The group, numbering around 20 and listed here, comprises four permanent faculty—Nick Higham, Françoise Tisseur, Stefan Guettel and Jack Dongarra (part-time)—and PhD students, research associates, research fellows, visitors and other associated researchers.

The website reports our activities, including papers, software and presentations. It contains regularly updated news items as well as blog posts by members of the group on research topics of interest.

We hold weekly group meetings and organize workshops in Manchester and elsewhere. Some of our conference talks are available here.

We give a number of courses in the School of Mathematics in the undergraduate mathematics degree and at M.Sc. level, and also regularly give courses externally, especially at summer schools.

We are always looking to recruit, and current opportunities are listed here. Feel free to contact us if you would like to enquire about future opportunities.

Follow us on Twitter and watch the videos on our YouTube channel.