## Wilkinson and Backward Error Analysis

by Sven Hammarling and Nick Higham

It is often thought that Jim Wilkinson developed backward error analysis because of his early involvement in solving systems of linear equations. In his 1970 Turing lecture [5] he described an experience, during world war II at the Armament Research Department, of solving a system of twelve linear equations on a desk computer, using Gaussian elimination. (He doesn’t say how long it took, but it must surely have been several days.) The coefficients were of order unity and, using ten decimal digit computation, he found that the coefficients of the reduced equation determining $x_{12}$ had four leading zeros, so he felt that the solutions could surely have no more than six correct figures. As a check on his calculations, he then computed the residuals and to his surprise the left hand sides agreed with the right hand sides to the full ten figures.

After the war Wilkinson joined the Mathematics Division at the National Physical Laboratory. Soon after his arrival, a system of eighteen equations were given to the Mathematics Division. This required a joint effort, which was manned by Leslie Fox, Eric Goodwin, Alan Turing and Wilkinson. Again, the solution was somewhat ill conditioned, as revealed by the final reduced equation, but again in computing the residuals the right and left hand sides agreed to full accuracy. Incidentally, Wilkinson and his colleagues used iterative refinement, which convinced them that the first solution had been accurate to six figures.

These experiences did not straightforwardly lead Wilkinson to develop backward error analysis. In [7] he says that he first used backward error analysis in connection with simple programs for computing zeros of polynomials soon after PILOT ACE came into use, specifically a program for evaluating a polynomial by nested multiplication and a program for carrying out polynomial deflation. But he nevertheless did not recognise backward error analysis as a general tool. Wilkinson explains

It is natural to ask why I did not immediately set about using this type of error analysis as a general purpose tool. In retrospect it seems amazing that I did not try it on Gaussian elimination and on various eigenvalue algorithms in which I was keenly interested at the time. The truth is that it did not occur to me for one moment to do so.

His explicit recognition of a tool that he decided to call backward error analysis soon came through his experience of solving eigenvalue problems on PILOT ACE. He states further in [7]:

Because of the small storage capacity of the PILOT ACE virtually the only algorithm that could be used for dealing with large unsymmetric eigenvalue problems was the power method supplemented by various techniques for accelerating convergence. After each eigenvalue/eigenvector was determined this pair was removed by deflation. Now at that time deflation was generally held to be extremely unstable and accordingly I used it at first with great trepidation. However, it soon became evident that it was being remarkably effective.

As with the linear equation problem, Wilkinson computed residuals, $r = A\hat{x} - \hat{\lambda} \hat{x}$, where $\hat{x}$ and $\hat{\lambda}$ are the computed values, with $\hat{x}$ normalised so that $\hat{x}^T \hat{x} = 1$, and, even after many deflations, he found that the residuals were remarkably small. He then realised that

$(A - r \hat{x}^T) \hat{x} = \hat{\lambda} \hat{x} \;\;\; \mbox{exactly}.$

and this led him directly to the backward error analysis since, if we put $E = -r \hat{x}^T$, then $\hat{\lambda}$ and $\hat{x}$ are an exact eigenvalue and eigenvector of the matrix $A + E$. He now recognised that the process could be widely used and this, of course, led to his 1963 book Rounding Errors in Algebraic Processes [3] and, soon after, to The Algebraic Eigenvalue Problem [4].

It should be noted that Wilkinson did not claim to be the first to perform a backward error analysis. He attributes the first analysis to von Neumann and Goldstine in their 1947 paper [2], which, as Wilkinson said in his von Neumann prize paper “is not exactly bedtime reading” [6]. Wilkinson also gives great credit to Givens for his backward error analysis of orthogonal tridiagonalisation in his, sadly, unpublished technical report [1].

References

[1]   W. Givens. Numerical computation of the characteristic values of a real symmetric matrix. Technical Report ORNL-1574, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, USA, 1954.

[2]   J. von Neumann and H. H. Goldstine. Numerical inverting of matrices of high order. Bull. Amer. Math. Soc., 53:1021–1099, 1947.

[3]   J. H. Wilkinson. Rounding Errors in Algebraic Processes. Notes on Applied Science, No.32. HMSO, London, UK, 1963. (Also published by Prentice-Hall, Englewood Cliffs, NJ, USA, 1964. Reprinted by Dover Publications, New York, 1994).

[4]   J. H.Wilkinson. The Algebraic Eigenvalue Problem. Oxford University Press, Oxford, UK, 1965.

[5]   J. H. Wilkinson. Some comments from a numerical analyst. J. ACM, 18:137–147, 1971. (The 1970 A. M. Turing lecture).

[6]   J. H. Wilkinson. Modern error analysis. SIAM Review, 13:548–568, 1971. (The 1970 von Neumann lecture).

[7]   J. H. Wilkinson. The state of the art in error analysis. NAG Newsletter, 2/85:5–28, 1985. (Invited lecture for the NAG 1984 Annual General Meeting).

## Our Alumni – Craig Lucas

In this blog post, we asked one of our alumni, Craig Lucas, a few questions about his time with the Numerical Linear Algebra Group.

Please can you introduce yourself and tell us a bit about your experience before attending University of Manchester?

I came to study Mathematics a little later than usual. I was a technician civil engineer working in land reclamation in Staffordshire and needed a change! I was always told I was good at maths and thought at 27 I should get a degree. I am very grateful to Graham Bowtell  at City University who took a chance on someone without A-levels. I developed an interest in Numerical Analysis and computing and wanted to take my study as far as I could. That brought me to Manchester for an MSc, and ultimately a PhD.

What was your PhD thesis on?

My thesis, supervised by Nick Higham, was “Algorithms for Cholesky and QR Factorizations, and the Semidefinite Generalized Eigenvalue Problem.” Arguably a rag bag of algorithms building on my MSc experience of symmetric matrices. I also met and worked with Sven Hammarling on QR updating. He then worked for NAG, as I do now.

Why did you choose to study your PhD in Manchester?

During my MSc I realised I was working with world leaders in their field. It wasn’t a difficult decision to stay on for a PhD, in fact, I felt incredibly lucky to have that opportunity.

How did you find Manchester?

I hated it! I had come up from London and it felt that a whole new world. I wasn’t used to strangers talking to me in the street! However, after about 18 months the place really started to grow on me, and now, nearly 20 years later, I can’t imagine living anywhere else. We have an incredible arts scene, fantastic restaurants, brilliant transport links and a cost of living that makes living back in London seem ridiculous.

Can you tell us about your careers since leaving Manchester?

Firstly I never really left. In the 15 years since I finished my PhD I have taught on my old MSc, supervised students and several KTP projects jointly with the Numerical Linear Algebra group. After my PhD, I went to work in research computing at Manchester first, in high performance computing (HPC.) Then just over 10 years ago I joined NAG where I could use both my numerical analysis and HPC skills.

What is your current role?

I run NAG’s Manchester Office, which is a rather nice penthouse on Portland Street with a roof terrace, and the HPC team here. I am supervising my third KTP, involved in running NAG’s contribution to the EU POP project and every now and then write some mathematical software.

## Our Alumni – Lijing Lin

In this blog post, we asked one of our alumni, Lijing Lin, a few questions about her time with the Numerical Linear Algebra Group.

Please can you introduce yourself and tell us a bit about your experience before attending University of Manchester?

I obtained my BSc from Nanjing University of Aeronautics and Astronautics and MSc from Fudan University in China, before coming to Manchester to study for my PhD in 2007.

What was your PhD thesis on?

The title of my thesis is Roots of Stochastic Matrices and Fractional Matrix Powers. Computing roots of stochastic matrices arises from Markov chain models in finance and healthcare where a transition over a certain time interval is needed but only a transition over a longer time interval may be available. Besides developing new theories, we also developed a package for computing stochastic roots. Fractional matrix powers are more general functions than matrix roots. We developed a new algorithm for computing arbitrary real powers of matrices.

Why did you choose to study your PhD in Manchester?

I had developed an interest in doing research in Numerical Linear Algebra during my MSc. The NLA group in Manchester is renowned for world-leading expertise in this area, and is one of the best places in the world to study and do research.

How did you find Manchester?

I have studied, worked and lived in Manchester for over 11 years now. It is exciting, diverse and welcoming–a city that keeps growing and never stops surprising me.

Can you tell us about your careers since leaving Manchester?

After graduating, I continued working in Manchester as a Research Associate. With a solid background in NLA, my research now has moved toward machine learning, probabilistic modelling, and statistics.

What is your current role?

I am currently a Turing PDRA in predictive healthcare. We are building prognostic models that allow consideration of “what if” scenarios to explore the effects of interventions, e.g. how would a person’s risk of getting heart attack change if he started or quit smoking now.

## Half Precision Arithmetic in Numerical Linear Algebra

In whatever shape or form, solution of a linear system of equation is the workhorse for many applications in scientific computing. Therefore our ability to accurately and efficiently solve these linear systems plays a pivotal role in advancing the boundaries of science and technology. Here, efficiency is measured with respect to time—if we have an algorithm to solve a linear system quickly then we can aspire to solve larger and more difficult problems. In this regard the purpose of this post is to describe one of  the latest developments of numerical linear algebra in which our group is actively involved, specifically in the solution of a system of linear equations.

Metaphorically one can think of the algorithm used for the solution of a linear system as an engine in a motor vehicle and the underlying application as the body of the vehicle built around it. Throughout this post I will use this metaphor to explain the course of development and current trends in the algorithms for the solution of a linear system of equations.

In scientific computing there are two components:  developing an algorithm and implementing it on a computer. If we again draw parallels with the engine of a motor vehicle, algorithm developers do the job of designing the various components of the engine and criteria to check if various parts are working as they should, and people implementing on a computer do the job of manufacturing the parts, assembling them, and making sure the engine is working as expected. Finally the most important component for the engine to work is the fuel, and for mathematical algorithms it is numbers. In a computer these numbers are stored in a specific format, and they are called  floating point numbers.

Until very recently computers were getting faster every year, and computer scientists were devising intelligent ways of using the existing algorithms to solve larger and larger problems in the new computers. One important point to note is that, even though the computers were becoming bigger, the underlying mathematics of the algorithms did not change. Again drawing parallel with the engine analogy, the engine became more powerful and therefore the motor vehicle built around it became bigger, but the basic design of the engine parts and the fuel used remained same. But soon this was about to change!

Traditionally double precision or single precision floating point numbers are used for computation. A double precision number occupies 64 bits of memory and a single precision number occupies 32 bits. Double and single precision numbers carry a lot of informations, but at the same time they create a lot of traffic jam in communication channels! If you think of a communication channel in a computer as a road connecting point A to point B, and since the width of the road is fixed, if we send too many trucks on the road they will cause traffic jam, even though they can carry a lot of goods. Therefore the natural solution is to use smaller vehicles instead of bigger vehicles, but this will drastically reduce the amount goods that one can transport. This exactly was the solution proposed to avoid the jam in communication channels, but at the cost of amount of information that can be transferred. This new floating point format is called half precision where a single half precision number occupies 16 bit of memory. The development of half precision as a floating point format was kick-started by  developments in machine learning, where it was found that, for accurate prediction, machine learning models did not require very accurate representation of the input data. Because of this development in the machine learning community hardware vendors such as NVIDIA and AMD started developing chips that support half precision. The world’s fastest supercomputer, SUMMIT at Oak Ridge National Lab, can perform 3.3 X 1018 operations in one seconds when half precision is used. However when a linear system of equations are solved on the same computer using the High Performance LINPACK benchmark it performs 122.3 X 1015 operations in one second, and note that here double precision floating point format is used. Therefore to capitalise on the recent advances in hardware, we need to exploit half precision in the solution of linear systems of equations.

##### Image credits Oak Ridge National Lab Summit Gallery

Returning back to our engine analogy, using half precision is like changing the fuel. As we all are aware, we cannot use diesel in a petrol engine, so the engine has to be redesigned or, in the context of linear systems, the algorithms have to be redesigned. This has precisely been one of the core research efforts in our group. To conclude we are at a very interesting point in the development of algorithms for the solution of linear systems of equations, where the emerging architectures provide interesting opportunities for numerical analysts to rethink old algorithms.

For recent work relevant in the group see

Further work is in progress and will be reported here soon.