## Wilkinson and Backward Error Analysis

by Sven Hammarling and Nick Higham

It is often thought that Jim Wilkinson developed backward error analysis because of his early involvement in solving systems of linear equations. In his 1970 Turing lecture [5] he described an experience, during world war II at the Armament Research Department, of solving a system of twelve linear equations on a desk computer, using Gaussian elimination. (He doesn’t say how long it took, but it must surely have been several days.) The coefficients were of order unity and, using ten decimal digit computation, he found that the coefficients of the reduced equation determining $x_{12}$ had four leading zeros, so he felt that the solutions could surely have no more than six correct figures. As a check on his calculations, he then computed the residuals and to his surprise the left hand sides agreed with the right hand sides to the full ten figures.

After the war Wilkinson joined the Mathematics Division at the National Physical Laboratory. Soon after his arrival, a system of eighteen equations were given to the Mathematics Division. This required a joint effort, which was manned by Leslie Fox, Eric Goodwin, Alan Turing and Wilkinson. Again, the solution was somewhat ill conditioned, as revealed by the final reduced equation, but again in computing the residuals the right and left hand sides agreed to full accuracy. Incidentally, Wilkinson and his colleagues used iterative refinement, which convinced them that the first solution had been accurate to six figures.

These experiences did not straightforwardly lead Wilkinson to develop backward error analysis. In [7] he says that he first used backward error analysis in connection with simple programs for computing zeros of polynomials soon after PILOT ACE came into use, specifically a program for evaluating a polynomial by nested multiplication and a program for carrying out polynomial deflation. But he nevertheless did not recognise backward error analysis as a general tool. Wilkinson explains

It is natural to ask why I did not immediately set about using this type of error analysis as a general purpose tool. In retrospect it seems amazing that I did not try it on Gaussian elimination and on various eigenvalue algorithms in which I was keenly interested at the time. The truth is that it did not occur to me for one moment to do so.

His explicit recognition of a tool that he decided to call backward error analysis soon came through his experience of solving eigenvalue problems on PILOT ACE. He states further in [7]:

Because of the small storage capacity of the PILOT ACE virtually the only algorithm that could be used for dealing with large unsymmetric eigenvalue problems was the power method supplemented by various techniques for accelerating convergence. After each eigenvalue/eigenvector was determined this pair was removed by deflation. Now at that time deflation was generally held to be extremely unstable and accordingly I used it at first with great trepidation. However, it soon became evident that it was being remarkably effective.

As with the linear equation problem, Wilkinson computed residuals, $r = A\hat{x} - \hat{\lambda} \hat{x}$, where $\hat{x}$ and $\hat{\lambda}$ are the computed values, with $\hat{x}$ normalised so that $\hat{x}^T \hat{x} = 1$, and, even after many deflations, he found that the residuals were remarkably small. He then realised that

$(A - r \hat{x}^T) \hat{x} = \hat{\lambda} \hat{x} \;\;\; \mbox{exactly}.$

and this led him directly to the backward error analysis since, if we put $E = -r \hat{x}^T$, then $\hat{\lambda}$ and $\hat{x}$ are an exact eigenvalue and eigenvector of the matrix $A + E$. He now recognised that the process could be widely used and this, of course, led to his 1963 book Rounding Errors in Algebraic Processes [3] and, soon after, to The Algebraic Eigenvalue Problem [4].

It should be noted that Wilkinson did not claim to be the first to perform a backward error analysis. He attributes the first analysis to von Neumann and Goldstine in their 1947 paper [2], which, as Wilkinson said in his von Neumann prize paper “is not exactly bedtime reading” [6]. Wilkinson also gives great credit to Givens for his backward error analysis of orthogonal tridiagonalisation in his, sadly, unpublished technical report [1].

References

[1]   W. Givens. Numerical computation of the characteristic values of a real symmetric matrix. Technical Report ORNL-1574, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, USA, 1954.

[2]   J. von Neumann and H. H. Goldstine. Numerical inverting of matrices of high order. Bull. Amer. Math. Soc., 53:1021–1099, 1947.

[3]   J. H. Wilkinson. Rounding Errors in Algebraic Processes. Notes on Applied Science, No.32. HMSO, London, UK, 1963. (Also published by Prentice-Hall, Englewood Cliffs, NJ, USA, 1964. Reprinted by Dover Publications, New York, 1994).

[4]   J. H.Wilkinson. The Algebraic Eigenvalue Problem. Oxford University Press, Oxford, UK, 1965.

[5]   J. H. Wilkinson. Some comments from a numerical analyst. J. ACM, 18:137–147, 1971. (The 1970 A. M. Turing lecture).

[6]   J. H. Wilkinson. Modern error analysis. SIAM Review, 13:548–568, 1971. (The 1970 von Neumann lecture).

[7]   J. H. Wilkinson. The state of the art in error analysis. NAG Newsletter, 2/85:5–28, 1985. (Invited lecture for the NAG 1984 Annual General Meeting).

## Our Alumni – Craig Lucas

In this blog post, we asked one of our alumni, Craig Lucas, a few questions about his time with the Numerical Linear Algebra Group.

Please can you introduce yourself and tell us a bit about your experience before attending University of Manchester?

I came to study Mathematics a little later than usual. I was a technician civil engineer working in land reclamation in Staffordshire and needed a change! I was always told I was good at maths and thought at 27 I should get a degree. I am very grateful to Graham Bowtell  at City University who took a chance on someone without A-levels. I developed an interest in Numerical Analysis and computing and wanted to take my study as far as I could. That brought me to Manchester for an MSc, and ultimately a PhD.

What was your PhD thesis on?

My thesis, supervised by Nick Higham, was “Algorithms for Cholesky and QR Factorizations, and the Semidefinite Generalized Eigenvalue Problem.” Arguably a rag bag of algorithms building on my MSc experience of symmetric matrices. I also met and worked with Sven Hammarling on QR updating. He then worked for NAG, as I do now.

Why did you choose to study your PhD in Manchester?

During my MSc I realised I was working with world leaders in their field. It wasn’t a difficult decision to stay on for a PhD, in fact, I felt incredibly lucky to have that opportunity.

How did you find Manchester?

I hated it! I had come up from London and it felt that a whole new world. I wasn’t used to strangers talking to me in the street! However, after about 18 months the place really started to grow on me, and now, nearly 20 years later, I can’t imagine living anywhere else. We have an incredible arts scene, fantastic restaurants, brilliant transport links and a cost of living that makes living back in London seem ridiculous.

Firstly I never really left. In the 15 years since I finished my PhD I have taught on my old MSc, supervised students and several KTP projects jointly with the Numerical Linear Algebra group. After my PhD, I went to work in research computing at Manchester first, in high performance computing (HPC.) Then just over 10 years ago I joined NAG where I could use both my numerical analysis and HPC skills.

I run NAG’s Manchester Office, which is a rather nice penthouse on Portland Street with a roof terrace, and the HPC team here. I am supervising my third KTP, involved in running NAG’s contribution to the EU POP project and every now and then write some mathematical software.

## For a Few Equations More

On the occasion of the Gilles Kahn prize award ceremony, I was asked to write an article about my PhD thesis for the popular science blog Binaire from the French newspaper Le Monde (the French version of the article is available here). You can read the English translation below.

Can you tell what the following problems have in common: predict tomorrow’s weather, build crash-resisting cars, scan the bottom of the oceans searching for oil? These are all difficult problems that are too costly to be tackled physically. Importantly, they can also be described by a fundamental tool of mathematics: linear equations. Therefore, the solution of these physical problems can be numerically simulated by solving instead systems of linear equations.

You probably remember from math class in high school how tedious solving these systems could get, even when they had a small number of equations. In practice, it is actually quite common to face systems with thousands or even millions or equations. While computers can fortunately solve these systems for us, the computational cost of the solution can become very high for such large numbers of equations.

To respond to this need, great quantities of resources and money have been dedicated to the construction of supercomputers of great computational power, equipped with a large number of computing units called “cores”. For example, while your personal computer is likely to have less than a dozen cores, the most powerful supercomputer in the world has several millions of these cores. Nevertheless, the size of the problems that we must tackle today is so great that even these supercomputers are not sufficient.

To take up this challenge, I have worked during my PhD thesis on new algorithms to solve systems of linear equations whose computational cost is greatly reduced. More precisely, a crucial property of these algorithms is that their cost grows slowly with the number of equations: this is referred to as their complexity. Methods of very low complexity (so-called “hierarchical”) have been proposed since the 2000s. However, these hierarchical methods are quite complex and sophisticated, which makes them unable to attain high performance on supercomputers: that is, their high reduction of the theoretical complexity is translated into only very modest gains in terms of actual computing time.

For this reason, my PhD thesis focused on another method (so-called “Block Low Rank”), that is better suited than hierarchical methods for high performance computing. My first achievement was to compute the complexity of this method, which was previously unknown. I proved that, even if its complexity is slightly higher than than of hierarchical methods, it is still low enough to tackle systems of very large dimensions. In the second part of my thesis, I worked on efficiently implementing this method on supercomputers, so as to translate this theoretical reduction into actual time gains.

By significantly reducing the cost of solving systems of linear equations, this work allowed us to solve several physical problems that were previously too large to be tackled. For instance, it took less than an hour to solve a system of 130 million equations arising in a geophysical application, using a supercomputer equipped with 2400 cores.