A New Approach to Probabilistic Rounding Error Analysis
by Nick Higham and Theo Mary
James Wilkinson developed a systematic way of analyzing rounding errors in numerical algorithms in the 1960s, a time when computers typically used double precision arithmetic and a matrix of dimension 100 was considered large. As a result, an error bound with a dominant term $p(n)u$, for $p$ a low degree polynomial in the dimension $n$ and $u$ the machine precision, was acceptably small.
Today, the supercomputers heading the TOP500 list solve linear systems of equations of dimension $10^8$, and half precision arithmetic (accurate to about 4 significant decimal digits) is increasingly available in hardware, notably on graphics processing units (GPUs) from AMD and NVIDIA. Traditional rounding error bounds cannot guarantee accurate results when the dimension is very large or the precision is low. Yet computations are often successful in such cases, for example in machine learning algorithms making use of half, or even lower, precision.
This discrepancy between theory and practice stems from the fact that traditional rounding error bounds are worst-case bounds and so tend to be pessimistic. Indeed, while a single floating-point operation incurs a rounding error bounded in modulus by $u$, the composition of $n$ operations leads to a worst-case error bound with dominant term proportional to $nu$. But this worst-case error bound is attained only when each rounding error is of maximal magnitude and identical sign, which is very unlikely. Since the beginning of the digital computer era many researchers have modelled rounding errors as random variables in an attempt to obtain better estimates of how the error behaves on average. This line of thinking has led to the well-known rule of thumb, based on informal arguments and assumptions, that constants in rounding error bounds can be replaced by their square roots.
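To see informally why independence helps, one can model each rounding error as an independent random variable uniformly distributed on $[-u,u]$ (a modelling assumption for illustration only, not a property of actual rounding errors) and compare the size of the accumulated error of $n$ such terms with $nu$ and $\sqrt{n}u$. A minimal Python sketch:

import numpy as np

u = 2.0**-53                      # unit roundoff of IEEE double precision
rng = np.random.default_rng(0)

for n in [10**3, 10**5, 10**7]:
    deltas = rng.uniform(-u, u, n)        # modelled rounding errors
    accumulated = abs(deltas.sum())       # first-order accumulated error
    print(f"n={n:9d}  |sum| {accumulated:8.1e}   "
          f"n*u {n*u:8.1e}   sqrt(n)*u {np.sqrt(n)*u:8.1e}")

Under this model the accumulated error typically grows like $\sqrt{n}u$ rather than the worst-case $nu$, which is the intuition behind the rule of thumb.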
In our EPrint A New Approach to Probabilistic Rounding Error Analysis we make minimal probabilistic assumptions on the rounding errors and make use of a tool from probability theory called a concentration inequality. We show that in several core linear algebra algorithms, including matrix-vector products and LU factorization, the backward error can be bounded with high probability by a relaxed constant proportional to $\sqrt{n}$ instead of $n$. Our analysis provides the first rigorous justification of the rule of thumb.
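In schematic form (our paraphrase of the shape of the results; the precise constants and failure probabilities are given in the EPrint), the bounds read

\[
  \text{backward error} \;\lesssim\; \lambda \sqrt{n}\, u
  \quad \text{with probability at least } 1 - \epsilon(\lambda),
\]

where $\lambda > 0$ is a parameter of the analysis and the failure probability $\epsilon(\lambda)$ decreases rapidly as $\lambda$ grows, so a modest value of $\lambda$ already makes the bound hold with probability very close to 1.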
This new bound is illustrated in the figure above, where we consider the solution of a linear system by LU factorization. The matrix and vector have entries from the random uniform [0,1] distribution and is formed as . We compare the backward error with its traditional worst-case bound and our relaxed probabilistic bound. The figure shows that the probabilistic bound is in very good agreement with the actual backward error and is much smaller than the traditional bound. Moreover, it successfully captures the asymptotic behavior of the error growth, which follows rather than .
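The following Python sketch shows how an experiment of this kind can be set up (our own illustration, not the authors' code; the figure's exact precisions, constants, and range of $n$ may differ). It solves $Ax = b$ by LU factorization in single precision and compares the normwise backward error, computed by the Rigal-Gaches formula, with quantities proportional to $nu$ and $\sqrt{n}u$:

import numpy as np
from scipy.linalg import lu_factor, lu_solve

u = 2.0**-24                      # unit roundoff of IEEE single precision
rng = np.random.default_rng(0)

for n in [100, 1000, 4000]:
    A = rng.uniform(0.0, 1.0, (n, n)).astype(np.float32)
    x = rng.uniform(0.0, 1.0, n).astype(np.float32)
    b = A @ x                               # right-hand side formed as b = A*x
    xhat = lu_solve(lu_factor(A), b)        # solve by LU with partial pivoting

    # Normwise backward error, evaluated in double precision
    Ad, bd, xd = A.astype(float), b.astype(float), xhat.astype(float)
    eta = (np.linalg.norm(bd - Ad @ xd, np.inf)
           / (np.linalg.norm(Ad, np.inf) * np.linalg.norm(xd, np.inf)
              + np.linalg.norm(bd, np.inf)))

    print(f"n={n:5d}  backward error {eta:8.1e}   "
          f"n*u {n*u:8.1e}   sqrt(n)*u {np.sqrt(n)*u:8.1e}")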
The assumptions underlying our analysis—that the rounding errors are independent random variables of mean zero—do not always hold, as we illustrate with examples in the paper. Nevertheless, our experiments show that the bounds do correctly predict the backward error for a selection of real-life matrices from the SuiteSparse collection.