## Solving Block Low-Rank Linear Systems by LU Factorization is Numerically Stable

In many applications requiring the solution of a linear system $Ax=b$, the matrix $A$ possesses a block low-rank (BLR) property: most of its off-diagonal blocks are of low numerical rank and can therefore be well approximated by low-rank matrices. This property arises for example in the solution of discretized partial differential equations, because the numerical rank of a given off-diagonal block is closely related to the distance between the two subdomains associated with this block. BLR matrices have been exploited to significantly reduce the cost of solving $Ax=b$, in particular in the context of sparse direct solvers such as MUMPS, PaStiX, and STRUMPACK.

However, the impact of these low-rank approximations on the numerical stability of the solution in floating-point arithmetic has not been previously analyzed. The difficulty of such an analysis lies in the fact that, unlike for classical algorithms without low-rank approximations, there are two kinds of errors to analyze: floating-point errors (which depend on the unit roundoff $u$), and low-rank truncation errors (which depend on the the low-rank threshold $\varepsilon$, the parameter controlling the accuracy of the blockwise low-rank approximations). Moreover, the two kinds of error cannot easily be isolated in the analysis, because the BLR compression and factorization stages are often interlaced.

In our recent preprint, with Nick Higham, we present rounding error analysis for the solution of a linear system by LU factorization of BLR matrices. Assuming that a stable pivoting scheme is used, we prove backward stability: the relative backward error is bounded by a modest constant times $\varepsilon$, and does not depend on the unit roundoff $u$ as long as $u$ is safely smaller than $\varepsilon$. This is a very desirable theoretical guarantee that can now be given to the users, who can therefore control the numerical behavior of BLR solvers simply by setting $\varepsilon$ to the target accuracy. We illustrate this key result in the figure below, which shows a strong correlation between the backward error and the $\varepsilon$ threshold for 26 matrices from the SuiteSparse collection coming from a wide range of real-life applications.

## Determinants of Bohemian matrix families

The adjective “Bohemian” was used for the first time in a linear algebra context by Robert Corless and Steven Thornton to describe the eigenvalues of matrices whose entries are taken from a finite discrete set, usually of integers. The term is a partial acronym for “BOunded Height Matrix of Integers”, but the origin of the term was soon forgotten, and the expression “Bohemian matrix” is now widely accepted.

As Olga Taussky observed already in 1960, the study of matrices with integer elements is “very vast and very old”, with early work of Sylvester and Hadamard that dates back to the second half of the nineteenth century. These names are the first two in a long list of mathematicians that worked on what is now known as the “Hadamard conjecture”: for any positive integer $n$ multiple of 4, there exists an $n$ by $n$ matrix $H$, with entries $-1$ and $+1$, such that $HH^T = nI$.

If this is the best-known open problem surrounding Bohemian matrices, it is far from being the only one. During the 3-day workshop “Bohemian Matrices and Applications” that our group hosted in June last year, Steven Thornton released the Characteristic Polynomial Database, which collects the determinants and characteristic polynomials of billions of samples from certain families of structured as well as unstructured Bohemian matrices. All the available data led Steven to formulate a number of conjectures regarding the determinants of several families of Bohemian upper Hessenberg matrices.

Gian Maria Negri Porzio and I attended the workshop, and set ourselves the task of solving at least one of these open problems. In our recent preprint, we enumerate all the possible determinants of Bohemian upper Hessenberg matrices with ones on the subdiagonal. We consider also the special case of families with main diagonal fixed to zero, whose determinants turn out to be related to some generalizations of Fibonacci numbers. Many of the conjectures stated in the Characteristic Polynomial Database follow from our results.

## A Class of Fast and Accurate Summation Algorithms

Summing $n$ numbers is a key computational task at the heart of many numerical algorithms. When performed in floating-point arithmetic, summation is subject to rounding errors: for a machine precision $u$, the error bound for the most basic summation algorithms, such as recursive summation, is proportional to $nu$.

Nowadays, with the growing interest in low floating-point precisions and ever increasing $n$ in applications, such error bounds have become unacceptably large. While summation algorithms leading to smaller error bounds are known (compensated summation is an example), they are computationally expensive.

In our recent preprint, Pierre Blanchard, Nick Higham and I propose a class of fast and accurate summation algorithms called FABsum. We show that FABsum has an error bound of the form $bu+O(u^2)$, where $b$ is a block size, which is independent of $n$ to first order. As illustrated by the figure below, which plots the measured error using single precision as a function of $n$, FABsum can deliver substantially more accurate results than recursive summation. Moreover, FABsum can be easily incorporated in high-performance numerical linear algebra kernels in order to boost accuracy with only a modest drop in performance, as we demonstrate in the paper with the PLASMA library.

## Simulating Low Precision Floating-Point Arithmetics

In earlier blog posts, I wrote about the benefits of using half precision arithmetic (fp16) and about the problems of overflow and underflow in fp16 and how to avoid them. But how can one experiment with fp16, or other low precision formats such as bfloat16, in order to study how algorithms behave in these arithmetics? (For an accessible introduction to fp16 and bfloat16 see the blog post by Nick Higham.)

As of now, fp16 is supported by several GPUs, but these are specialist devices and they can be very expensive.  Moreover, architectures that support bfloat16 have not yet not been released. Therefore software that simulates these floating-point formats is needed.

In our latest EPrint, Nick Higham and I investigate algorithms for simulating fp16, bfloat16 and other low precision formats. We have also written a MATLAB function chop that can be incorporated into other MATLAB codes to simulate low precision arithmetic.  It can easily be used to study the effect of low precision formats on various algorithms.

Imagine a hypothetical situation where the computer can just represent integers. Then the question is how do we represent numbers like 4/3? An obvious answer would be to represent it via the integer closest to it, 1 in this case. However, one will have to come with a convention to handle the case where the number is in the centre. Now replace the integer in the example with floating-point numbers, and a similar question arises. This process of converting any given number to a floating-point number is called  rounding. If we adopt a rule where we choose the closest floating-point number (as above), then we formally call it as ‘round to nearest’. There are other ways to round as well, and different rounding modes can yield different results for the same code. However meddling with the parameters of a floating-point format without a proper understanding of their consequences can be a recipe for disaster. Cleve Moler in his blog on sub-normal numbers makes this point by warning ‘don’t try this at home’. The MATLAB software we have written provides a safe environment to experiment with the effects of changing any parameter of a floating-point format (such as rounding modes and support of subnormal numbers) on the output of a code. All the technical details can be found in the Eprint and our MATLAB codes.

## Computing the Wave-Kernel Matrix Functions

The wave-kernel functions $\cosh{\sqrt{A}}$ and $\mathrm{sinhc}{\sqrt{A}}$ arise in the solution of second order differential equations such as $u''(t) - Au(t) = b(t)$ with initial conditions at $t=0$. Here, $A$ is an arbitrary square matrix and $\mathrm{sinhc}{z} = \sinh(z)/z$. The square root in these formulas is illusory, as both functions can be expressed as power series in $A$, so there are no questions about existence of the functions.

How can these functions be computed efficiently? In Computing the Wave-Kernel Matrix Functions (SIAM J. Sci. Comput., 2018) Prashanth Nadukandi and I develop an algorithm based on Padé approximation and the use of double angle formulas. The amount of scaling and the degree of the Padé approximant are chosen to minimize the computational cost subject to achieving backward stability for $\cosh{\sqrt{A}}$ in exact arithmetic.

In the derivation we show that the backward error of any approximation to $\cosh{\sqrt{A}}$ can be explicitly expressed in terms of a hypergeometric function. To bound the backward error we derive and exploit a new bound for $\|A^k\|^{1/k}$ in terms of the norms of lower powers of $A$; this bound is sharper than one previously obtained by Al-Mohy and Higham.

Numerical experiments show that the algorithm behaves in a forward stable manner in floating-point arithmetic and is superior in this respect to the general purpose Schur–Parlett algorithm applied to these functions.

The fundamental regions of the

function cosh(sqrt(z)), needed for the backward error analysis

underlying the algorithm.

## A new preconditioner exploiting low-rank factorization error

The solution of a linear system $Ax = b$ is a fundamental task in scientific computing. Two main classes of methods to solve such a system exist.

• Direct methods compute a factorization of matrix $A$, such as LU factorization, to then directly obtain the solution $x=U^{-1}L^{-1}b$ by triangular substitution; they are very reliable but also possess a high computational cost, which limits the size of problems that can be tackled.
• Iterative methods compute a sequence of iterates $x_k$ converging towards the solution $x$; they are inexpensive but their convergence and thus reliability strongly depends on the matrix properties, which limits the scope of problems that can be tackled.

A current major challenge in the field of numerical linear algebra is therefore to develop methods that are able to tackle a large scope of problems of large size.

To accelerate the convergence of iterative methods, one usually uses a preconditioner, that is, a matrix $M$ ideally satisfying three conditions: (1) $M$ is cheap to compute; (2) $M$ can be easily inverted; (3) $M^{-1}$ is a good approximation to $A^{-1}$. With such a matrix $M$, the preconditioned system $M^{-1}Ax=M^{1}b$ is then cheap to solve with an iterative method and often requires a small number of iterations only. An example of a widely used class of preconditioners is when $M$ is computed as a low-accuracy LU factorization.

Unfortunately, for many important problems it is quite difficult to find a preconditioner that is both of good quality and cheap to compute, especially when the matrix $A$ is ill conditioned, that is, when the ratio between its largest and smallest singular values is large.

In our paper A New Preconditioner that Exploits Low-rank Approximations to Factorization Error, with Nick Higham, which recently appeared in SIAM Journal of Scientific Computing, we propose a novel class of general preconditioners that builds on an existing, low-accuracy preconditioner $M=A-\Delta A$.

This class of preconditioners is based on the following key observation: ill-conditioned matrices that arise in practice often have a small number of small singular values. The inverse of such a matrix has a small number of large singular values and so is numerically low rank. This observation suggests that the error matrix $E = M^{-1}A - I = M^{-1}\Delta A \approx A^{-1}\Delta A$ is of interest, because we may expect $E$ to retain the numerically low-rank property of $A^{-1}$.

In the paper, we first investigate theoretically and experimentally whether $E$ is indeed numerically low rank; we then describe how to exploit this property to accelerate the convergence of iterative methods by building an improved preconditioner $M(I+\widetilde{E})$, where $\widetilde{E}$ is a low-rank approximation to $E$. This new preconditioner is equal to $A-M(E-\widetilde{E})$ and is thus almost a perfect preconditioner if $\widetilde{E}\approx E$. Moreover, since $\widetilde{E}$ is a low-rank matrix, $(I+\widetilde{E})^{-1}$ can be cheaply computed via the Sherman–Morrison–Woodbury formula, and so the new preconditioner can be easily inverted.

We apply this new preconditioner to three different types of approximate LU factorizations: half-precision LU factorization, incomplete LU factorization (ILU), and block low-rank (BLR) LU factorization. In our experiments with GMRES-based iterative refinement, we show that the new preconditioner can achieve a significant reduction in the number of iterations required to solve a variety of real-life $Ax=b$ problems.

## The Paterson–Stockmeyer method for evaluating polynomials and rational functions of matrices

According to Moler and Van Loan, truncating the Taylor series expansion to the exponential at 0 is the least effective of their nineteen dubious ways to compute the exponential of a matrix. Such an undoubtedly questionable approach can, however, become a powerful tool for evaluating matrix functions when used in conjunction with other strategies, such as, for example, the scaling and squaring technique. In fact, truncated Taylor series are just a special case of a much larger class of rational approximants, known as the Padé family, on which many state-of-the-art algorithms rely in order to compute matrix functions.

A customary choice for evaluating these approximants at a matrix argument is the Paterson–Stockmeyer method, an evaluation scheme that was originally proposed as an asymptotically optimal algorithm for evaluating polynomials of matrices, but generalizes quite naturally to rational functions, which are nothing but the solutions of a linear system whose coefficients and right-hand side are polynomials of the same matrix. This technique exploits a clever rewriting of a polynomial in $A$ as a polynomials in $A^s$, for some positive integer $s$, and overall requires about $2\sqrt{k}$ matrix products to evaluate a polynomial of degree $k$.

As shown in the figure, when the Paterson–Stockmeyer scheme is used the number of matrix multiplications required to evaluate a polynomial of degree $k$ grows slower than $k$ itself, with the result that evaluating polynomials of different degree will asymptotically have the same cost. For example, evaluating a polynomial of any degree between 43 and 49 requires 12 matrix multiplications, thus there is little point in considering an approximant of degree 43 when evaluating that of degree 49 has roughly the same cost but will in all likelihood deliver a more accurate result.

When designing algorithms to evaluate functions of matrices, one is interested in finding the optimal degrees, those marked with a red circle in the figure above, since they guarantee maximal accuracy for a given computational cost. When fixed precision is considered, finding all such degrees is not a problem: a backward error analysis can be used to determine the maximum degree that will ever be needed, $m_{max}$ say, and then looking at the plot is enough to find all the optimal degrees smaller than $m_{max}$. In order to deal with arbitrary precision arithmetic, however, a different strategy is needed, as depending on the working precision and the required tolerance, approximants of arbitrarily high degree may be needed. The new Eprint Optimality of the Paterson–Stockmeyer Method for Evaluating Matrix Polynomials and Rational Matrix Functions studies the cost of the Paterson–Stockmeyer method for polynomial evaluation and shows that a degree is optimal if and only if it is a quarter-square, that is, a number of the form $\lfloor n^2/4 \rfloor$ for some nonnegative integer $n$, where $\lfloor \cdot \rfloor$ is the floor function.

Similar results can be obtained for Paterson–Stockmeyer-like algorithms for evaluating diagonal Padé approximants, rational functions whose numerator and denominator have same degree. In that case, one can show that an approximant is optimal if and only if the degree of numerator and denominator is an eight-square, an integer of the form $\lfloor n^2/8 \rfloor$ for some $n \in \mathbb{N}$. In particular, for diagonal Padé approximants to the exponential, due to a symmetry of the coefficients of numerator and denominator, faster algorithms can be developed, and an explicit formula—not as nice as that in the two previous cases—can be derived for the optimal orders of these approximants.