Author Archives: Massimiliano Fasi

Extreme-Scale Test Matrices with Specified 2-Norm Condition Number

Row H of the compute racks of Summit, the most powerful supercompter in the TOP500 list since June 2018.

The supercomputers in the TOP500 list are ranked using the High-Performance Linpack (HPL) Benchmark, which gauges the performance of a computer by solving a large dense linear system by Gaussian elimination with partial pivoting. The size of the coefficient matrix depends on the computational power of the machine being assessed, because more powerful systems require larger benchmark problems in order to reach their peak performance.

The test matrices used by HPL have random entries uniformly distributed on the interval (-1/2,1/2]. The 2-norm condition number of such matrices depends on their size, and can potentially be very large for the matrices required by today’s most powerful computers: the largest linear systems solved on Summit, the machines that leads the November 2019 TOP500 ranking, have order 10^7, and even larger systems will be needed to benchmark the coming generations of exascale systems.

An n \times n matrix with specified 2-norm condition number can be generated as A := U \varSigma V^T, where U and V are random real orthogonal n \times n matrices from the Haar distribution (a natural uniform probability distribution on the orthogonal matrices) and \varSigma is diagonal with nonnegative entries \sigma_1\ge \cdots \ge \sigma_n \ge 0. It is well known that A has 2-norm condition number \kappa_2(A) = \sigma_1/\sigma_n if \sigma_n \neq 0 and \kappa(A) = \infty otherwise. This technique, which is used by the gallery('randsvd', ...) function, requires 2n^3 floating-point operations to generate a test matrix of order n, and would thus be impractical in an extreme-scale setting.

In our recent EPrint Generating extreme-scale matrices with specified singular values or condition numbers, Nick Higham and I present four methods that, by giving up the requirement that the matrices U and V be from the Haar distribution, reduce the cost of the approach above from cubic to quadratic. The matrices generated retain a number of desirable properties that make them a suitable choice for testing linear solvers at scale.

These cheaper algorithms are particularly well suited to distributed-memory environments, since all communication between the nodes can be avoided at the price of a negligible increase in the overall computational cost. They are appealing to MATLAB users too, as the following example demostrates.

n = 10000; kappa = 1e6; mode = 2; rng(1)
% gallery('randsvd',...)
fprintf('gallery(''randsvd'',...):                %5.2f seconds elapsed.\n',...
% Algorithm 3.1 in our EPrint.
method = 1; matrix = 0;
fprintf('Alg. 3.1 with Haar U:                  %5.2f seconds elapsed.\n',...
matrix = 2;
fprintf('Alg. 3.1 with U=gallery(''orthog'',n,2): %5.2f seconds elapsed.\n',...
% Algorithm 4.1 in our EPrint.
method = 3; matrix = 0;
fprintf('Alg. 4.1 with Haar U:                  %5.2f seconds elapsed.\n',...
matrix = 2;
fprintf('Alg. 4.1 with U=gallery(''orthog'',n,2): %5.2f seconds elapsed.\n',...

In the listing above, randsvdfast is an implementation of our algorithms available on the MATLAB Central File Exchange. Setting the argument matrix to 0 tells our function to pick U from the Haar distribution, whereas setting it to 2 causes U to be chosen in a non-random way. Running this code in MATLAB 2019b on a machine equipped with an Intel processor I5-6500 running at 3.20GHz the script produces the output

gallery('randsvd',...):                79.52 seconds elapsed.
Alg. 3.1 with Haar U:                  19.70 seconds elapsed.
Alg. 3.1 with U=gallery('orthog',n,2):  1.90 seconds elapsed.
Alg. 4.1 with Haar U:                  19.28 seconds elapsed.
Alg. 4.1 with U=gallery('orthog',n,2):  1.43 seconds elapsed.

Therefore, for n = 10{,}000 randsvdfast is up to 56 times faster than gallery('randsvd',...).

Determinants of Bohemian matrix families

Density plots of eigenvalues of matrices from various Bohemian families. Collection of images by Steven Thornton. To find out more about these figures, or to create your own, check out the gallery at the Bohemian Matrices website.

The adjective “Bohemian” was used for the first time in a linear algebra context by Robert Corless and Steven Thornton to describe the eigenvalues of matrices whose entries are taken from a finite discrete set, usually of integers. The term is a partial acronym for “BOunded Height Matrix of Integers”, but the origin of the term was soon forgotten, and the expression “Bohemian matrix” is now widely accepted.

As Olga Taussky observed already in 1960, the study of matrices with integer elements is “very vast and very old”, with early work of Sylvester and Hadamard that dates back to the second half of the nineteenth century. These names are the first two in a long list of mathematicians that worked on what is now known as the “Hadamard conjecture”: for any positive integer n multiple of 4, there exists an n by n matrix H, with entries -1 and +1, such that HH^T = nI.

If this is the best-known open problem surrounding Bohemian matrices, it is far from being the only one. During the 3-day workshop “Bohemian Matrices and Applications” that our group hosted in June last year, Steven Thornton released the Characteristic Polynomial Database, which collects the determinants and characteristic polynomials of billions of samples from certain families of structured as well as unstructured Bohemian matrices. All the available data led Steven to formulate a number of conjectures regarding the determinants of several families of Bohemian upper Hessenberg matrices.

Gian Maria Negri Porzio and I attended the workshop, and set ourselves the task of solving at least one of these open problems. In our recent preprint, we enumerate all the possible determinants of Bohemian upper Hessenberg matrices with ones on the subdiagonal. We consider also the special case of families with main diagonal fixed to zero, whose determinants turn out to be related to some generalizations of Fibonacci numbers. Many of the conjectures stated in the Characteristic Polynomial Database follow from our results.

The Paterson–Stockmeyer method for evaluating polynomials and rational functions of matrices

According to Moler and Van Loan, truncating the Taylor series expansion to the exponential at 0 is the least effective of their nineteen dubious ways to compute the exponential of a matrix. Such an undoubtedly questionable approach can, however, become a powerful tool for evaluating matrix functions when used in conjunction with other strategies, such as, for example, the scaling and squaring technique. In fact, truncated Taylor series are just a special case of a much larger class of rational approximants, known as the Padé family, on which many state-of-the-art algorithms rely in order to compute matrix functions.

A customary choice for evaluating these approximants at a matrix argument is the Paterson–Stockmeyer method, an evaluation scheme that was originally proposed as an asymptotically optimal algorithm for evaluating polynomials of matrices, but generalizes quite naturally to rational functions, which are nothing but the solutions of a linear system whose coefficients and right-hand side are polynomials of the same matrix. This technique exploits a clever rewriting of a polynomial in A as a polynomials in A^s, for some positive integer s, and overall requires about 2\sqrt{k} matrix products to evaluate a polynomial of degree k.

Number of matrix multiplications required to evaluate a polynomial of degree between 0 and 50.

As shown in the figure, when the Paterson–Stockmeyer scheme is used the number of matrix multiplications required to evaluate a polynomial of degree k grows slower than k itself, with the result that evaluating polynomials of different degree will asymptotically have the same cost. For example, evaluating a polynomial of any degree between 43 and 49 requires 12 matrix multiplications, thus there is little point in considering an approximant of degree 43 when evaluating that of degree 49 has roughly the same cost but will in all likelihood deliver a more accurate result.

When designing algorithms to evaluate functions of matrices, one is interested in finding the optimal degrees, those marked with a red circle in the figure above, since they guarantee maximal accuracy for a given computational cost. When fixed precision is considered, finding all such degrees is not a problem: a backward error analysis can be used to determine the maximum degree that will ever be needed, m_{max} say, and then looking at the plot is enough to find all the optimal degrees smaller than m_{max}. In order to deal with arbitrary precision arithmetic, however, a different strategy is needed, as depending on the working precision and the required tolerance, approximants of arbitrarily high degree may be needed. The new Eprint Optimality of the Paterson–Stockmeyer Method for Evaluating Matrix Polynomials and Rational Matrix Functions studies the cost of the Paterson–Stockmeyer method for polynomial evaluation and shows that a degree is optimal if and only if it is a quarter-square, that is, a number of the form \lfloor n^2/4 \rfloor for some nonnegative integer n, where \lfloor \cdot \rfloor is the floor function.

Similar results can be obtained for Paterson–Stockmeyer-like algorithms for evaluating diagonal Padé approximants, rational functions whose numerator and denominator have same degree. In that case, one can show that an approximant is optimal if and only if the degree of numerator and denominator is an eight-square, an integer of the form \lfloor n^2/8 \rfloor for some n \in \mathbb{N}. In particular, for diagonal Padé approximants to the exponential, due to a symmetry of the coefficients of numerator and denominator, faster algorithms can be developed, and an explicit formula—not as nice as that in the two previous cases—can be derived for the optimal orders of these approximants.

Computing the matrix exponential in arbitrary precision

Edmond Laguerre was the first to discuss, en passant in a paragraph of a long letter to Hermite, the exponential of a matrix. In fact, the French mathematician confined himself to defining e^X via the Taylor series expansion of e^z at 0 and remarking that the scalar identity e^{x+y} = e^x e^y does not generalise to matrices. Surprisingly perhaps, truncating the Taylor expansion is still today, just over 150 years later, one of the most effective way of approximating the matrix exponential, along with the use of a family of rational approximants discovered a couple of decades later by Henri Padé, a student of Hermite’s.

Despite the fact that the exponential is an entire function, Taylor and Padé approximants converge only locally, and may require an approximant of exceedingly high order for matrices of large norm. Therefore, scaling the input is often necessary, in order to ensure fast and accurate computations. This can be effectively achieved by means of a technique called scaling and squaring, first proposed in 1967 by Lawson, who noticed that by exploiting the identity e^A = \bigl(e^{2^{-s}A}\bigr)^{2^s} one can compute the matrix exponential by evaluating one of the approximants above at 2^{-s}A and then squaring the result s times.

The choice of s and of the order of the approximant hinges on a delicate trade-off that has been thoroughly studied in the last 50 years. The most efficient algorithms are tailored to double precision, and rely on the computation of precision-dependent constants, a relatively expensive task that needs to be performed only once during the design stage of the algorithm, and produces algorithms that are very efficient at runtime.

These techniques, however, do not readily extend to arbitrary precision environments, where the working precision at which the computation will be performed is known only at runtime. In the EPrint An arbitrary precision scaling and squaring algorithm for the matrix exponential, Nick Higham and I show how the scaling and squaring method can be adapted to the computation of the matrix exponential in precisions higher than double.

The algorithm we propose combines a new analysis of the forward truncation error of Padé approximants to the exponential with an improved strategy for selecting the parameters required to perform the scaling and squaring step. We evaluate at runtime a bound that arises naturally from our analysis in order to check whether a certain choice of those parameters will deliver an accurate evaluation of the approximant or not. Moreover, we discuss how this technique can be implemented efficiently for truncated Taylor series and diagonal Padé approximants.

Forward error in 1024 digits and performance profile
Top left: forward error of several algorithms on a test set of non-Hermitian matrices. Top right: corresponding performance profile. Bottom: legend for the other two plots. Variants of our algorithm are in the top row, existing implementations for computing the matrix exponential in arbitrary precision are in the bottom row.

We evaluate experimentally several variants of the proposed algorithm in a wide range of precisions, and find that they all behave in a forward stable fashion. In particular, the most accurate of our implementations outperforms existing algorithms for computing the matrix exponential in high precision.