Tag Archives: MATLAB

Computational Graphs for Matrix Functions

Numerical algorithms for evaluating a function f at an n \times n matrix A are based on a variety of different approaches, but for a large class of methods the approximation is formed by relying on three basic operations:

  • linear combination of matrices Z \gets c_X X + c_Y Y,
  • matrix multiplication Z \gets X \cdot Y, and
  • solution of a linear system with n right-hand sides X^{-1} \cdot Y,

where X and Y are n \times n matrices, and c_X and c_Y are scalars.

Algorithms that combine only these three basic building blocks are particularly attractive, as they correspond to functions that are easy to work with: if an expression for the scalar (n = 1) function g features only linear combinations, multiplications, and inversions, and g is defined on the spectrum of A, then a formula for g(A) can be obtained by replacing all occurrences of z in the formula for g(z) with A.

The choice of considering only three operations may seem restrictive, but, in fact, we are looking at a large class of methods—a class that comprises most algorithms based on polynomial or rational approximation and includes, among others, the scaling and squaring algorithm for the matrix exponential and many iterative algorithms for matrix roots.

These algorithms can be interpreted as graphs, and in particular as directed acyclic graphs (DAGs). Each node in the graph represents a matrix: the inputs are stored in the leaves (nodes without incoming edges), the internal nodes (with both incoming and outgoing edges) hold the intermediate results, and the output of the computation can be read off the root nodes (nodes without outgoing edges). In such a graph, the arcs represent data dependencies and tell us in what order the matrices ought to be computed for the algorithm to return the desired result. Two examples of such computational graphs are in the figure below.

Two examples of computational graphs.
Examples of computational graphs: ten steps of the Denman–Beavers iteration for the square root (left) and the scaling and squaring algorithm for the matrix exponential using a degree-13 rational approximant and five squaring steps (right). Blue arcs represent linear combinations, red arcs represent matrix multiplications, and black dash-dotted arcs represent system solves. The input nodes have a white background, and all other nodes have a the same color as the arc of the operation that produces them. The output node (result of the computation) has a double border. Nodes with only one visible parent are the result of either a squaring (A^2, A^4, A^6, Z^{2}, Z^{4}, Z^{8}, Z^{16}, and Z^{32}) or a trivial operation involving the identity matrix: a matrix inversion, represented as C=A^{-1}I, or a multiplication by a scalar, represented as C = \alpha A + 0 I

Computational graphs are arguably an elegant way of looking at matrix functions and at algorithms to compute them. In recent work, we have shown that they can also be a practical tool to study existing techniques, improve them, and ultimately derive new ones. The accuracy of an algorithm for evaluating a matrix function is determined by the accuracy of its scalar counterpart, thus the design of new methods for functions of matrices can be seen as a scalar-valued optimization problem. Computational graphs provide a particularly compelling way of carrying out the optimization, as the scalar derivatives can be evaluated automatically by exploiting the structure of the graph. This technique is akin to backpropagation, an approach for training neural networks customarily used in machine learning.

The workflow to design by optimization a new algorithm to approximate f consists of three steps.

First, we choose the topology of the graph, that is, we fix how many linear combinations, matrix multiplications, and matrix inversions the algorithm is allowed to use, and in which order these operations are to be combined. In practice, we take an existing algorithm for evaluating f at a matrix argument, we recast it as a computational graph, and we add nodes representing linear combinations wherever possible. These extra nodes are beneficial, because they increase the number of available degrees of freedom without changing the computational cost of the underlying algorithm.

Next, we optimize the coefficients of the graph. Let us denote by g the function represented by the computational graph. We write g(z; c) to stress that the value of g at z depends on the set of parameters c, which is a vector containing the coefficients of all the linear combinations that appear in the computational graph. These coefficients are the scalars c_X and c_Y that appear in all linear combinations of the form c_X X + c_Y Y in the graph, and they are the quantities that we want to optimize. The goal of the optimization is to find a vector of coefficients c^* such that g(\cdot; c^*) approximates a given function f in a domain \Omega \subset \mathbb C. In practice, we set \Omega to a disc where f and g are both analytic, so that we only need to minimize the error |g(z,c) - f(z)| over (a discretization of) the boundary of \Omega. This produces a surrogate nonlinear least-square problem, which we solve by means of the Gauss–Newton algorithm, a simple choice that already produces good results.

Finally, we use error analysis to bound the error of g as an approximant to f. By relying on higher precision, we are able to guarantee that within \Omega the forward and backward error are below a tolerance parameter \varepsilon. By choosing a suitable set of domains \Omega_1 \subset \ldots \subset \Omega_\ell, we can obtain a set of functions g_1, , g_\ell that approximate f to a specified accuracy and at increasing computational cost. Given these functions and a matrix A, we can then choose, inexpensively and at runtime, the function g_i that represents the best trade-off between accuracy and speed, and we can use it to compute f(A) \approx g_i(A).

Numerical results show that this process has the potential to improve state-of-the-art algorithms when f is the matrix exponential or the square root of a matrix. But care is needed in assessing the quality of the new algorithms, as the accuracy of the new algorithms depends on both the topology of the graph and the quality of the starting guess. This is illustrated in the figure below for two algorithms for the computation of \sqrt{z+1} within the disk of radius 1/2 centered at 0: optimizing the coefficients of a graph representing four steps of the Denman–Beavers iteration brings down the maximum absolute approximation error by over nine orders of magnitude, but for a graph constructed using the Paterson–Stockmeyer evaluation of the degree-13 truncated Taylor approximant the error only improves by three orders of magnitude.

Absolute forward error of four schemes for approximating the function f(x) = \sqrt{x+1}. The plots on the left correspond to four steps of the Denman–Beavers iteration (top) and to the Paterson–Stockmeyer evaluation of the degree-13 truncated Taylor approximants (bottom). The plots on the right correspond to graphs with optimized coefficients (the computational graphs on the left in the same row were used as starting value for the optimization). The blue circles have radius 1/2 and z-value equal to the maximum relative error within the disk: this value drops from 1.5 \cdot 10^{-6} to 1.4 \cdot 10^{-15} for the graphs in the first line, but only from 1.9 \cdot 10^{-6} to 1.5 \cdot 10^{-9} for those in the second.

As part of this project, we developed a Julia package, GraphMatFun.jl, which offers tools to generate and manipulate computational graphs, modify their topology, optimize their coefficients, and generate C, Julia, and MATLAB code to evaluate them efficiently at scalar and matrix arguments. The software is released under the terms of the MIT license and is freely available on GitHub.

Exploiting the MATLAB Language in Implementing a Matrix Collection

“Concise code” may not be commonly used or defined, but as in languages, conciseness—the quality of communicating information in as few words as is feasible—can be a good guide in programming. Conciseness of code, which could mean short and self-explanatory lines or blocks of code, is easier to achieve in higher level languages such as Python or MATLAB, and more difficult when writing, for example, in RISC assembly languages with short instructions each doing very little.

For developing the recently released matrix collection, Anymatrix [1,2] (with Nick Higham), we used MATLAB and exploited its string handling routines to do complicated string manipulations in a few lines of code. This post describes some details of the implementation and further information can be found in the user’s guide [3].

Groups of Matrices

Anymatrix enforces the organization of matrices into groups, which means a new matrix has to be placed either into one of the existing groups or a new group. For example, the built-in groups are contest, core, gallery, hadamard, matlab, nessie, and regtools. Each of these come with around 20 matrices, which can be fixed-sized matrices or matrix generators.

Matrices are identified by <group_name>/<matrix_name>. For example, core/beta is a matrix called beta in the core group. Matrix names are their file names minus the extension .m. Group names are directory names of the groups. Since the IDs of matrices are derived from the directory and file names, uniqueness is guaranteed. This also means that multiple matrices can have the same name as long as they are in different groups.

Around the time we started working on this collection, MATLAB R2020b introduced pattern, to search and match strings conveniently. An example usage in Anymatrix is as follows.

% Matrix ID pattern
matrix_ID_pat = ...
  asManyOfPattern(alphanumericsPattern | ...
    characterListPattern("_") | ...
    characterListPattern("-")) + ...
  '/' + ...
  asManyOfPattern(alphanumericsPattern | ...
    characterListPattern("_") | ...
    characterListPattern("-"));

This code creates a pattern for matrix IDs, which matches sequences of letter, digit, underscore or hyphen characters, followed by a single forward slash, followed again by the sequence. This sort of pattern can then be used, for example, to match strings (using matches) and extract substrings (using extract):

>> S = '...core/beta...';
>> extract(S, matrix_ID_pat)
ans =
  1×1 cell array
    {'core/beta'}

The pattern functionality helped in writing concise code. We used it in various places of Anymatrix, for example in detecting matrix IDs in the commands and extracting substrings in the Boolean search expressions to separate properties from brackets and operators and/or/not.

Properties of Matrices

Matrix properties can be specified in two ways, either as an assignment properties = {...} inside the matrix .m files, or in the am_properties.m files in group folders. With the latter option we may specify properties of multiple matrices in one assignment. Anymatrix contains a list of recognized properties, such as ill conditioned or banded. When Anymatrix scans the file structure, it gives a warning if a matrix contains an unrecognized property.

Properties can be searched by Boolean expressions with operators and/or/not and brackets for specifying precedence. For example:

>> anymatrix('p', 'integer and ill conditioned')
ans =
  3×1 cell array
    {'core/wilson'     }
    {'gallery/dramadah'}
    {'matlab/pascal'   }

The following function that transforms the supplied Boolean expression into an expression that can be evaluated in MATLAB, demonstrates a lot of MATLAB’s features that we made use of.

function IDs = search_by_properties(expression)
  IDs = {};
  % Replace 'and', 'or', and 'not' by corresponding MATLAB symbols.
  expression = replace(expression, ' and ', ' & ');
  expression = replace(expression, ' or ', ' | ');
  expression = replace(expression, ' not ', ' ~');
  expression = replace(expression, '(not ', '(~');
  if startsWith(expression, 'not')
    expression = expression(4:length(expression));
    expression(1) = '~';
  end
  % Assume properties are made up letters, can include a hyphen
  % or a white space character, and there is no case sensitivity.
  pat = (lettersPattern + whitespacePattern + lettersPattern) ...
        | (lettersPattern + characterListPattern('-') ...
          + lettersPattern) ...
        | lettersPattern;
  % Extract properties from the logical expression and replace
  % them by function calls to test for membership.
  ex_props = extract(expression, pat);
  ind = 1;
  new_expression = '';
  for p = ex_props.'
    mod_prop = strcat('ismember(''', ...
        strrep(lower(p{1}), '-', ' '), ...
        ''', strrep(lower(matrix_properties{1}), ''-'', '' ''))');
    trunc_exp = expression(ind:end);
    % Find where the property is in the expression.
    prop_index = strfind(trunc_exp, p{1});
    % Take anything before the property and append the modified
    % version of the property.
    new_expression = strcat(new_expression, ...
        trunc_exp(1:prop_index(1)-1), ...
        mod_prop);
    % Move the index after the property that was replaced.
    ind = ind + prop_index(1) + length(p{1}) - 1;
  end
  new_expression = strcat(new_expression, expression(ind:end));

  % Find matrices whose properties satisfy the specified logical
  % expression.
  k = 0;
  for matrix_properties = properties.'
    k = k + 1;
    % Test if the expression is true for this matrix
    % and add it's ID.
    if eval(new_expression)
      IDs = [IDs; matrix_IDs{k}];
    end
  end
end

On lines 4 to 11 we replace the operators and/or/not by the corresponding MATLAB ones &/|/~. We then extract properties from the expression, making use of patterns, modify them to test for membership of the properties and place them back into the expression (lines 14 to 38). Finally, on lines 42 to 50 we use eval to run the Boolean expression as MATLAB code for every matrix, and return a list of suitable matrices.

In the example command anymatrix('p', 'integer and ill conditioned'), the expression 'integer and ill conditioned' is transformed internally to 'ismember('integer', strrep(lower(matrix_properties{1}), '-', ' ')) & ismember('ill conditioned', strrep(lower(matrix_properties{1}), '-', ' '))' which can then be evaluated using eval. Notice that Anymatrix replaces hyphens by white spaces to allow two ways to specify properties, and converts properties to lower case to avoid case sensitivity. This way, ill conditioned and Ill-conditioned, for example, are equivalent properties in Anymatrix.

Testing in Anymatrix

There are two ways to implement testing in Anymatrix. One way is to add a function in file test_run.m in the group folder which can be invoked by an appropriate Anymatrix command. Another way is to test matrices for their properties. This is provided in the directory testing/ in the root folder of Anymatrix.

It is worth noting that not all supported properties can be tested, so only a subset in the built-in collection are. Each property that is feasible to test has a corresponding test_<property>.m file. Functions in these files must return true or false, given a matrix. Anymatrix utilizes MATLAB’s unit testing framework and automatically generates function-based unit tests for every matrix in the collection. When a new matrix or a new property to an existent matrix is added, the unit testing framework picks that up automatically when run. The following is an example function-based unit test that is automatically generated by Anymatrix. It tests the properties of a set of the core/fourier matrices with arbitrary dimensions.

function test_core_fourier(testcase)
  A = anymatrix('core/fourier',3);
  anymatrix_check_props(A, 'core/fourier', testcase);
  A = anymatrix('core/fourier',5);
  anymatrix_check_props(A, 'core/fourier', testcase);
  A = anymatrix('core/fourier',8);
  anymatrix_check_props(A, 'core/fourier', testcase);
  A = anymatrix('core/fourier',10);
  anymatrix_check_props(A, 'core/fourier', testcase);
  A = anymatrix('core/fourier',15);
  anymatrix_check_props(A, 'core/fourier', testcase);
  A = anymatrix('core/fourier',24);
  anymatrix_check_props(A, 'core/fourier', testcase);
  A = anymatrix('core/fourier',25);
  anymatrix_check_props(A, 'core/fourier', testcase);
  A = anymatrix('core/fourier',30);
  anymatrix_check_props(A, 'core/fourier', testcase);
  A = anymatrix('core/fourier',31);
  anymatrix_check_props(A, 'core/fourier', testcase);
end

Above, the function anymatrix_check_props looks for files test_<property>.m and runs those tests that it finds on the given matrix:

for prop = P.'
  test_func_name = strcat('test_', ...
    strrep(strrep(lower(prop{1}), ...
    '-', '_'), ' ', '_'));
  if isfile(strcat(root_path, '/private/', test_func_name, '.m'))
    handle = str2func(test_func_name);
    verifyTrue(testcase, handle(M), ...
      strcat("Matrix ", matrix_ID, " is not ", prop{1}, "."));
  end
end

Using this, Anymatrix adds a level of almost automatic testing invoked by adding new matrices or properties to matrices. A good example of this is the MATLAB built-in gallery group, which we made available in Anymatrix. By appending each matrix with properties we can check that MATLAB’s gallery matrices have the expected properties.

Summary

Anymatrix is both a collection of matrices and a tool to organize them. It is worth noting that much of the infrastructure is not specific to organizing matrices and so it can be reused to organize into groups, search, and access through IDs any kind of objects appended with properties.

Anymatrix 1.0 was released in October 2021, but the development continues. We welcome contributions to the collection either in the form of remote groups that can be downloaded into Anymatrix given a git url (an example is matrices-mp-cosm group by X. Liu), or suggestions to include matrices or groups in the built-in collection. We also welcome reports of bugs or recommendations for improvements to the clarity, correctness and conciseness of the source code.

References

[1] N. J. Higham and M. Mikaitis. Anymatrix: An Extensible MATLAB Matrix Collection. MIMS EPrint 2021.16, Manchester Institute for Mathematical Sciences, The University of Manchester, UK. Oct. 2021 (to appear in Numer. Algorithms).

[2] N. J. Higham. Anymatrix: An Extensible MATLAB Matrix Collection. Nov. 2021.

[3] N. J. Higham and M. Mikaitis. Anymatrix: An Extensible MATLAB Matrix Collection. Users’ Guide. MIMS EPrint 2021.15, Manchester Institute for Mathematical Sciences, The University of Manchester, UK. Oct. 2021.