## SIAM CSE19 Minisymposium on “Advances in Analyzing Floating-point Errors in Computational Science”

by Pierre Blanchard, Nick Higham, and Theo Mary

Last February the SIAM Computational Science and Engineering (CSE19) conference took place in Spokane, WA, USA. We organized a two-part minisymposium on recent Advances in Analyzing Floating-point Errors in Computational Science (see links to part 1 and part 2). Below is the list of the eight talks along with the slides which we have made available.

## Simulating Low Precision Floating-Point Arithmetics

In earlier blog posts, I wrote about the benefits of using half precision arithmetic (fp16) and about the problems of overflow and underflow in fp16 and how to avoid them. But how can one experiment with fp16, or other low precision formats such as bfloat16, in order to study how algorithms behave in these arithmetics? (For an accessible introduction to fp16 and bfloat16 see the blog post by Nick Higham.)

As of now, fp16 is supported by several GPUs, but these are specialist devices and they can be very expensive.  Moreover, architectures that support bfloat16 have not yet not been released. Therefore software that simulates these floating-point formats is needed.

In our latest EPrint, Nick Higham and I investigate algorithms for simulating fp16, bfloat16 and other low precision formats. We have also written a MATLAB function chop that can be incorporated into other MATLAB codes to simulate low precision arithmetic.  It can easily be used to study the effect of low precision formats on various algorithms.

Imagine a hypothetical situation where the computer can just represent integers. Then the question is how do we represent numbers like 4/3? An obvious answer would be to represent it via the integer closest to it, 1 in this case. However, one will have to come with a convention to handle the case where the number is in the centre. Now replace the integer in the example with floating-point numbers, and a similar question arises. This process of converting any given number to a floating-point number is called  rounding. If we adopt a rule where we choose the closest floating-point number (as above), then we formally call it as ‘round to nearest’. There are other ways to round as well, and different rounding modes can yield different results for the same code. However meddling with the parameters of a floating-point format without a proper understanding of their consequences can be a recipe for disaster. Cleve Moler in his blog on sub-normal numbers makes this point by warning ‘don’t try this at home’. The MATLAB software we have written provides a safe environment to experiment with the effects of changing any parameter of a floating-point format (such as rounding modes and support of subnormal numbers) on the output of a code. All the technical details can be found in the Eprint and our MATLAB codes.

## Our Alumni – Edvin Hopkins

In this blog post, we asked one of our alumni, Edvin Hopkins, a few questions about his time with the Numerical Linear Algebra Group.

Please can you introduce yourself and tell us a bit about your experience before attending University of Manchester?

I obtained my BA in Mathematics from the University of Cambridge in 2005 and remained there for a few more years to do a PhD in numerical relativity. My association with the University of Manchester began in 2010, when I joined the NLA group as a KTP Associate, working on a joint project with NAG to implement some of the NLA group’s matrix function algorithms for the NAG Library.

Why did you choose to work with the University of Manchester?

The project I was involved in was a great opportunity to bridge the gap between academia and industry and to work with world leaders in their fields.

How did you find Manchester?

Well, I’m still there! It has really grown on me in the past few years, and is a great place to work.