## Stochastic Rounding Has Unconditional Probabilistic Error Bounds

In the IEEE standard 754 for binary floating-point arithmetic, four rounding modes are defined.

- Round to nearest.
- Round towards $+\infty$.
- Round towards $-\infty$.
- Round towards zero.

Recently stochastic rounding, an old idea that dates back to the beginning of the digital computer era, has gained popularity, most notably in deep learning. While the rounding modes defined in the IEEE standard are deterministic, stochastic rounding is inherently random. We can define two modes of stochastic rounding. Consider the figure below, where we have a real number $x$ and adjacent floating-point numbers $a$ and $b$. In what we call mode 1 stochastic rounding, we round $x$ to either $a$ or $b$ with equal probability. In mode 2 stochastic rounding, we round $x$ to $a$ (or $b$) with probability proportional to one minus the distance of $x$ from $a$ (or $b$). So in the example shown, where $x$ lies closer to $b$, for mode 2 we are more likely to round to $b$ than to $a$.
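As a toy illustration (a sketch of my own, not code from the EPrint), both modes can be implemented for a hypothetical uniform grid of spacing `eps` standing in for the floating-point numbers:

```python
import random

def stochastic_round(x, eps=1.0, mode=2):
    """Stochastically round x to the grid of integer multiples of eps.

    mode 1: round to either neighbour with equal probability.
    mode 2: round to a neighbour with probability proportional to
            one minus the relative distance of x from it.
    """
    a = eps * (x // eps)   # grid point just below (or equal to) x
    b = a + eps            # grid point just above x
    if x == a:             # x already on the grid: no rounding needed
        return a
    if mode == 1:
        return random.choice([a, b])
    # mode 2: the closer x is to b, the likelier we round up to b
    p_up = (x - a) / eps
    return b if random.random() < p_up else a
```

Averaging many mode 2 roundings of, say, 0.3 gives a value close to 0.3, whereas mode 1 averages to the midpoint 0.5 regardless of where $x$ sits between the neighbours.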

In our recent EPrint *Stochastic Rounding and its Probabilistic Backward Error Analysis*, Nick Higham, Theo Mary and I generalized previous probabilistic error analysis results [SIAM J. Sci. Comput., 41 (2019), pp. A2815–A2835] to stochastic rounding.

We show that stochastic rounding (specifically mode 2) has the property of producing rounding errors $\delta_k$ that are random variables satisfying

- mean zero: $\mathbb{E}(\delta_k) = 0$,
- mean independence: $\mathbb{E}(\delta_k \mid \delta_1, \dots, \delta_{k-1}) = \mathbb{E}(\delta_k)$.

Here, $\mathbb{E}$ denotes the expectation. A key consequence of these results is that we can replace the worst case error bound proportional to $nu$, ubiquitous in backward error analyses, by a more informative probabilistic bound proportional to $\sqrt{n}u$, where $n$ is the problem size and $u$ is the unit roundoff. What is a rule of thumb for round to nearest becomes a *rule* for stochastic rounding: the rounding errors are proved to satisfy the above properties.
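To see why a single mode 2 rounding is unbiased (a standard calculation, consistent with the definition above), take $x \in [a, b]$: mode 2 rounds to $b$ with probability $(x - a)/(b - a)$ and to $a$ otherwise, so

$$\mathbb{E}\bigl(\operatorname{sr}(x)\bigr) = a\,\frac{b - x}{b - a} + b\,\frac{x - a}{b - a} = \frac{ab - ax + bx - ab}{b - a} = x,$$

and the rounding error $\operatorname{sr}(x) - x$ has mean zero.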

In the figure below, we compute the inner product $x^Ty$ of $n$-vectors sampled uniformly from $[0,1]$ in both round to nearest and stochastic rounding in IEEE half precision arithmetic. Shown is the backward error for each value of $n$ and the bounds $nu$ and $\sqrt{n}u$.
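The round to nearest half of this experiment can be approximated in a few lines, using NumPy's `float16` to emulate IEEE half precision (a sketch under that assumption, not an exact reproduction of the figure):

```python
import numpy as np

rng = np.random.default_rng(42)
u = 2.0 ** -11  # unit roundoff of IEEE half precision

def dot_rn16(x16, y16):
    """Inner product with every multiply and add rounded to nearest in fp16."""
    s = np.float16(0)
    for xi, yi in zip(x16, y16):
        p = np.float16(float(xi) * float(yi))  # fp16 multiply
        s = np.float16(float(s) + float(p))    # fp16 add
    return float(s)

n = 1000
x16 = np.float16(rng.random(n))
y16 = np.float16(rng.random(n))

# reference quantities computed exactly in double precision
xd, yd = x16.astype(np.float64), y16.astype(np.float64)
exact = xd @ yd
bwd = abs(dot_rn16(x16, y16) - exact) / (np.abs(xd) @ np.abs(yd))
print(bwd, (n ** 0.5) * u, n * u)  # backward error vs the two bounds
```

For random data of this size the measured backward error sits well below the worst case $nu$, which is the point the figure makes quantitatively across a range of $n$.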

For increasing problem size $n$, with round to nearest the error eventually no longer satisfies the $\sqrt{n}u$ bound. This is due to the phenomenon we call *stagnation*, to which low precisions are particularly susceptible. As we compute the recursive summation $s_i = s_{i-1} + x_i y_i$, eventually the partial sum $s_{i-1}$ can grow so large that the update $x_i y_i$ is less than half of the spacing of the floating-point numbers around $s_{i-1}$, and under round to nearest the partial sum does not increase. This means we produce negative rounding errors, which are of course not mean zero. Stochastic rounding avoids this issue by rounding randomly. In fact we can prove that $\mathbb{E}(\widehat{s}_n) = s_n$, that is, the expected value of the computed result under stochastic rounding is the exact result.
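Stagnation is easy to provoke: recursively summing uniform $[0,1]$ values in fp16 with round to nearest gets stuck once the partial sum reaches 2048, where the fp16 spacing is 2 and so every update falls below half the spacing, while stochastic rounding keeps tracking the true sum. The helper `sr16` below is a hypothetical mode 2 implementation of my own via fp16 bit patterns, valid for positive values only:

```python
import numpy as np

rng = np.random.default_rng(0)

def sr16(t):
    """Mode 2 stochastic rounding of a positive double t to fp16."""
    a = np.float16(t)                 # round-to-nearest candidate
    if float(a) == t:
        return a
    # neighbour on the other side of t: positive fp16 bit patterns are
    # monotone in value, so step the pattern by one unit
    step = 1 if float(a) < t else -1
    b = np.uint16(int(a.view(np.uint16)) + step).view(np.float16)
    lo, hi = (a, b) if float(a) < float(b) else (b, a)
    p_hi = (t - float(lo)) / (float(hi) - float(lo))
    return hi if rng.random() < p_hi else lo

n = 50_000
x = rng.random(n)

s_rn = np.float16(0)   # recursive sum under round to nearest
s_sr = np.float16(0)   # recursive sum under stochastic rounding
for xi in x:
    s_rn = np.float16(float(s_rn) + xi)
    s_sr = sr16(float(s_sr) + xi)

print(float(s_rn), float(s_sr), x.sum())  # s_rn stagnates near 2048
```

The round to nearest sum stalls at a tiny fraction of the true value of roughly $n/2 = 25000$, while the stochastic rounding sum lands close to it, as the mean zero property predicts.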