Stochastic Rounding Has Unconditional Probabilistic Error Bounds

The IEEE 754 standard for binary floating-point arithmetic defines four rounding modes.

• Round to nearest.
• Round towards $0$.
• Round towards $+\infty$.
• Round towards $-\infty$.

Recently, stochastic rounding, an old idea that dates back to the beginning of the digital computer era, has gained popularity, most notably in deep learning. While the rounding modes defined in the IEEE standard are deterministic, stochastic rounding is inherently random. We can define two modes of stochastic rounding. Consider the figure below, where a real number $x$ lies between adjacent floating-point numbers $a$ and $b$. In what we call mode 1 stochastic rounding, we round $x$ to either $a$ or $b$ with equal probability. In mode 2 stochastic rounding, we round to $a$ (or $b$) with probability $1$ minus the distance of $x$ from $a$ (or $b$) relative to the gap $b - a$; that is, we round to $a$ with probability $(b-x)/(b-a)$ and to $b$ with probability $(x-a)/(b-a)$. So in the example shown, where $x$ is closer to $b$, mode 2 is more likely to round to $b$ than to $a$.
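Both modes can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the results discussed here: it assumes a toy binary system with a $t$-bit significand and an unbounded exponent range (no overflow or underflow), and the helper names `neighbours` and `round_sr` are hypothetical.

```python
import math
import random

def neighbours(x, t):
    """Return the floating-point numbers a <= x <= b adjacent to x in a binary system with a
    t-bit significand. Hypothetical model: unbounded exponent range (no overflow or underflow)."""
    if x == 0.0:
        return 0.0, 0.0
    _, e = math.frexp(x)          # x = m * 2**e with 0.5 <= |m| < 1
    ulp = math.ldexp(1.0, e - t)  # spacing of t-bit numbers with |x| in [2**(e-1), 2**e)
    a = math.floor(x / ulp) * ulp
    return (a, a) if a == x else (a, a + ulp)

def round_sr(x, t, mode, rng):
    """Stochastically round x to t bits using mode 1 or mode 2 as defined in the text."""
    a, b = neighbours(x, t)
    if a == b:
        return a                  # x is representable, so there is no rounding error
    if mode == 1:
        p_b = 0.5                 # mode 1: a and b are equally likely
    else:
        p_b = (x - a) / (b - a)   # mode 2: the closer neighbour is the more likely one
    return b if rng.random() < p_b else a

rng = random.Random(0)
x, t = 1.0 + 2.0**-12, 11         # t = 11 matches the binary16 significand
print(neighbours(x, t))           # (1.0, 1.0009765625)
for mode in (1, 2):
    samples = [round_sr(x, t, mode, rng) for _ in range(100_000)]
    frac_b = sum(s > x for s in samples) / len(samples)
    mean_err = sum(samples) / len(samples) - x
    print(f"mode {mode}: rounds up in {frac_b:.3f} of trials, mean error {mean_err:.2e}")
```

Here $x$ sits a quarter of the way from $a = 1$ to $b = 1 + 2^{-10}$, so mode 2 should round up in roughly 25% of trials and its average error should be close to zero, whereas mode 1 averages to the midpoint $(a+b)/2$, which differs from $x$ by $2^{-12}$.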

In our recent EPrint “Stochastic Rounding and its Probabilistic Backward Error Analysis”, Nick Higham, Theo Mary and I generalized previous probabilistic error analysis results [SIAM J. Sci. Comput., 41 (2019), pp. A2815–A2835] to stochastic rounding.

We show that stochastic rounding (specifically mode 2) produces rounding errors $\delta_k$ that are random variables satisfying

• mean zero: $\mathbb{E}(\delta_k) = 0$,
• mean independent: $\mathbb{E}(\delta_k \mid \delta_{k-1}, \dots, \delta_1) = \mathbb{E}(\delta_k)$.
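For a single rounding, the mean-zero property follows directly from the definition of mode 2. With $a \le x \le b$ as in the figure and $\delta$ the relative error of that one rounding, a short calculation (which does not address mean independence across steps) gives:

```latex
\begin{aligned}
p &= \frac{x-a}{b-a}, \qquad
\mathrm{fl}(x) = \begin{cases} b & \text{with probability } p,\\ a & \text{with probability } 1-p, \end{cases}\\
\mathbb{E}\bigl(\mathrm{fl}(x)\bigr) &= (1-p)\,a + p\,b = a + p\,(b-a) = x,\\
\delta &= \frac{\mathrm{fl}(x) - x}{x}
\quad\Longrightarrow\quad
\mathbb{E}(\delta) = \frac{\mathbb{E}\bigl(\mathrm{fl}(x)\bigr) - x}{x} = 0 \qquad (x \ne 0).
\end{aligned}
```

Under mode 1 the same calculation gives $\mathbb{E}(\mathrm{fl}(x)) = (a+b)/2$, which differs from $x$ unless $x$ is the midpoint of $[a,b]$, so mode 1 errors are not mean zero in general.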

Here, $\mathbb{E}$ denotes the expectation. A key consequence of these results is that we can replace the worst-case error bound proportional to $\gamma_n = nu + O(u^2)$, ubiquitous in backward error analyses, by a more informative probabilistic bound proportional to $\widetilde{\gamma}_n = \sqrt{n}u + O(u^2)$, which holds with high probability. For round to nearest, the $\sqrt{n}u$ bound is only a rule of thumb, because nothing guarantees that the rounding errors satisfy the above properties; for stochastic rounding it becomes a rule, because we prove that they do.
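To get a feel for the gap between the two constants, the sketch below tabulates the standard worst-case constant $\gamma_n = nu/(1-nu)$, whose expansion is $nu + O(u^2)$, against the leading term $\sqrt{n}u$ of the probabilistic constant for half precision ($u = 2^{-11}$). The $O(u^2)$ terms and any constant factor in the probabilistic bound are deliberately dropped; `gamma` is a hypothetical helper name.

```python
import math

u = 2.0 ** -11  # unit roundoff of IEEE half precision (binary16)

def gamma(n, u):
    """Worst-case constant gamma_n = n*u / (1 - n*u), defined only while n*u < 1."""
    return n * u / (1 - n * u) if n * u < 1 else math.inf

for n in (10, 100, 1_000, 2_048, 10_000):
    # sqrt(n)*u is only the leading term of the probabilistic constant; O(u^2) terms are dropped.
    print(f"n={n:>6}  gamma_n={gamma(n, u):.3e}  sqrt(n)*u={math.sqrt(n) * u:.3e}")
```

In half precision $nu$ reaches $1$ at $n = 2048$, at which point the worst-case bound says nothing, while $\sqrt{n}u$ is still below $0.05$ at $n = 10^4$.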

In the figure below, we compute the inner product $s = x^Ty$ of vectors sampled uniformly from $[0,1]$ in IEEE half precision arithmetic, using both round to nearest and stochastic rounding. The figure shows the backward error for each value of $n$ together with the bounds $\gamma_n$ and $\widetilde{\gamma}_n$.
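A small-scale version of this experiment can be simulated in pure Python. This is a sketch under stated assumptions, not the code used to produce the figure: round to nearest in binary16 relies on the `struct` module's `"e"` (half-precision) format, mode 2 stochastic rounding is built on top of it for nonnegative values only, the inputs are drawn with Python's `random` module and rounded to binary16, and all helper names (`to_bits`, `fl_rn`, `fl_sr`, `dot`, `backward_error`) are hypothetical.

```python
import math
import random
import struct

def to_bits(v):
    """Round v to the nearest IEEE binary16 number (ties to even) and return its bit pattern."""
    return struct.unpack("<H", struct.pack("<e", v))[0]

def from_bits(bits):
    return struct.unpack("<e", struct.pack("<H", bits))[0]

def fl_rn(v):
    return from_bits(to_bits(v))

def fl_sr(v, rng):
    """Mode 2 stochastic rounding to binary16; this sketch assumes 0 <= v <= 65504."""
    bits = to_bits(v)
    r = from_bits(bits)
    if r == v:
        return r
    # For nonnegative binary16 numbers, adjacent bit patterns are adjacent values.
    a, b = (from_bits(bits - 1), r) if r > v else (r, from_bits(bits + 1))
    return b if rng.random() < (v - a) / (b - a) else a

def dot(x, y, fl):
    """Recursive summation s_i = fl(s_{i-1} + fl(x_i * y_i))."""
    s = 0.0
    for xi, yi in zip(x, y):
        s = fl(s + fl(xi * yi))
    return s

def backward_error(s_hat, x, y):
    # Products of binary16 numbers are exact in double; fsum gives an accurate reference.
    s = math.fsum(xi * yi for xi, yi in zip(x, y))
    return abs(s_hat - s) / math.fsum(abs(xi * yi) for xi, yi in zip(x, y))

u = 2.0 ** -11  # unit roundoff of binary16
rng = random.Random(42)
results = {}
for n in (100, 1_000, 10_000):
    x = [fl_rn(rng.random()) for _ in range(n)]
    y = [fl_rn(rng.random()) for _ in range(n)]
    err_rn = backward_error(dot(x, y, fl_rn), x, y)
    err_sr = backward_error(dot(x, y, lambda v: fl_sr(v, rng)), x, y)
    results[n] = (err_rn, err_sr)
    print(f"n={n:>6}  RN={err_rn:.2e}  SR={err_sr:.2e}  sqrt(n)u={math.sqrt(n) * u:.2e}")
```

Stepping through adjacent bit patterns keeps the stochastic rounding exact across the subnormal range as well, at the cost of handling only nonnegative values, which is all a sum of products of numbers in $[0,1]$ needs.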

As the problem size $n$ increases, the round-to-nearest error eventually exceeds the $\sqrt{n}u$ bound. This is due to a phenomenon we call stagnation, to which low precisions are particularly susceptible. As we recursively compute $s_i = s_{i-1} + x_iy_i$, the partial sum $s_{i-1}$ can eventually grow so large that the update $x_iy_i$ is less than half the spacing of the floating-point numbers around $s_{i-1}$, so under round to nearest the partial sum does not change. Every such step produces a negative rounding error, so the errors are systematically biased and certainly not mean zero. Stochastic rounding avoids this issue by rounding randomly: even an update smaller than half the spacing rounds the partial sum up with probability proportional to the size of the update. In fact, we can prove that $\mathbb{E}(\widehat{s}) = s$, that is, the expected value of the computed result under stochastic rounding is the exact result.
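Stagnation and the unbiasedness of stochastic rounding can both be seen in a tiny experiment. The sketch below starts a binary16 partial sum at $2048$, where the spacing of half-precision numbers is $2$, and adds $0.5$ a thousand times. It assumes the `struct` module's `"e"` format for round to nearest and builds mode 2 stochastic rounding on top of it for nonnegative values; the helper names are hypothetical.

```python
import random
import struct

def to_bits(v):
    """Round v to the nearest IEEE binary16 number (ties to even) and return its bit pattern."""
    return struct.unpack("<H", struct.pack("<e", v))[0]

def from_bits(bits):
    return struct.unpack("<e", struct.pack("<H", bits))[0]

def fl_rn(v):
    return from_bits(to_bits(v))

def fl_sr(v, rng):
    """Mode 2 stochastic rounding to binary16 (sketch: assumes 0 <= v <= 65504)."""
    bits = to_bits(v)
    r = from_bits(bits)
    if r == v:
        return r
    a, b = (from_bits(bits - 1), r) if r > v else (r, from_bits(bits + 1))
    return b if rng.random() < (v - a) / (b - a) else a

steps, update = 1000, 0.5
exact = 2048.0 + steps * update

s = 2048.0
for _ in range(steps):
    s = fl_rn(s + update)  # 2048.5 rounds back to 2048: the sum stagnates
s_rn = s

rng = random.Random(0)
trials = []
for _ in range(200):
    s = 2048.0
    for _ in range(steps):
        s = fl_sr(s + update, rng)  # rounds up to s + 2 with probability 0.25
    trials.append(s)
mean_sr = sum(trials) / len(trials)
print(exact, s_rn, mean_sr)
```

Round to nearest returns $2048$ no matter how many updates are added, whereas the exact sum is $2548$. Each stochastically rounded step moves the sum up by $2$ with probability $0.25$, an expected increment of exactly $0.5$, so the average over the trials should land close to $2548$.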