
Tuesday, May 3, 2016

Numerical pitfalls in computing variance

One of the most common tasks in statistical computing is computing the sample variance. This would seem to be straightforward: there are a number of algebraically equivalent ways of representing the sum of squares \(S\), where \(\bar{x}\) is the sample mean of \(x_1, \dots, x_n\), such as
\[ S = \sum_{k=1}^n (x_k - \bar{x})^2 \]
or
\[ S = \sum_{k=1}^n x_k^2 - n\bar{x}^2, \]
and the sample variance is simply \(S/(n-1)\).
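As a minimal sketch in Python (the function names are hypothetical), the two expressions for \(S\) translate directly into a two-pass and a one-pass computation. The one-pass version is tempting because it sweeps the data only once, accumulating \(\sum x_k\) and \(\sum x_k^2\) together:

```python
from typing import Sequence


def sum_of_squares_two_pass(xs: Sequence[float]) -> float:
    """S = sum (x_k - mean)^2: the first pass finds the mean, the second sums squared deviations."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


def sum_of_squares_one_pass(xs: Sequence[float]) -> float:
    """S = sum x_k^2 - n * mean^2: accumulates the sum and the sum of squares in one pass."""
    n = 0
    total = 0.0
    total_sq = 0.0
    for x in xs:
        n += 1
        total += x
        total_sq += x * x
    mean = total / n
    return total_sq - n * mean * mean


def sample_variance(s: float, n: int) -> float:
    """Sample variance S / (n - 1) for n observations."""
    return s / (n - 1)


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0]
    print(sum_of_squares_two_pass(data))  # 5.0
    print(sum_of_squares_one_pass(data))  # 5.0
    print(sample_variance(sum_of_squares_two_pass(data), len(data)))
```

On small, well-scaled data like this both functions agree exactly, which is what makes the one-pass formula look safe.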

What is straightforward algebraically, however, is not always straightforward in the floating-point arithmetic used by computers. Computers cannot represent numbers with infinite precision, and arithmetic operations can lose precision in unexpected ways: subtracting two nearly equal numbers, for example, can cancel most of their significant digits.
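To make this concrete, here is a small sketch assuming IEEE 754 double precision (Python's `float`) and hypothetical data: four values shifted by a large constant, \(10^9\). The shift does not change the spread, so the exact sum of squares is \(6^2 + 3^2 + 3^2 + 6^2 = 90\). The two-pass formula recovers it exactly, but the one-pass formula subtracts two nearly equal numbers of order \(4 \times 10^{18}\). At that magnitude adjacent doubles are 512 apart, so the one-pass result is a multiple of 512 and cannot be 90.

```python
from typing import Sequence


def sum_of_squares_two_pass(xs: Sequence[float]) -> float:
    # Two passes: compute the mean first, then sum squared deviations from it.
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


def sum_of_squares_one_pass(xs: Sequence[float]) -> float:
    # One pass: accumulate sum(x) and sum(x^2), then subtract n * mean^2.
    # Both terms are about 4e18 here, so their difference suffers cancellation.
    total = 0.0
    total_sq = 0.0
    for x in xs:
        total += x
        total_sq += x * x
    n = len(xs)
    mean = total / n
    return total_sq - n * mean * mean


# Hypothetical data: a constant offset of 1e9 leaves the deviations at -6, -3, 3, 6,
# so the exact sum of squares is 90.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

if __name__ == "__main__":
    print("two-pass S:", sum_of_squares_two_pass(data))  # 90.0 (exact for this data)
    print("one-pass S:", sum_of_squares_one_pass(data))  # a multiple of 512, so not 90
```

The two-pass version is exact here because every intermediate value (the sum, the mean, and the deviations) is an integer small enough to be represented exactly as a double.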