## Tuesday, May 3, 2016

### Numerical pitfalls in computing variance

One of the most common tasks in statistical computing is computation of sample variance. This would seem to be straightforward; there are a number of algebraically equivalent ways of representing the sum of squares $S$, such as $S = \sum_{k=1}^n ( x_k - \bar{x})^2$ or $S = \sum_{k=1}^n x_k^2 + \frac{1}{n}\bar{x}^2$ and the sample variance is simply $S/(n-1)$.

What is straightforward algebraically, however, is sometimes not so straightforward in the floating-point arithmetic used by computers. Computers cannot represent numbers to infinite precision, and arithmetic operations can affect the precision of floating-point numbers in unexpected ways.

Consider the numbers .1 and .3/3. These two numbers are equal. However,
.1 - .3/3

## [1] 0.00000000000000001388



is not exactly 0, as one would expect it to be (for more, see "Falling into the Floating
Point Trap", Chapter 4 in the "R Inferno" by Patrick Burns. Multiple ways of computing the variance that are algebraically equivalent do not necessarily yield equal answers in software such as R, and some ways are better than others.

In a series of posts, John D. Cook shows that the seemingly reasonable, commonly used second method above, which he calls the "sum of squares" method, can be extremely unstable in certain circumstances, even giving impossible negative values. He also discusses how to compute the sample variance in a numerically stable way using Welford's method. Both of these posts are well worth reading.

When I read them, I thought two things. First, I was reminded that I use [used; this was written some time ago] the "sum of squares" method in the BayesFactor package. Secondly, I thought I would not be affected by the problem, because I represent numbers as logarithms internally for numerical stability and ease of division. Logarithms make many things easier: very large and very small numbers become easier to work with; exponentiation becomes multiplication; and multiplication and division become addition and subtraction. The tricky part of working with logarithms is addition and subtraction. If we have two numbers, $\exp(a)$ and $\exp(b)$ represented by their logarithms $a$ and $b$, and we want to know the logarithm of their sum, we can make use of the identities
$\begin{eqnarray*} \log(\exp(a) + \exp(b)) = a + \log(1 + \exp(b - a))\\ \log(\exp(a) - \exp(b)) = a + \log(1 - \exp(b - a)) \end{eqnarray*}$
Now arithmetic with $a$ and $b$ is addition and subtraction, and we can use accurate floating point approximations of $\log(1+\exp(x))$ and $\log(1-\exp(x))$.

### When logarithms don't help

But I wasn't really sure whether I would be affected by the instability of the "sum of squares" method, so I decided to check. It turns out, representing numbers logarithmically doesn't necessarily help. In order to demonstrate this easily, I created an R S4 class that eases arithmetic on logarithmically-represented values. First, we load necessary libraries/files:
# Install the BayesFactor and devtools packages, if you don't already have them

# Load my S4 class for representing real numbers with logarithms
# and performing arithmetic on them
# See the code at https://gist.github.com/richarddmorey/3c77d0065983e31241bff3807482443e
devtools::source_gist('3c77d0065983e31241bff3807482443e')

# set random seed so results are reproducible
set.seed(2)


[Click here to view the R file you'll be sourcing above]

To see the S4 class in action, we need to generate some numbers that are logarithmically-represented. The variables x and y below are equal to $\exp(1)=2.718$ and $\exp(2)=7.389$, respectively. The modulo argument gives the log-represented magnitude of the number, and the sign argument gives the sign (with 1L meaning the integer representation of 1):
x = logRepresentedReal(modulo = 1, sign = 1L)
y = logRepresentedReal(modulo = 2, sign = 1L)

We can add the two numbers together, for instance:
x + y

##  10.10734

Although the result does not look logarithmically-represented, we can verify that it is using the str function:
str( x + y )

## Formal class 'logRepresentedReal' [package ".GlobalEnv"] with 2 slots
##   ..@ modulo: num 2.31
##   ..@ sign  : int 1

The result is of class logRepresentedReal, and the modulo slot tells us $\log(x+y)$. With the arithmetic on the logarithmically-represented numbers defined using the logRepresentedReal class, we can test whether our logarithms help stabilize the estimate of the variance. Following Cook, we will sample values from a uniform distribution, making use of the fact that if $z$ has a uniform distribution, then $−\log(z)$ has an standard exponential distribution:
runif2 = function(n){
# Sample log values from exponential distribution
x = -rexp(n)
# represent all values logarithmically in a list
lapply(x, logRepresentedReal, sign = 1L)
}

n = 100
z = runif2(n)

We sampled $n=100$ values from a uniform distribution. We can now compute the variance in several ways. The first way is to use the “sum of squares” method on the exponentiated values:
# Sum of squares method
var.sumsq.exp = function(z)
{
n = length(z)
z = sapply(z, as.numeric)
(sum(z^2) - n*mean(z)^2)/(n-1)
}

var.sumsq.exp(z)

## [1] 0.07419988

This presents no problem, since our uniformly-distributed values are rather moderate. We now use Welford’s method on the logarithmically-represented values to compute the variance:
var.welford <- function(z){
n = length(z)
M = list()
S = list()
M[[1]] = z[[1]]
S[[1]] = 0

for(k in 2:n){
M[[k]] = M[[k-1]] + ( z[[k]] - M[[k-1]] ) / k
S[[k]] = S[[k-1]] + ( z[[k]] - M[[k-1]] ) * ( z[[k]] - M[[k]] )
}
return(S[[n]] / (n - 1))
}

var.welford(z)

##  0.07419988

And finally, we can use the “sum of squares” method on the logarithmically-represented values:
var.sumsq = function(z){
n = length(z)
zsqr = sapply(z, function(x) x^2)

sumz = 0
sumz2 = 0
for(k in 1:n){
sumz = sumz + z[[k]]
sumz2 = sumz2 + zsqr[[k]]
}
mnz = sumz/n
ssz = sumz2 - n * mnz^2
return(ssz / (n-1))
}

var.sumsq(z)

##  0.07419988

Again, this presents no problem, since our uniformly-distributed values are moderate. So far, we see no signs of numerical instability, but none are expected. As Cook did in his example, we can add a very large number — in this case, one billion — to all sampled values. This makes the variance quite small compared to the mean, and would be expected to make the sum of squares estimates unstable.
const = 1e9
z = sapply(z, function(x) x + const)

var.sumsq.exp(z)

## [1] -165.4949

var.welford(z)

##  0.07419973

var.sumsq(z)

##  35886

Notice that both the sum of square estimates fail; the logarithmic representation used by var.sumsq does not help the numeric stability. The Welford method, on the other hand, yields an accurate value.

### Conclusion

There are many ways of combating numerical instability. Representing numbers as logarithms is one important method. Although representing numbers as logarithms is effective at combating numerical instability from some sources, it does not necessarily help in all cases. The superiority of the Welford method of computing variance, even when numbers are represented logarithmically, shows this clearly.

1. You left out the obvious tool to compare against: stats::sd, which is just sqrt(stats::var). I don't have the source handy, so perhaps you could check out .Call(C_cov, x, y, na.method, FALSE) which is the guts of the 'var' function?

1. Hi cellocgw, since am interested mainly in working with logarithms directly, and stats:sd function doesn't do that (internally, I don't want to exponentiate ever because I would end up getting infinities). But using stats::var on the exponentiated values is very accurate. Not sure what method is used, though.

2. TESTIMONY FROM ANSU{SAUDI ARABIA}
I want to share my testimony about my HIV virus that was cured by a herbalist Dr Ogugu with his roots and herbs,Since last 4 years I have being a HIV AID patient. I never think I will live long again and am so grateful about Dr Ogugu who cured my HIV AIDS last week. I was in pains so I told one of my friend and she told me that there is a herbalist that can cure my HIV with his roots and herbs. do contact him now she read someone testimony from a website online on how Dr Ogugu cure her from HIV AIDS and she also collected Dr Ogugu website and email. She gave me the contact info, I emailed him and he prepared herbal medicine for me with his roots and herbs, after two weeks he ask me to go for a test. Which I did, when the doctor told me that am now a HIV negative patient I couldn't believe myself I went to see another doctor and the result was still the same, Thank God everything is fine, I'm cured by Dr Ogugu with his herbal medicine, I'm very thankful to Dr Ogugu and very happy with my family.Dr Ogugu can also offer any types of help like Curing of all types of Diseases and lot's more.

Whatsapp/ call: +2348081594908
Email: Oguguspellcaster@gmail.com
Web Page: https://oguguspellcaster.wixsite.com/spell

2. This comment has been removed by a blog administrator.

3. This comment has been removed by a blog administrator.

4. Nice blog posts - thanks for giving the pitfalls in computing variance of the World.

Discount Airline Tickets

5. This is a good website for my guy. I have to show it to him :D

6. Nice post!! Thanks for sharing.
dermology hair removal cream

7. I like this site, this is very informative article!! Please Visit:

cloud pos
point of sales softwares
pos app

8. Looks fairly complicated, but thanks for such a clear explanation. By the way, I'm planning to start the implementation of Microsoft Dynamics erp software and I just found this offer: https://ax-dynamics.com/microsoft-dynamics-ax. Looks pretty good!

9. COEPD LLC- Center of Excellence for Professional Development is the most trusted online training platform to global participants. We are primarily a community of Business Analysts who have taken the initiative to facilitate professionals of IT or Non IT background with the finest quality training. Our trainings are delivered through interactive mode with illustrative scenarios, activities and case studies to help learners start a successful career. We impart knowledge keeping in view of the challenging situations individuals will face in the real time, so that they can handle their job deliverables with at most confidence.

http://coepd.us/

10. Nice and very informative blog post you shared with us.
Online Assignment Help, UK assignment Help.

11. Irrespective of receiving daily oral or future injectable depot therapies, these require health care visits for medication and monitoring of safety and response. If patients are treated early enough, before a lot of immune system damage has occurred, life expectancy is close to normal, as long as they remain on successful treatment. However, when patients stop therapy, virus rebounds to high levels in most patients, sometimes associated with severe illness because i have gone through this and even an increased risk of death. The aim of “cure”is ongoing but i still do believe my government made millions of ARV drugs instead of finding a cure. for ongoing therapy and monitoring. ARV alone cannot cure HIV as among the cells that are infected are very long-living CD4 memory cells and possibly other cells that act as long-term reservoirs. HIV can hide in these cells without being detected by the body’s immune system. Therefore even when ART completely blocks subsequent rounds of infection of cells, reservoirs that have been infected before therapy initiation persist and from these reservoirs HIV rebounds if therapy is stopped. “Cure” could either mean an eradication cure, which means to completely rid the body of reservoir virus or a functional HIV cure, where HIV may remain in reservoir cells but rebound to high levels is prevented after therapy interruption.Dr Itua Herbal Medicine makes me believes there is a hope for people suffering from,Parkinson's disease,Schizophrenia,Cancer,Scoliosis,Fibromyalgia,Fluoroquinolone Toxicity
Syndrome Fibrodysplasia Ossificans Progressiva.Fatal Familial Insomnia Factor V Leiden Mutation ,Epilepsy Dupuytren's disease,Desmoplastic small-round-cell tumor Diabetes ,Coeliac disease,Creutzfeldt–Jakob disease,Cerebral Amyloid Angiopathy, Ataxia,Arthritis,Amyotrophic Lateral Sclerosis,Alzheimer's disease,Adrenocortical carcinoma.Asthma,Allergic diseases.Hiv_ Aids,Herpes,Inflammatory bowel disease ,Copd,Diabetes,Hepatitis,I read about him online how he cure Tasha and Tara so i contacted him on drituaherbalcenter@gmail.com even talked on whatsapps +2348149277967 believe me it was easy i drank his herbal medicine for two weeks and i was cured just like that isn't Dr Itua a wonder man? Yes he is! I thank him so much so i will advise if you are suffering from one of those diseases Pls do contact him he's a nice man.

13. i think this post of yours lack some great info man.
Know about Microsoft Dynamic 365.

14. Lankatricks.com is the best online place in the Sri Lanka for finding educational resources such as Government exam papers and Knowledge like SLAS Exams, Banking Exam, Gread five scholership exam, O/L Exam, A/L Exam, University Exams, Technical knowledge exam etc. Here, Can get Exam papers, Pass Papers, Model Papers,Syllubus and many more other educational resources.

15. I am bold enough among many others to state that there is now a potent cure to this sickness but many are unaware of it. I discovered that I was infected with the virus 3 months ago, after a medical check-up. My doctor told me and I was shocked, confused and felt like my world has crumbled. I was dying slowly due to the announcement of my medical practitioner but he assured me that I could leave a normal life if I took my medications (as there was no medically known cure to Herpes). I went from churches to churches but soon found that my case needed urgent attention as I was growing lean due to fear of dying anytime soon. In a bid to look for a lasting solution to my predicament, I sought for solutions from the herbal world. I went online and searched for every powerful trado-medical practitioner that I could severe, cos I heard that the African Herbs had a cure to the Herpes syndrome. It was after a little time searching the web that I came across one Dr Itua(A powerful African Herbal Doctor), who offered to help me at a monetary fee. I had to comply as this was my final bus-stop to receiving a perfect healing. My last resolve was to take my life by myself, should this plan fail. At last it worked out well. He gave me some steps to follow and I meticulously carried out all his instructions. Last month, to be precise, I went back to the hospital to conduct another test and to my amazement, the results showed that negative,Dr Itua Can As Well Cure The Following Desease…Cancer,Hiv,Herpes, Hepatitis B,Liver Inflammatory,Diabetis,Fribroid,,Non Hodgkin Lymphoma,Skin Cancer,Uterine Cancer,Prostate Cancer Dercum,Infertility,fibromyalgia,Get Your Ex Back,Als,SYPHILLIS,Genetic disease,Epilepsy, Parkinson's disease..You can free yourself of this Herpes virus by consulting this great African Herbal Doctor via this e-mail: drituaherbalcenter@gmail.com or call and whatsapp him on +2348149277967 He will help you and his herb medication is sure. he has the cure on all disease .You can talk to me on INSTAGRAM..tashamoore219....

16. I have long felt a special connection with Herbal Medicine. For one thing is natural, Charlie we went to the same small college in Southern California — Claremont Men’s College — though he dropped out and ultimately enrolled at the Julliard School of the Performing Arts in New York. Had he stayed at Claremont, he would have been a senior the year I started there; I have often kidded that I was the reason he left when he finds out I had herpes. So my life have been lonely like hell all day I couldn't bear the outbreak pain, Tasha then introduce me to Dr Itua who use his herbal medicine to cure her two weeks of drinking it. I place an order form him and he delivers to my post office then I picked it up and used for two weeks all my sore was healed completely no more outbreak I'm honestly telling you this man is a great man, I trust him and his herbal medicine so much I'm sharing this to show my appreciation also to let sick people know there is hope with Dr Itua Herbal Made Medicine.Dr Itua Contact Email.drituaherbalcenter@gmail.com,Whatsapp.+2348149277967
He cures.
Herpes,
STROKE.
Hepatitis
H.P.V TYPE 1 TYPE 2 TYPE 3 AND TYPE 4. TYPE 5.
HIV
SYPHILIS.
Liver/Kidney Inflammatory
Epilepsy

17. TESTIMONY FROM MARY SMITH{UK}
I am here to give my testimony about Dr Ogugu who helped me in my life, i want to inform the public how i was cured from (HERPES SIMPLEX VIRUS) by Dr Ogugu,i visited different hospital but they gave me list of drugs like Famvir, Zovirax, and Valtrex which is very expensive to treat the symptoms and never cured me. I was browsing through the Internet searching for remedy on HERPES and i saw comment of people talking about how Dr Ogugu cured them. when i contacted him he gave me hope and send a Herbal medicine to me that i took and it seriously worked for me, am a free person now without problem, my HERPES result came out negative. I pray for you Dr Ogugu am cured you can also get your self cured my friends if you really need my doctor help, you can reach him now:

Whatsapp/ call: +2348081594908
Email: Oguguspellcaster@gmail.com
Web Page: https://oguguspellcaster.wixsite.com/spell