How big can an error be when we estimate something?

Often, when you make an estimate based on many assumptions, people say "There might be errors in all your assumptions, and the error on the result, being the sum of all these errors, is going to be huge".

In reality, errors tend to compensate each other. You might overestimate one variable, but you will underestimate the next one. Unless you are biased, the error will grow like the distance covered by a drunken wanderer.

Say we want to estimate the number N of something. Number of candies eaten by children in the world. Or piano tuners in Chicago. Or whatever.

To estimate N, we multiply the estimated values e_i of the factors which contribute to N, whose real (unknown) values are a_i. For estimating the candies, we might have the number of people in the world, the fraction of children, sugar-producing crops and so on.

N = a_1 \cdot a_2 \cdot \ldots \approx e_1 \cdot e_2 \cdot \ldots

In the end, we will compute our estimate by multiplying all the e_i.

Now, let's say that you are really bad at estimating, and you never get the right value. All e_i are wrong by a factor of 2: sometimes your estimate e_i is double the actual value a_i, sometimes it is one half of it.

Now you do what any good engineer would have done before the advent of the pocket calculator when they had to multiply numbers: you sum logarithms.

log(N) = \sum_i log(a_i) \approx \sum_i log(e_i)

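(\sum means "sum".) But we said each of your estimates is off by a factor of either 2 or 0.5, picked at random:

e_i = a_i \cdot random(2, 0.5)

Or, taking logarithms,

log(e_i) = log(a_i) + random(+1, -1)

where we take logarithms in base 2, so that log(2) = 1 and log(0.5) = -1. (In any other base the steps are +log(2) and -log(2) instead of +1 and -1, which only rescales the result.)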
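To make the error model concrete, here is a minimal Python sketch of a single bad estimate. The four "true" values are made up purely for illustration, and the function name noisy_estimate is not from any library. It shows that the base-2 logarithm of the final error factor is just the sum of the individual +1 and -1 steps.

```python
import math
import random


def noisy_estimate(true_values, rng):
    """Multiply the true factors, each one off by a random factor of 2 or 0.5."""
    estimate = 1.0
    for a in true_values:
        estimate *= a * rng.choice((2.0, 0.5))
    return estimate


rng = random.Random(1)  # fixed seed so the run is reproducible

# Made-up "true" factors a_i, for illustration only.
true_values = [8e9, 0.25, 100.0, 0.5]
true_n = math.prod(true_values)

# Error factor of the final estimate, and its base-2 logarithm.
error_factor = noisy_estimate(true_values, rng) / true_n
log2_error = math.log2(error_factor)

print("final error factor:", error_factor)
print("log2 of final error:", log2_error)
```

With four factors, the logarithm of the error can only be -4, -2, 0, +2 or +4: the position of a walker after four steps of length one.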

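This allows us to separate the errors from the true values. Calling N_{est} = e_1 \cdot e_2 \cdot \ldots the estimate we end up with, we can write

log(N_{est}) = \sum log(e_i) = \sum log(a_i) + \sum random(+1, -1)

log(N_{est}) = log(N) + log(\sigma_{final})

where \sigma_{final} = N_{est} / N is the error factor you'll get at the end of the estimate.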

The logarithm of the final error, log(\sigma_{final}), actually diffuses quite slowly. It behaves like a drunken wanderer who can only walk along a line: he makes one step in one direction, then two steps in the opposite direction, and so on. After S steps, roughly 70% of such drunken wanderers are no more than \sqrt{S} steps away from their starting point.
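The claim about the drunken wanderers can be checked numerically. The following is a rough Monte Carlo sketch, not a proof: the function name, the number of trials and the seed are arbitrary choices made for this illustration.

```python
import math
import random


def fraction_within_sqrt(steps, trials=5000, seed=0):
    """Fraction of random walks ending no more than sqrt(steps) from the start."""
    rng = random.Random(seed)
    limit = math.sqrt(steps)
    inside = 0
    for _ in range(trials):
        # Each step is +1 or -1 with equal probability.
        position = sum(rng.choice((-1, 1)) for _ in range(steps))
        if abs(position) <= limit:
            inside += 1
    return inside / trials


for steps in (25, 100, 400):
    print(steps, "steps:", fraction_within_sqrt(steps))
```

The printed fractions should come out in the neighbourhood of 0.7. For a small number of steps they sit somewhat above that, because the walker can only stand on whole-numbered positions.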

This means that about 70% of the time the log of your estimation error is no bigger than \sqrt{S} \cdot log(\sigma), where \sigma is the typical error factor of a single assumption, which we initially assumed to be 2. Or

\sigma_{final} = \sigma^{\sqrt{S}}

where S is the number of assumptions you made, \sigma is the typical error factor of each assumption, and \sigma_{final} is the error factor of the final estimate.
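As a quick worked example of this formula, the sketch below compares the typical final error factor with the pessimistic case in which all the errors pile up in the same direction. The function names are invented for this illustration, and the choice of 9 assumptions is arbitrary.

```python
import math


def typical_error_factor(sigma, assumptions):
    """Typical final error factor: sigma raised to sqrt(S)."""
    return sigma ** math.sqrt(assumptions)


def worst_case_error_factor(sigma, assumptions):
    """Error factor if every single assumption is wrong in the same direction."""
    return sigma ** assumptions


sigma = 2        # each assumption is off by a factor of 2
assumptions = 9  # arbitrary number of assumptions, S

print("typical:", typical_error_factor(sigma, assumptions))        # 8.0
print("worst case:", worst_case_error_factor(sigma, assumptions))  # 512
```

So with nine assumptions, each wrong by a factor of 2, the final estimate is typically off by a factor of about 8, not by the factor of 512 that the pessimist fears.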
