James P Houghton

James Houghton - Monte Carlo Analysis


Monte Carlo Analysis

15 Nov 2012

A common problem in systems analysis is to determine the likely output of a system based upon uncertain parameters that feed into that system:
For simplicity, lets assume that all the 'system model' does is sum three inputs, such that:
   Output = Input1 + Input2 + Input3
We'll also specify that each of the inputs can have any of three values: 0, 1, or 2, and that it has each of those values with equal likelihood:
What is the outcome likely to be? In this case, there are only 3^3 = 27 possible combinations (three inputs, each of which takes on three values), and so we can rather easily try all of them:
The outputs are numbers between 0 and 6, and each value shows up a different number of times. From this we can calculate the probability that (for example) the output will be 5 as the number of times that 5 shows up in the output (3), divided by the total number of cases (27):
So we see that there is a 3/27 ~ .11 = 11% chance that the output will be 5. 
Now imagine that instead of limiting our input to three values: 0, 1, and 2, we let them float for any value between 0 and 2, with uniform probability density. 
If we're interested in knowing the probability of the output in increments of size 0.1, we would need to perform 3^(2/.1) = 3,486,784,401 calculations to compare every possible combination of values, and then count the frequencies of the resulting outputs. This would take a long time. 

Instead of using the brute force algorithm, we instead can use a Monte Carlo simulation, which doesn't require us to test every possible combination of input, and perform those 3.4 trillion calculations. Instead, we take a random sample of the possible inputs, say 2000, and put those through our model. This is still a large number, but much less than 3.4 trillion. If our samples are truly random based on the probability that we have assigned to the inputs, then when we calculate our result, the output probability distribution will approximate the true distribution which it's too costly for us to compute.

Here is the simulation laid out in spreadsheet form. As above, each row represents a possible combination of inputs, and the output is the sum of the three. 

One addition in this spreadsheet is the column titled 'buckets'. This column is a the 'output' column rounded to the nearest tenth. This allows us to create a meaningful histogram of the probability distribution of the output:
From this we can approximate the probability density function by dividing each bucket by .1 - its width on the chart:
The probability density function being the continuous quantity that the Monte Carlo simulation is designed to help us approximate.  As the number of simulations we include in our Monte Carlo analysis increases, the better the approximation becomes. With 1 million simulations, our result begins to converge on something like a Gaussian distribution:







© 2016 James P. Houghton