In the two earlier posts on this topic, we introduced an example system in which three inputs randomly taking values of 0, 1, or 2 are summed to produce an output value. In those cases, we assumed that the values 0, 1, and 2 were taken with equal probability, and calculated that we expected the output to be distributed as such:

For the purpose of this post, lets assume that the probability distributions of the three inputs are unknown to us, and that all we can do is measure the output. After conducting many measurements we find that the distribution of output values actually takes the form:

Assuming that our model is correct, it is clear that the underlying distribution of input parameters does not match our assumption of uniformity. Can we work backwards from this result to determine the likely input distributions?

In the previous post, we calculated the probability that the input would take on a specific value, given a specific measured output. Multiplying the conditional probability that we calculated by the probability of each output result from the original calculation, we get the joint probability for the input and the output taking on certain values:

If we now sum the joint probability of each of the input values for all output cases, we recover the original, underlying probability distribution:

In this case, the conditional probabilities were based upon the same distribution that gave rise to the output probabilities - our math was circular, and we ended up right where we started, to no real surprise. Now what happens if we look for the joint probabilities not using the output distribution predicted by out model, but using the 'measured' distribution in the second figure of this post? This is something entirely different, as we are drawing from two different probability distributions in creating the two sets of data that we are trying to merge:

When we sum, we find that the new 'input probability distribution' is:

I'll reveal here that I know what these values should be, as I made them up, and then used them to generate the output distribution. They are: p(0) = 16.67%, p(1) = 33.33%, p(2) = 50%. You can see now that the values we've calculated begin to approximate the actual input values. They aren't identical, because the pool of runs with output '5' (say) is composed of samples from the original distribution. We just multiply that pool by a factor which weights them closer to how they would be in a set of simulations conducted from the actual underlying distribution.

This implies that if we repeat the process, creating a new set of Monte Carlo runs based on our current estimate of the underlying input distribution, we might converge (within some error determined by the number of runs in each Monte Carlo iteration) to the underlying distribution. In the following, instead of using the original 27 run exhaustive list to create my initial guess of the distributions, I've used 50,000 runs in a Monte Carlo simulation, and you'll notice the outputs of the first iteration differ from the above.

After 1, 5, and 20 iterations of this process, our new guesses as to the underlying distribution are as follows, in the rightmost column:

The largest gains are realized in the first iteration. The following graph shows how the values adjust over 50 runs:

In fact the noise associated with the Monte Carlo Simulation means there is a bound on how close we can get to the actual value before noise overwhelms any further convergence, as you can see in this magnification of the same chart:

To decrease the noise, we can adjust the number of runs in each Monte Carlo simulation.

In future posts, we'll expand this to the continuous domain, look at what happens when the three inputs have different underlying probability distributions, and then go on to look at applying the method to more complex systems.