Take our example from before, in which three random integers between 0 and 2 inclusive are summed to produce an output. We found the probability of that sum taking each of its 7 possible values:
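Since the three inputs are independent and equally likely, the distribution can be found by brute-force enumeration of all 3³ = 27 triples. A minimal sketch (the variable names are my own, not from the original system):

```python
from itertools import product
from collections import Counter
from fractions import Fraction

# Enumerate every equally likely triple of integers in {0, 1, 2}
# and tally how often each possible sum appears.
counts = Counter(sum(triple) for triple in product(range(3), repeat=3))
total = 3 ** 3

# Exact probabilities as fractions over the 27 outcomes.
probabilities = {s: Fraction(counts[s], total) for s in sorted(counts)}

for s, p in probabilities.items():
    print(f"P(sum = {s}) = {p} ≈ {float(p):.4f}")
```

This reproduces the familiar symmetric shape: counts of 1, 3, 6, 7, 6, 3, 1 across sums 0 through 6.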
These, then, are the probabilities of an input taking a given value under various conditions on the output:
The conditional probability distribution for each output value looks like this:
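The same enumeration gives these conditional distributions directly: restrict attention to the triples matching an observed sum, then tally the values one input takes among them. A sketch (the function name is mine; this mirrors the tables above rather than reproducing their layout):

```python
from itertools import product
from collections import Counter
from fractions import Fraction

def conditional_input_dist(observed_sum):
    """P(first input = v | sum of the three inputs = observed_sum)."""
    # Keep only the triples consistent with the observed output...
    matching = [t for t in product(range(3), repeat=3) if sum(t) == observed_sum]
    # ...and tally the first input's value among them.
    counts = Counter(t[0] for t in matching)
    return {v: Fraction(counts[v], len(matching)) for v in sorted(counts)}

print(conditional_input_dist(0))  # sum 0 forces every input to be 0
print(conditional_input_dist(3))  # the middle sum leaves the most freedom
```

Observing an extreme output pins the inputs down completely, while a middling output barely constrains them.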
Let's try mapping this idea onto the continuous-domain example from the previous post. There, you'll remember, we specified the same system but with each input drawn from a uniform distribution between 0 and 2. Let's assume we've measured the output, by some terribly clever means, to be between 2.5 and 3. Bucketing the results into intervals of tenths, we see:
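In the continuous case enumeration is no longer available, so we fall back on Monte Carlo: draw many runs, keep only those whose output lands in the observed window, and histogram one input. A sketch under the assumptions stated above (the run count, seed, and bucketing convention here are my own scaffolding):

```python
import random

random.seed(0)          # assumed: fixed seed, for reproducibility
N_RUNS = 100_000        # assumed run count; the post later uses 1 million

# Bucket i collects first-input values that round to i/10 (0.0 .. 2.0).
buckets = [0] * 21

for _ in range(N_RUNS):
    # Three independent inputs, each uniform on [0, 2].
    inputs = [random.uniform(0, 2) for _ in range(3)]
    # Condition on the measured output window.
    if 2.5 <= sum(inputs) <= 3.0:
        buckets[round(inputs[0] * 10)] += 1

accepted = sum(buckets)
for i, count in enumerate(buckets):
    print(f"{i / 10:.1f}: {count / accepted:.4f}")
```

Note that only the accepted runs contribute, which is why the resulting histogram is noisier than the unconditioned one.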
In a plot, we can see this differs significantly from the original uniform distribution we assumed across all runs:
We're getting a fair amount of noise in this chart: having selected only a subset of our Monte Carlo runs, the sample behind it is rather small. Increasing the number of runs to 1 million gives a significantly cleaner result:
The drop-offs at either end are an artifact of rounding into buckets rather than assigning buckets explicitly: the end bucket from -0.05 to 0.05 naturally catches half as many samples as the one from 0.05 to 0.15, even in what would otherwise be an even distribution.
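One way to see (and avoid) this artifact is to compare rounding with an explicit half-open bucket assignment. A sketch, with my own bucket conventions, on plain uniform samples where every full-width bucket should come out roughly equal:

```python
import random

random.seed(1)  # assumed seed, for reproducibility
samples = [random.uniform(0, 2) for _ in range(100_000)]

rounded = [0] * 21   # round(): the end buckets cover only half a tenth
floored = [0] * 20   # floor(): every bucket covers a full tenth [i/10, (i+1)/10)

for x in samples:
    rounded[round(x * 10)] += 1
    floored[min(int(x * 10), 19)] += 1  # clamp x == 2.0 into the last bucket

print(rounded[0], rounded[10])  # end bucket holds roughly half as much
print(floored[0], floored[10])  # explicit buckets come out roughly equal
```

With explicit half-open intervals the uniform distribution stays flat all the way to the edges.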
In a future post, I'd like to apply these analyses to the model we created about views on YouTube, and see if we can get a statistical baseline for how people share videos online.
Also, I'd like to see whether we can recover an unknown input distribution by iterating our model with 'observed' values. I'd then like to see whether we can make sense of what this process would look like if there were a distribution of confidence in our measured output.