In the last two posts on this topic, we showed that while playing the lottery is a bad idea, it's a worse idea at some times than at others. In this post we'll build a model of the Powerball lottery system and get a sense of how frequently drawings fall into each category.
We'll start by looking at how the value of the jackpot changes - increasing when jackpots are not won, and being reset when somebody draws those special numbers:
When the jackpot is won, this relationship is irrelevant - the value of the jackpot is simply reset to $40M. The rate of increase and payout are then calculated as:
if (Jackpot Won)
    payout = Value of Jackpot
    increase = $40M
else
    payout = 0
    increase = .4907*Dollar Sales - $5.0307M
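For readers who want to experiment, here is a minimal Python sketch of that update rule. The function name `step_jackpot` and the choice to express every quantity in millions of dollars are my assumptions for illustration; the coefficients and the $40M reset value are the ones given above.

```python
BASE_JACKPOT = 40.0  # $M, reset value after a win (from the rule above)


def step_jackpot(jackpot, dollar_sales, jackpot_won):
    """Advance the jackpot by one drawing.

    All amounts are in millions of dollars (an assumption of this sketch).
    Returns (payout, next_jackpot).
    """
    if jackpot_won:
        payout = jackpot
        increase = BASE_JACKPOT
    else:
        payout = 0.0
        # Fitted linear relationship between sales and jackpot growth.
        increase = 0.4907 * dollar_sales - 5.0307
    # The jackpot behaves like a stock: drained by payout, filled by increase.
    return payout, jackpot - payout + increase
```

Calling `step_jackpot(100.0, 50.0, False)` grows a $100M jackpot by the fitted increase, while passing `True` pays out the full $100M and restarts the next drawing at $40M.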
Because each ticket sells for $2, we map revenue directly to ticket sales. From the last post we know that we can estimate sales from the current jackpot amount (I won't rewrite that equation here), and so we can complete a reinforcing feedback loop:
As we saw in the previous post, the odds of a jackpot being won likewise increase with the number of tickets sold - equal to one minus the probability of there being zero winners. This creates a balancing loop which is responsible for keeping the value of the jackpot from climbing too high:
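The "one minus the probability of zero winners" calculation can be sketched in a few lines of Python. Two assumptions here are mine rather than taken from this post: that every ticket is an independent random pick, and that the jackpot odds are 1 in 175,223,510 (the 5-of-59 plus 1-of-35 format in use during 2012 and 2013). The function name is also just illustrative.

```python
import math

JACKPOT_ODDS = 175_223_510  # assumption: 5/59 + 1/35 format, C(59,5) * 35
TICKET_PRICE = 2.0          # dollars per ticket, as stated in the post


def prob_jackpot_won(dollar_sales):
    """Probability that at least one ticket hits the jackpot.

    dollar_sales is in dollars. Assumes independent, random tickets.
    """
    tickets = dollar_sales / TICKET_PRICE
    # P(zero winners) = (1 - 1/odds) ** tickets, computed in log space
    # so that precision is not lost for such a tiny per-ticket probability.
    log_p_none = tickets * math.log1p(-1.0 / JACKPOT_ODDS)
    return -math.expm1(log_p_none)
```

Under these assumptions, selling as many tickets as there are combinations (about $350M in sales) still leaves roughly a 37% chance that nobody wins, which is why large jackpots can keep rolling over.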
We also see that while the calibration data climbed above $500M, the simulation here stayed below $300M. Is this a fluke of randomness, or evidence that our model isn't capturing behavior well? To determine this we'll run our simulation over a very large number of drawings, and then compare the distribution of prizes. The following chart shows the value of jackpots for each drawing grouped into $50M buckets, for our calibration data (Jan 14, 2012 - Jan 12, 2013), for our model (run for 8000 samples), and for every drawing after our calibration series.
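The bucketing step itself is simple to reproduce. The sketch below is not the code used to make the chart; it is one plausible way to group jackpot values (in millions of dollars) into $50M buckets. It returns fractions rather than raw counts so that series of very different lengths can be compared on the same axis. The function name and the sample values in the usage note are hypothetical.

```python
from collections import Counter


def bucket_jackpots(jackpots_millions, width=50):
    """Group jackpot values ($M) into buckets of the given width.

    Returns {bucket_lower_edge: fraction_of_drawings}, sorted by bucket.
    """
    if not jackpots_millions:
        return {}
    # Floor each value to the lower edge of its bucket, e.g. 120 -> 100.
    counts = Counter(int(j // width) * width for j in jackpots_millions)
    total = len(jackpots_millions)
    return {low: counts[low] / total for low in sorted(counts)}
```

For example, `bucket_jackpots([40, 60, 120, 149.9])` places one drawing in the $0-50M bucket, one in $50-100M, and two in $100-150M.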
When we look at the 'Measured Actual' data following the model calibration period (now getting into the realm of prediction) we again see more variance. Two factors could explain this. The first (and likely largest) is the small sample size. We expect our model to predict the distribution over the long run, and 19 drawings isn't a long enough run for us to get statistically meaningful distributions. The second factor is that the underlying conditions which drive behavior may have changed, so our empirically derived expressions for the number of tickets sold as a function of the jackpot size, or for the behavior of the lottery organizers in setting that jackpot, may have lost accuracy. When we are able to collect more data, we'll be able to see if this variance is entirely due to the small sample size, or if there is something more fundamental going on.