James P Houghton

# Playing the Lottery 3 - Modeling the System

23 Mar 2013

In the last two posts on this topic, we showed that while playing the lottery is a bad idea, it's a worse idea at some times than at others. In this post we'll make a model of the Powerball lottery system and get a sense for how frequently drawings will fall into each category.

We'll start by looking at how the value of the jackpot changes - increasing when jackpots are not won, and being reset when somebody draws those special numbers.

The increase in jackpot value can be reasonably approximated as a function of ticket sales for the previous drawing. In the figure below, we plot the difference between successive non-winning jackpots vs. ticket sales, and fit a line to the data. The fit is rather strongly dominated by the outlying point at about \$550M, but maintains a reasonable R^2 of 0.97.
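
For readers who want to reproduce this kind of fit, here is a minimal sketch of an ordinary least-squares line fit and its R^2 in plain Python. The function name `fit_line` is hypothetical and is not code from the original analysis; `x` would hold dollar sales for each non-winning drawing and `y` the corresponding jackpot increase.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r_squared). Illustrative helper only.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sums of squares and cross-products about the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R^2 = 1 - (residual sum of squares) / (total sum of squares).
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot
```

A single point far from the rest of the data has a large `(xi - mean_x)` term, which is why one outlier can dominate both the slope and the R^2 of a fit like this.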

When the jackpot is won, this relationship is irrelevant - the value of the jackpot is simply reset to \$40M. The rate of increase and payout are then calculated as:

    if (Jackpot Won)
        payout = Value of Jackpot
        increase = $40M
    else
        payout = 0
        increase = 0.4907 * Dollar Sales - $5.0307M
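
In Python, the rule above might look like the following sketch. The constants are the reset value and fit coefficients quoted in the text; the function names `jackpot_flows` and `next_jackpot` are hypothetical, and reading the jackpot as a stock with one inflow (`increase`) and one outflow (`payout`) is an assumption about how the pseudocode is meant to be used.

```python
RESET_VALUE = 40e6      # jackpot after a win, in dollars (from the text)
SLOPE = 0.4907          # fitted jackpot increase per dollar of sales
INTERCEPT = -5.0307e6   # fitted offset, in dollars


def jackpot_flows(jackpot, dollar_sales, jackpot_won):
    """Return (payout, increase) for a single drawing."""
    if jackpot_won:
        # The whole jackpot is paid out and the new jackpot is seeded.
        return jackpot, RESET_VALUE
    # No winner: nothing is paid out and the jackpot grows with sales.
    return 0.0, SLOPE * dollar_sales + INTERCEPT


def next_jackpot(jackpot, dollar_sales, jackpot_won):
    """Apply both flows to get the jackpot for the following drawing."""
    payout, increase = jackpot_flows(jackpot, dollar_sales, jackpot_won)
    return jackpot - payout + increase
```

Written this way, a win always leaves exactly \$40M in the pot regardless of how large the jackpot had grown.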

Because each ticket sells for \$2, we map revenue directly to ticket sales. From the last post we know that we can estimate sales from the current jackpot amount (I won't rewrite that equation here), and so we can complete a reinforcing feedback loop: a larger jackpot draws more ticket sales, and more ticket sales produce a larger increase in the jackpot.

This loop is responsible for the exponential shape we see in jackpot growth before a win. As we saw in a previous post, the probability of a jackpot being won likewise increases with the number of tickets sold: it is equal to one minus the probability of there being zero winners. This creates a balancing loop that keeps the value of the jackpot from climbing too high.

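
As a rough illustration of that balancing loop, the sketch below computes the chance that at least one ticket wins. It rests on two assumptions that are not spelled out in this post: that the game uses the 5-of-59 plus 1-of-35 format (1 in 175,223,510 per ticket), and that every ticket is an independent random pick. The function name `prob_jackpot_won` is hypothetical.

```python
from math import comb

# Assumed game format: 5 white balls from 59, plus 1 Powerball from 35.
COMBINATIONS = comb(59, 5) * 35  # 175,223,510 possible tickets


def prob_jackpot_won(tickets_sold, combinations=COMBINATIONS):
    """One minus the probability that every ticket sold is a loser.

    Assumes each ticket is an independent, uniformly random pick.
    """
    p_ticket_loses = 1.0 - 1.0 / combinations
    return 1.0 - p_ticket_loses ** tickets_sold
```

Under these assumptions the probability rises steadily with sales but never reaches one, so very large jackpots become less and less likely to survive another drawing.
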
And that's it. All of the randomness of a lottery summarized with a handful of parameters and empirically determined relationships. Can we really say that this model is an accurate description of reality? Let's qualitatively compare the time-history output of the model against the calibration data.

First of all, it's obvious that because our model is statistical, we don't expect to match the exact time-series of the data. The randomness of each drawing takes care of that for us. Interestingly, the real world deviates from the exponential growth curves that our model predicts: the jackpot grows more quickly at first than our model predicts, and there are some weird kinks in the curves. The difference is a result of the simplifying assumptions we made in our model. In reality, it seems that lottery organizers have a preference for offering jackpots that are large, round numbers (\$50M, \$60M, \$70M, instead of \$53.843M, etc).

We also see that while the calibration data climbed above \$500M, the simulation here stayed below \$300M. Is this a fluke of randomness, or evidence that our model isn't capturing behavior well? To determine this we'll run our simulation over a very large number of drawings, and then compare the distribution of prizes. The following chart shows the value of jackpots for each drawing grouped into \$50M buckets, for our calibration data (Jan 14, 2012 - Jan 12, 2013), for our model (run for 8000 samples), and for every drawing after our calibration series.

Luckily for us, the model and calibration data match one another very well, and this gives us more confidence that the difference in peak values in our qualitative comparison above was due to chance.
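
A long-run comparison like this could be set up along the following lines. This is only a sketch: the jackpot rule uses the coefficients quoted earlier, but `placeholder_sales` is a made-up stand-in for the sales-vs-jackpot relationship from the previous post (which isn't repeated here), the 5-of-59 plus 1-of-35 game format is assumed, and all function names are hypothetical. The numbers it produces are therefore not the ones behind the chart.

```python
import random
from collections import Counter
from math import comb

RESET_VALUE = 40e6                    # jackpot after a win (from the text)
SLOPE, INTERCEPT = 0.4907, -5.0307e6  # jackpot-increase fit (from the text)
TICKET_PRICE = 2.0                    # dollars per ticket (from the text)
COMBINATIONS = comb(59, 5) * 35       # assumed game format


def placeholder_sales(jackpot):
    """HYPOTHETICAL dollar sales as a function of jackpot size.

    Stand-in only; swap in the fitted relationship from the previous post.
    """
    return 20e6 + 0.25 * jackpot


def simulate(n_drawings, sales_fn=placeholder_sales, seed=0):
    """Return the advertised jackpot at each of n_drawings drawings."""
    rng = random.Random(seed)
    jackpot = RESET_VALUE
    history = []
    for _ in range(n_drawings):
        history.append(jackpot)
        dollar_sales = sales_fn(jackpot)
        tickets = dollar_sales / TICKET_PRICE
        # Balancing loop: more tickets means a higher chance of a win.
        p_win = 1.0 - (1.0 - 1.0 / COMBINATIONS) ** tickets
        if rng.random() < p_win:
            jackpot = RESET_VALUE
        else:
            # Reinforcing loop: sales feed the next jackpot.
            jackpot += SLOPE * dollar_sales + INTERCEPT
    return history


def bucket_shares(jackpots, width=50e6):
    """Fraction of drawings whose jackpot falls in each bucket.

    Keys are the lower edge of each bucket, in dollars.
    """
    counts = Counter(int(j // width) for j in jackpots)
    total = len(jackpots)
    return {b * width: c / total for b, c in sorted(counts.items())}
```

Calling `bucket_shares(simulate(8000))` gives the simulated share of drawings in each \$50M bucket, and passing a list of observed jackpots to `bucket_shares` gives the matching shares for the real data. Comparing shares rather than raw counts keeps the 8000-sample run and the much shorter measured series on the same scale.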

When we look at the 'Measured Actual' data following the model calibration period (now getting into the realm of prediction), we again see more variance. Two factors could explain this. The first (and likely the largest) is the small sample size: we expect our model to predict the distribution over the long run, and 19 drawings isn't a long enough run to give statistically meaningful distributions. The second is that the underlying conditions that drive behavior may have changed, so that our empirically derived expressions for the number of tickets sold as a function of the jackpot size, or for the behavior of the lottery organizers in setting that jackpot, may have lost accuracy. When we are able to collect more data, we'll be able to see whether this variance is entirely due to the small sample size, or whether there is something more fundamental going on.