James P Houghton



Modeling Social Trends Online 0

03 Oct 2012

The way social trends spread on the internet (and in culture at large) resembles the way diseases spread through a population. A few members of a susceptible population encounter the pathogen (or the meme) and, through contact with other members of the population, spread it further. Eventually the population develops immunity and the spread is curtailed. Here's an SIR (Susceptible-Infected-Recovered) disease model modified to represent the spread of memes:
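  Have not seen = INTEG(-Watching, 9999);
  "Interested, Excited" = INTEG(Watching - Recovering, 1);
  Old News = INTEG(Recovering, 0);
  Recovering = "Interested, Excited" / Avg Duration of Excitement;
  Total Population = "Interested, Excited" + Old News + Have not seen;
  Watching = "Interested, Excited" * Contact Rate * (Have not seen / Total Population)
             * Share Rate * Watch Fraction;
  Contact Rate = 10;                  {people/person/day}
  Share Rate = 0.25;                  {fraction of viewers who share}
  Watch Fraction = 0.5;               {fraction of those exposed who watch}
  Avg Duration of Excitement = 2;     {days}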

We'll plug in some approximate values for now to get a sense of what the qualitative behavior of a viral internet video might look like. Assume a population of 10,000 individuals with susceptibility traits such as broadband internet, too much free time, and a social network that includes several college students. One person starts in the "Interested, Excited" category, probably the guy who made the video. Assume that one quarter of the viewers share it (Share Rate = 0.25) with 10 people per day (Contact Rate = 10 people/person/day), and that of those who are exposed, half watch the video (Watch Fraction = 0.5). Assume that after about two days (Avg Duration of Excitement = 2 days), people lose interest and "recover" to the point where the video is old news. With these inputs we see the following response:
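To reproduce curves like these outside Vensim, the stock-and-flow equations can be stepped forward with simple Euler integration. This is a minimal sketch, assuming a time step of 1/16 day and a 60-day run (neither is stated in the post, and the original runs may have used different integration settings); the function name `simulate` is hypothetical. The share of people who have seen the video is ("Interested, Excited" + Old News) / Total Population.

```python
# Minimal Euler-integration sketch of the meme-spread (SIR) model.
# Parameter names are snake_case versions of the Vensim variables;
# dt and the run length are assumptions, not values from the post.

def simulate(contact_rate=10.0, share_rate=0.25, watch_fraction=0.5,
             avg_duration=2.0, population=10_000, initial_excited=1,
             days=60, dt=0.0625):
    """Return a list of (time, have_not_seen, excited, old_news) tuples."""
    have_not_seen = population - initial_excited
    excited = float(initial_excited)
    old_news = 0.0
    history = [(0.0, have_not_seen, excited, old_news)]
    steps = int(round(days / dt))
    for step in range(1, steps + 1):
        total = have_not_seen + excited + old_news
        # Flows, computed from the current values of the stocks
        watching = (excited * contact_rate * (have_not_seen / total)
                    * share_rate * watch_fraction)
        recovering = excited / avg_duration
        # Euler update of the three stocks (INTEG in Vensim terms)
        have_not_seen -= watching * dt
        excited += (watching - recovering) * dt
        old_news += recovering * dt
        history.append((step * dt, have_not_seen, excited, old_news))
    return history


run = simulate()
t, s, i, r = run[-1]
print(f"day {t:.0f}: {100 * (i + r) / 10_000:.1f}% have seen the video")
```

Changing the keyword arguments (for example `avg_duration=1.0` or `contact_rate=3.0`) reproduces the other scenarios discussed in this post.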
The growth follows a logistic (S-shaped) curve: in the beginning, reinforcing loop R1 dominates, leading to exponential growth in the number of people exposed to and excited about the video. As the fraction of people who have not yet been exposed decreases, the rate of "recovery" begins to exceed the rate of "infection", and balancing loops B1 and B2 dominate the behavior:
As recovery outpaces infection, the number of "Interested, Excited" people sharing the video decreases until the epidemic runs its course, with a final infection rate of 92%.
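The point where the balancing loops take over can be located from the model equations. This is the standard SIR reasoning, written here in the post's variables as an illustration: S is Have not seen, I is "Interested, Excited", N is Total Population, c is Contact Rate, σ is Share Rate, w is Watch Fraction and D is Avg Duration of Excitement.

```latex
\begin{aligned}
\frac{dI}{dt} &= \underbrace{I\,c\,\sigma\,w\,\frac{S}{N}}_{\text{Watching}}
               - \underbrace{\frac{I}{D}}_{\text{Recovering}}
             = I\left(c\,\sigma\,w\,\frac{S}{N} - \frac{1}{D}\right) \\
\frac{dI}{dt} < 0 &\iff \frac{S}{N} < \frac{1}{c\,\sigma\,w\,D} = \frac{1}{R_0},
\qquad R_0 = 10 \times 0.25 \times 0.5 \times 2 = 2.5
\end{aligned}
```

With the baseline values, the number of excited people peaks once Have not seen falls to 1/2.5 = 40% of the population. If R_0 is below 1 from the outset, I can only shrink, which is what happens in the reduced-contact case discussed later.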

Let's see what happens if we shorten the average duration of excitement from 2 days to 1 day. We still see 'S'-shaped growth, although the final level of "infection" tops out much lower than in the original case, and it takes longer to get there. Here the epidemic plays out as balancing loop B2 dominates, and the final infection rate of the population is 38.4%.
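As a cross-check on these final infection levels (a textbook SIR result, not the author's calculation), the fraction z of a large population that is eventually infected, starting from a negligible seed, satisfies z = 1 - exp(-R_0 z), where R_0 = Contact Rate × Share Rate × Watch Fraction × Avg Duration of Excitement. The sketch below solves that relation by bisection; `final_size` is a hypothetical helper name.

```python
import math


def final_size(r0, tol=1e-12):
    """Solve z = 1 - exp(-r0 * z) for its nonzero root by bisection.

    Assumes a large, well-mixed population and a vanishingly small initial seed.
    """
    if r0 <= 1.0:
        return 0.0  # at or below threshold: no large outbreak

    def f(z):
        return -math.expm1(-r0 * z) - z  # equals 1 - exp(-r0 * z) - z

    lo, hi = 1e-9, 1.0  # f(lo) > 0 and f(hi) < 0 whenever r0 > 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# R0 = contact rate * share rate * watch fraction * duration of excitement
for duration in (2.0, 1.0):
    r0 = 10 * 0.25 * 0.5 * duration
    print(f"duration {duration:.0f} day(s): R0 = {r0:.2f}, "
          f"final size ~ {final_size(r0):.1%}")
```

For the 2-day and 1-day cases this gives R_0 = 2.5 and 1.25 and final sizes of roughly 89% and 37%, in the same neighborhood as the simulated 92% and 38.4%. Exact agreement isn't expected, since a simulation's result also depends on its integration method and time step.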
Now let's examine what happens if the contact rate decreases to 3 people/person/day. In this case, the rate of "infection" never exceeds the rate of "recovery", so the meme fails to spread:
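The threshold behind this last case can be checked directly: the meme can only take off if R_0 = Contact Rate × Share Rate × Watch Fraction × Avg Duration of Excitement exceeds 1. This small sketch (both helper names are hypothetical) evaluates the three scenarios above and the contact rate at which R_0 = 1.

```python
def reproduction_number(contact_rate, share_rate, watch_fraction, avg_duration):
    """New viewers generated by one excited person over their excitement period."""
    return contact_rate * share_rate * watch_fraction * avg_duration


def critical_contact_rate(share_rate, watch_fraction, avg_duration):
    """Contact rate at which R0 = 1; below it the meme cannot take off."""
    return 1.0 / (share_rate * watch_fraction * avg_duration)


scenarios = {
    "baseline (10 contacts/day, 2 days)": (10, 0.25, 0.5, 2),
    "shorter excitement (1 day)": (10, 0.25, 0.5, 1),
    "fewer contacts (3/day)": (3, 0.25, 0.5, 2),
}
for name, params in scenarios.items():
    r0 = reproduction_number(*params)
    verdict = "spreads" if r0 > 1 else "fizzles"
    print(f"{name}: R0 = {r0:.2f} -> {verdict}")

print("critical contact rate:", critical_contact_rate(0.25, 0.5, 2),
      "people/person/day")
```

With the other baseline values held fixed, the critical contact rate works out to 4 people/person/day, so a contact rate of 3 falls just short.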

Let's test this out on an actual YouTube video. I picked this one from January 2012:
This video is short, isn't about any external event, and probably received little news coverage. Looking at the view data:
This shows behavior similar to our model in the case where the meme begins to spread but fails to infect the full population. The 'S'-shaped growth is hard to see because our data fidelity is poor over the early part of the curve. Looking at a more recent video:
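Beyond comparing shapes by eye, one could fit the model to a view-count series. The following is a rough sketch, not the method used in this post. It assumes that cumulative views correspond to Total Population minus Have not seen, that Avg Duration of Excitement is fixed at 2 days, and that a coarse least-squares grid search over population size and a lumped transmission rate beta = Contact Rate × Share Rate × Watch Fraction is good enough. The "observed" series here is synthetic, generated by the model itself, so the fit should recover the generating values exactly. Real daily view counts would replace it, and given the poor early data fidelity noted above, one might drop or down-weight the first few points. The names `cumulative_views` and `fit` are hypothetical.

```python
# Rough sketch: fit the meme model to a cumulative view-count series by grid search.
# Assumptions (not from the post): cumulative views = Total Population - Have not seen,
# Avg Duration of Excitement fixed at 2 days, Euler steps of 0.05 day.

def cumulative_views(population, beta, avg_duration, days, dt=0.05):
    """Cumulative viewers at the end of each day, starting from one excited person.

    beta lumps Contact Rate * Share Rate * Watch Fraction into one rate.
    Total Population is conserved in this closed model, so it is used directly.
    """
    have_not_seen, excited = population - 1.0, 1.0
    series = [population - have_not_seen]
    steps_per_day = round(1 / dt)
    for _ in range(days):
        for _ in range(steps_per_day):
            watching = beta * excited * have_not_seen / population
            recovering = excited / avg_duration
            have_not_seen -= watching * dt
            excited += (watching - recovering) * dt
        series.append(population - have_not_seen)
    return series


def fit(observed, populations, betas, avg_duration=2.0):
    """Return (sse, population, beta) minimizing the sum of squared errors."""
    best = None
    for population in populations:
        for beta in betas:
            model = cumulative_views(population, beta, avg_duration,
                                     len(observed) - 1)
            sse = sum((m - o) ** 2 for m, o in zip(model, observed))
            if best is None or sse < best[0]:
                best = (sse, population, beta)
    return best


# Synthetic "observed" data from known parameters (stand-in for real view counts)
observed = cumulative_views(10_000, 1.25, 2.0, days=30)
best = fit(observed, populations=[5_000, 10_000, 20_000],
           betas=[0.75, 1.0, 1.25, 1.5])
print("best fit: population = %d, beta = %.2f (SSE = %.1f)"
      % (best[1], best[2], best[0]))
```

A grid search keeps the sketch dependency-free; with real data, a finer grid or a proper optimizer would be the natural next step.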
The dynamics of this video are still playing out, and because it is more recent we have somewhat better resolution on the data:
Now let's look at some videos that don't seem to follow the pattern as well. Here's one:


This video has a bit of a fat tail, suggesting that the SIR model is missing something: either a mechanism that introduces new viewers to the video at a constant rate, or individuals returning to the video later on:
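One way to probe the first hypothesis is to add a constant inflow of viewers who find the video on their own (search, recommendations) rather than through sharing. In this sketch the parameter `discovery_per_day`, the decision to route those viewers into "Interested, Excited", and the function name `daily_new_views` are all assumptions for illustration. The second hypothesis, repeat viewing, would instead need a flow from Old News back to watching, which is not modeled here.

```python
# Sketch: the meme model plus a hypothetical constant "discovery" inflow of viewers
# who find the video independently of sharing. discovery_per_day and routing these
# viewers into "Interested, Excited" are assumptions, not part of the original model.

def daily_new_views(days, discovery_per_day=0.0, population=10_000,
                    beta=1.25, avg_duration=2.0, dt=0.05):
    """New viewers per day; beta = Contact Rate * Share Rate * Watch Fraction."""
    have_not_seen, excited = population - 1.0, 1.0
    steps_per_day = round(1 / dt)
    views = []
    for _ in range(days):
        today = 0.0
        for _ in range(steps_per_day):
            sharing = beta * excited * have_not_seen / population
            # Viral watching plus constant discovery, capped so the stock can't go negative
            inflow = min(sharing + discovery_per_day, have_not_seen / dt)
            recovering = excited / avg_duration
            have_not_seen -= inflow * dt
            excited += (inflow - recovering) * dt
            today += inflow * dt
        views.append(today)
    return views


viral_only = daily_new_views(60)
with_discovery = daily_new_views(60, discovery_per_day=5)
print(f"day 60 views, viral only:     {viral_only[-1]:.4f}")
print(f"day 60 views, with discovery: {with_discovery[-1]:.1f}")
```

In the viral-only run, daily views decay toward zero once the outbreak is over. With a constant discovery inflow, they settle at a small positive level instead, which is the kind of fat tail seen in these videos.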
Here's another video with a fat tail:

Finally, a video which defies the model altogether:

Traffic to this video is driven by something entirely different from viral spread:
Not entirely sure how to interpret this one. Cats...

In the next post on this topic, we'll try to discover why some of the videos fit the SIR model better than others. To do that we'll modify the model to better represent the sharing dynamics, and then model repeat views.


© 2016 James P. Houghton