Working with GDELT

The Global Database of Events, Language, and Tone (GDELT) project is a catalog of political events drawn from news sources going back to 1979. The volume of data is extensive and can be a challenge to wrangle.

Here's an IPython notebook that makes things a little simpler by taking a subset of the full GDELT dataset and loading it into a Pandas DataFrame. The algorithm is simple: it works through one file at a time to minimize the storage space required on the local machine. Unfortunately, it isn't terribly fast.
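As a rough illustration of the one-file-at-a-time idea (a minimal sketch, not the notebook's actual code), the loop below reads each file, keeps only the rows of interest, and can delete the file before moving on to the next. The function name load_subset and its parameters are hypothetical, and the sketch assumes the files are already on disk as tab-delimited text with no header row.

```python
import os

import pandas as pd


def load_subset(paths, column, values, delete_after=False):
    """Build one DataFrame from many files, keeping only matching rows.

    paths:        local tab-delimited files with no header row (assumed)
    column:       integer position of the column to filter on
    values:       set of strings; a row is kept if its value is in this set
    delete_after: remove each file once its rows have been extracted
    """
    pieces = []
    for path in paths:
        # Read everything as text so codes with leading zeros are preserved.
        frame = pd.read_csv(path, sep="\t", header=None, dtype=str)
        # Keep only the subset of rows we care about from this file.
        pieces.append(frame[frame[column].isin(values)])
        if delete_after:
            # Only one raw file needs to sit on disk at any moment.
            os.remove(path)
    if not pieces:
        return pd.DataFrame()
    return pd.concat(pieces, ignore_index=True)
```

A call might look like load_subset(["day1.tsv", "day2.tsv"], column=1, values={"USA"}, delete_after=True), where the file names and the column position are placeholders. Only the filtered rows stay in memory, and at most one raw file is on disk at a time. The trade-off is that every file still has to be read in full, which is part of why this approach is slow.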

James Houghton - Working with GDELT
