James P Houghton


Working with GDELT

08 Apr 2014

The Global Database of Events, Language, and Tone (GDELT) project is a catalog of political events drawn from news sources going back to 1979. The volume of data is extensive and can be a challenge to wrangle.

Here's an IPython notebook that makes things a little simpler by taking a subset of the full GDELT dataset and loading it into a Pandas DataFrame. It uses a simple algorithm that works through one file at a time to minimize the amount of storage space required on the local machine. Unfortunately, it isn't terribly fast.
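To give a rough idea of the one-file-at-a-time approach, here is a minimal sketch. It is not the code from the notebook: the function name `load_subset`, its parameters, and the tiny in-memory sample rows are all made up for illustration, and it assumes the input files are tab-delimited with no header row. Each file is read, filtered down to the rows of interest, and only that filtered piece is kept before moving on to the next file.

```python
import io

import pandas as pd


def load_subset(sources, column, values):
    """Build one DataFrame from many files, keeping only matching rows.

    sources: iterable of paths, URLs, or file-like objects (hypothetical inputs)
    column:  integer position of the column to filter on (an assumption,
             check it against the file layout you are actually using)
    values:  set of values to keep in that column
    """
    pieces = []
    for src in sources:
        # Only one full file is in memory at any time.
        frame = pd.read_csv(src, sep="\t", header=None, dtype=str)
        # Keep just the subset we care about and discard the rest.
        pieces.append(frame[frame[column].isin(values)])
    if not pieces:
        return pd.DataFrame()
    return pd.concat(pieces, ignore_index=True)


# Made-up sample data standing in for two daily files.
day_one = io.StringIO("1\tUSA\tx\n2\tFRA\ty\n")
day_two = io.StringIO("3\tUSA\tz\n")
df = load_subset([day_one, day_two], column=1, values={"USA"})
```

Because `pd.read_csv` also accepts file paths and URLs, the same loop could be pointed at downloaded files instead of the in-memory samples. Reading every file in full before filtering is simple but slow, which matches the trade-off described above.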


© 2016 James P. Houghton