James P Houghton

Data Sources

10 Oct 2012

This post will be a collection of resources for different types of publicly available data.

Google Trends: http://www.google.com/trends/
Google will give you a chart of the relative search volume (scaled, not absolute counts) for a particular phrase over time. Unfortunately there is no official API, but you can download the data as a CSV file.
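Here is a minimal Python sketch of loading such an export once it has been saved locally. It assumes a particular layout, which Google does not document: some preamble lines, then a header row whose first cell is `Week`, then one `period,value` row per period, ending at a blank line. If the first column in your file has a different name, change `header`.

```python
import csv
import io

def parse_trends_csv(text, header="Week"):
    """Return (period, value) pairs from a Google Trends CSV export.

    Assumed (hypothetical) layout: preamble lines, a header row whose first
    cell equals `header`, then data rows until the first blank line.
    """
    series = []
    in_data = False
    for row in csv.reader(io.StringIO(text)):
        if not in_data:
            # Skip preamble lines until the header row is found.
            in_data = bool(row) and row[0].strip() == header
            continue
        if not row or not row[0].strip():
            break  # a blank line ends the data block
        period = row[0].strip()
        value = row[1].strip() if len(row) > 1 else ""
        # Missing or non-numeric values become None rather than a guess.
        series.append((period, int(value) if value.isdigit() else None))
    return series

# Illustrative input only; not real Trends data.
sample = """Web Search interest: example
Worldwide; 2004 - present

Week,example
2012-09-30 - 2012-10-06,42
2012-10-07 - 2012-10-13,

Top regions for example
"""
print(parse_trends_csv(sample))
```

Stopping at the first blank line keeps any trailing sections of the export out of the time series.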

Google Ngrams: http://books.google.com/ngrams
Ngrams shows how often a word or phrase appears in the Google Books corpus of published literature. The data set is extensive and goes back hundreds of years. There is no super-easy way to get data out of the web interface, but you can download the raw data sets.
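The raw data comes as large tab-separated text files. The sketch below totals yearly counts for one phrase. It assumes the four-column layout `ngram`, `year`, `match_count`, `volume_count` of the 2012 release; earlier releases used a different column set, so check the files you download. If they are gzip-compressed, `gzip.open(path, "rt", encoding="utf-8")` yields lines you can pass in directly.

```python
from collections import defaultdict

def yearly_counts(lines, phrase):
    """Sum match_count per year for `phrase` from raw Ngram lines.

    Assumes tab-separated rows: ngram, year, match_count, volume_count.
    """
    totals = defaultdict(int)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[0] != phrase:
            continue  # skip other n-grams and malformed lines
        totals[int(fields[1])] += int(fields[2])
    return dict(sorted(totals.items()))

# Illustrative lines only; not real Ngram counts.
sample_lines = [
    "data\t1900\t10\t5\n",
    "data\t1900\t2\t1\n",
    "data\t1901\t7\t3\n",
    "date\t1900\t99\t9\n",
]
print(yearly_counts(sample_lines, "data"))
```

Because the function reads the input one line at a time, it can stream a multi-gigabyte file without loading all of it into memory.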

InfoChimps: http://www.infochimps.com/marketplace
InfoChimps has a collection of public and proprietary data sets.

Youtube: http://www.youtube.com/
Youtube lists view statistics for a video on the video's page. I've written about how to extract that data here.

Facebook: https://graph.facebook.com/barackobama?
Facebook has several ways to access its data. A very simple API gives statistics for a page in JSON format.
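A minimal Python sketch of reading a page's statistics from that JSON. The field names (`name`, `likes`) are assumptions about the response. The Graph API has also changed a lot since this was written, and current versions require an access token, so treat `fetch_page` as illustrative only. `select_fields` just parses JSON text you already have.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

GRAPH_URL = "https://graph.facebook.com/"

def fetch_page(page_name, timeout=10):
    """Fetch raw JSON for a public page (current API versions may need a token)."""
    with urlopen(GRAPH_URL + quote(page_name), timeout=timeout) as resp:
        return resp.read().decode("utf-8")

def select_fields(json_text, fields=("name", "likes")):
    """Pick selected fields from a page's JSON; absent fields map to None."""
    data = json.loads(json_text)
    return {field: data.get(field) for field in fields}

# Illustrative response body; not real page data.
sample = '{"id": "123", "name": "Example Page", "likes": 1000}'
print(select_fields(sample))
```

Keeping the fetch separate from the parsing means the parsing can be tested, or rerun on saved responses, without a network connection.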

R Dataset Collection: http://vincentarelbundock.github.com/Rdatasets/datasets.html
This is a handy collection of eclectic datasets.
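Each dataset in the collection is also offered as a CSV file. The sketch below loads one with the standard library. The `csv/<package>/<item>.csv` path under `BASE_URL` is an assumption about how the collection is laid out. If it differs, copy the actual CSV link from the index page.

```python
import csv
import io
from urllib.request import urlopen

# Assumed base URL and path layout; check against the collection's index page.
BASE_URL = "http://vincentarelbundock.github.com/Rdatasets"

def dataset_url(package, item):
    """Build the (assumed) CSV URL for dataset `item` from R package `package`."""
    return f"{BASE_URL}/csv/{package}/{item}.csv"

def read_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def load_dataset(package, item, timeout=10):
    """Download one dataset and return its rows (values stay as strings)."""
    with urlopen(dataset_url(package, item), timeout=timeout) as resp:
        return read_rows(resp.read().decode("utf-8"))
```

For example, `load_dataset("datasets", "iris")` would fetch the classic iris data if the assumed URL pattern holds. Values come back as strings, so convert numeric columns as needed.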

Vehicle Fuel Efficiency Data: http://www.fueleconomy.gov/feg/download.shtml
You can download a .csv with pretty much every car the US has rated for fuel efficiency since 1984, along with some engine specs.
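A minimal sketch of summarizing that download: the average combined MPG for each model year. The column names `year` and `comb08` are assumptions about the downloaded file, so check its header row and pass different names if needed.

```python
import csv
import io
from collections import defaultdict

def mean_mpg_by_year(csv_text, year_col="year", mpg_col="comb08"):
    """Average the combined-MPG column per model year.

    Column names are assumptions; rows with missing or non-numeric
    values in either column are skipped.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            year, mpg = int(row[year_col]), float(row[mpg_col])
        except (KeyError, TypeError, ValueError):
            continue
        sums[year] += mpg
        counts[year] += 1
    return {year: sums[year] / counts[year] for year in sorted(sums)}

# Illustrative rows only; not real fuel-economy ratings.
sample = "year,make,model,comb08\n1984,A,X,20\n1984,B,Y,24\n1985,A,X,21\n"
print(mean_mpg_by_year(sample))
```

This is an unweighted mean over rated models, not a sales-weighted fleet average.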

GDELT: http://eventdata.psu.edu/data.dir/GDELT.html
Global data on events, locations, and times: essentially an enormous capture of news media in machine-readable format.
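The event files are tab-delimited, with one event per line. The sketch below counts events per day. It assumes the date is in the second column in `YYYYMMDD` form, as in the GDELT 1.0 event layout. Check the codebook for the release you download before relying on that column index.

```python
from collections import Counter

DATE_COLUMN = 1  # assumed position of the YYYYMMDD date field

def events_per_day(lines, date_column=DATE_COLUMN):
    """Count event rows per date string in tab-delimited GDELT-style lines.

    Header or malformed rows are skipped because their date field is
    missing or not numeric.
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > date_column and fields[date_column].isdigit():
            counts[fields[date_column]] += 1
    return counts

# Illustrative rows only (most columns omitted); not real events.
sample_lines = [
    "GLOBALEVENTID\tSQLDATE\tActor1Code\n",
    "1\t20121010\tUSA\n",
    "2\t20121010\tGBR\n",
    "3\t20121011\tFRA\n",
]
print(events_per_day(sample_lines).most_common(1))
```

Raw event counts partly reflect how much news was published each day, so it may be worth normalizing them before comparing periods.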

© 2016 James P. Houghton