Google Scripts

I love using Google Docs for most of my 'Office Computing' needs, because it means that I don't have to be tied to any specific computer. I have all of my files available anywhere I get internet, and I can sync work files to a work computer and personal files to a home computer. What I haven't been able to do has been any real computing or data manipulation. Google now offers 'scripts' as part of the Docs suite, which meets part of this need.

Coding is a little clunky as its all in JavaScript, but it offers some really nice components for interacting with web-based data streams:

URL Fetch
One of the most useful components of Scripts is the ability to fetch websites and parse the results, which means you can use pretty much any (open) REST API, and pull the info into a JSON format. Here's an example of connecting to the Twitter API

var URL = "https://api.twitter.com/1.1/statuses/user_timeline.json?"+
"screen_name="+task.user+"&count=200"+"&max_id="+task.max_id;
var response = UrlFetchApp.fetch(URL,options).getContentText();
var object = Utilities.jsonParse(response);

Oauth
You can authenticate to websites that use the Oauth 1 protocol (like Twitter) but you can't use the Oauth 2 protocol. It would be lovely if you could, because all of Google's other API services (most relevantly Fusion Tables) use primarily Oauth 2. Here's an example, again for connecting to the twitter API.

  // Set up an Oauth configuration:
var oAuthConfig = UrlFetchApp.addOAuthService("twitter");

oAuthConfig.setAccessTokenUrl("http://api.twitter.com/oauth/access_token");
oAuthConfig.setRequestTokenUrl("http://api.twitter.com/oauth/request_token");
oAuthConfig.setAuthorizationUrl("http://api.twitter.com/oauth/authorize");
oAuthConfig.setConsumerKey(<Consumer Key>);
oAuthConfig.setConsumerSecret(<Consumer Secret>);
var options = {"oAuthServiceName" : "twitter",
"oAuthUseToken" : "always",
"contentType" : "application/json" };

Useful links re: Oauth:
Authenticating to Google Services with Oauth1.0a

Spreadsheets
Scripts was originally built up as a way to extend Google spreadsheets functionality, and two play well together. Scripts uses the ID of a spreadsheet, which is essentially everything after the "id=" tag in the SS URL.

// Get the spreadsheet:
var ss = SpreadsheetApp.openById(<ssID>);

// Get an existing sheet from the ss:
var sheet = ss.getSheetByName("AuthorParetoSDR");

// Create a new sheet:
var sheet = ss.insertSheet("AuthorParetoSDR");

// Write a block of data to the sheet:
var range = sheet.getRange(1, 1, data.length, data[0].length);
range.setValues(data);

ScriptDB
One nice feature is that script projects come with a built in database for storing script information. It's simple to load the database, save or remove elements and query them later.

// Get the Script Database:
var db = ScriptDb.getMyDb();

// Save an individual element to the database:
var stored = db.save(initialTask);

// Save a batch of elements to the database:
var mutations = db.saveBatch(authors, false)
Logger.log("publications ok: " + db.allOk(mutations));

// Query The Database:
var response = db.query({isTask: true});
  if(response.hasNext())
{
element = response.next();
}

// Remove an element
db.remove(element);

What is also nice is that if you save an element that you have just queried for, the save modifies the existing element, instead of creating a duplicate.
The database has a 50 Mb size limit (for normal users) which to me is a bit stingy, and object sizes themselves are limited to about 4kb, which forces you to be very deliberate about how you store data. To my knowledge, there is also no way of knowing how close you are to those limits, and so it's difficult to plan accordingly. There are limits on the number of database operations you can perform in a day, so you need to be deliberate about batching.
A few features that the database could have, but lacks: methods to clear the DB, a tool to easily view the contents of the DB, and methods to help you better plan for limits.

If running a query using the .anyOf() parameter, there is a limit (somewhere between 3 and 40?) to the number of elements that you can pass in the any-of array. Documented here.

Maximum Execution Time
The most annoying thing about Google scripts is that they have a maximum execution time of 6 minutes. What this implies is that if you are performing any serious data gathering or analysis, you have to break the task down into bite-sized chunks, and keep track of where you are in your execution, and then run the function multiple times. The way I get around this is to use a setup function to create tasks that I want the script to accomplish, each of which are relatively small:

// List user IDs for the script to look up:
var seedids = ["66967746", "180779476"];
for(var i in seedids)
{
var task = {taskType : "getFriends",
userID : seedids[i],
cursor : -1};
db.save(task);
}

And then, I have a second function that gets a task (in no particular order) and executes it:

// Get the task:
var response = db.query({taskType : "getFriends"} );
var task = response.next();

// perform the task operation...
// save the result ..

// Remove the task
db.remove(task);

Script Triggers
Script triggers are pretty much the best way to get around the Maximum execution time limit. Once you have set up the script to address tasks independently, you can automatically execute a task every minute. If your tasks approach the 6-minute limit, you could theoretically have 6 going at once - a significant improvement in execution time over linear analysis. First set up the tasks using a setup function, as we discussed above:

function setupGetUsersFriends()
{
// create tasks and save to database
}

Then use the Resources->Current Scripts Triggers dialog to trigger your execution function once per minute, with the following provision for dealing with the task list:

function getUsersFriends()
{
// Get a list of outstanding tasks that this function can execute
  var response = db.query({taskType : "getFriends"} );
if(response.hasNext()) // if there are tasks:
{
// Get the task:
var task = response.next();

// execute the task and save data...

// Remove the task
db.remove(task);
}
else //job is done
{
  // Delete all current script triggers
var triggers = ScriptApp.getScriptTriggers();
for (var i = 0; i < triggers.length; i++)
{
ScriptApp.deleteTrigger(triggers[i]);
}
}
}

At some point, I'll learn to set up the triggers programmatically in the setup function.

SOAP
There is a problem in the Google soap implementation which adds an extra set of <parameters> and <Search> tags where they don't belong. The issue is tracked here. Unfortunately, this seems to make google's built in soap functions pretty much useless.

Debugger
The debugger performs your basic functionality - lets you step through the code, and shows you the state of all the variables. Be careful though, if you try to inspect an array with too many elements (not sure what counts as too many - it's above 50) then the whole page freezes.

Some error messages are obscure:
"Unexpected exception upon serializing continuation" is almost tauntingly useless. Apparently it means many things, and can intermittently be thrown by the debugger for no real reason. Fun.

"Invalid assignment left-hand side. (line 1)" seems to mean that there is an error in the code, but it probably isn't about assignment operators, and is most certainly not on line 1.

Conclusion
Someday I'd like to learn how to do these sorts of operations using Amazon's Elastic Compute Cloud and Elastic Map Reduce, but for the time being, this'll do.

James Houghton - Google Scripts

Google Scripts