James P Houghton


Community Identification using the Twitter API

18 Oct 2012

I've been using the Twitter API from Google Scripts to explore the graph of Twitter communities. I want to know if, starting from a few (2-5) members of a community, I can programmatically discover, with some bound of confidence, who other members of that community are. After I have a fully developed social graph, is there a way for me to easily identify the 'clumps'?

Let's look at the first problem: discovery of potential community members. If I start with a set of seed members and discover that all of them have relationships with a particular individual, there is a relatively high chance that this individual is also part of the community. However, if only one of the seeds has a relationship with the individual, it is less likely (although not impossible, depending on the nature of the network) that the individual is a member of the community.
We'll start by looking at the Twitter 'friends' of our seeds. As Twitter relationships are unidirectional, these are individuals that the seeds consider to be their peers, or whose messages they value. Below is a function that gets all the friends of a particular screen_name passed in as 'user', which we'll run for all the seeds. 'options' holds the OAuth components, pretty much as I explained in this post. The function returns a list of pairs, each representing a relationship: [user, friend].


function getUserFriends(user, options)
{
  // Get the ids of the user's friends (as strings)
  var idsURL = "https://api.twitter.com/1.1/friends/ids.json?"+
               "screen_name="+user+"&stringify_ids=true";
  var idsResponse = UrlFetchApp.fetch(idsURL,options).getContentText();
  var ids = JSON.parse(idsResponse).ids;

  // Get the detailed data about the user's friends, 90 ids per request
  var data = [];
  for(var j = 0; j<ids.length; j+=90)
  {
    // construct the url from a comma-separated batch of ids;
    // slice() stops at the end of the array, so the last batch may be short
    var URL = "https://api.twitter.com/1.1/users/lookup.json?"+
              "user_id="+ids.slice(j, j+90).join(",");

    // query the API
    var response = UrlFetchApp.fetch(URL,options).getContentText();
    var object = JSON.parse(response);

    // parse and store the response as [user, friend] pairs
    for(var i = 0; i<object.length; i++)
    { data.push([user, object[i].screen_name]); }
  }
  return data;
}
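
The batching step is the easiest part of this function to get wrong, so here is a minimal sketch of it in isolation, with no network calls. The helper names chunkIds and buildLookupUrls are my own and are not part of the original script; the batch size is a parameter, since the function above uses 90 while staying under the 100-id limit of users/lookup.

```javascript
// Split an array of ids into batches of at most `size` entries.
// (hypothetical helper, not part of the original script)
function chunkIds(ids, size) {
  var batches = [];
  for (var start = 0; start < ids.length; start += size) {
    // slice() never reads past the end, so the last batch may be shorter
    batches.push(ids.slice(start, start + size));
  }
  return batches;
}

// Build one users/lookup URL per batch of ids.
// (hypothetical helper, not part of the original script)
function buildLookupUrls(ids, size) {
  var base = "https://api.twitter.com/1.1/users/lookup.json?user_id=";
  return chunkIds(ids, size).map(function (batch) {
    return base + batch.join(",");
  });
}

// Five ids in batches of two give three URLs, the last with a single id.
var urls = buildLookupUrls(["11", "22", "33", "44", "55"], 2);
```

Using slice and join avoids both the leading comma and the 'undefined' entries that a hand-written inner loop can produce on the final, partial batch.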

The diagram on the left shows the network of relationships between the seeds and their friends. In the right-hand diagram, we assign each friend a rank according to the number of incoming connections, and we start to pay less attention to the seeds. The ranked friends are candidates for being the next round's seeds.
  // use the new connections we've found to update the candidate lists
  // (BinarySearch2D is a helper defined elsewhere; this code relies on it
  //  returning the list's length when the name is not found)
  for(var i in relationshipList)
  {
    var currentCandidate = relationshipList[i][1];
    // see if friend is in the done list
    var doneListIndex = BinarySearch2D(doneList, 0, currentCandidate);
    // if so, increment its rank
    if(doneListIndex < doneList.length) {doneList[doneListIndex][1]++; } 
    else
    {
      // see if friend is in the candidate list
      var candidateListIndex = BinarySearch2D(candidateList, 0, currentCandidate);
      // if so increment there
      if(candidateListIndex < candidateList.length){candidateList[candidateListIndex][1]++;} 
      else
      {
        // otherwise, add it to the candidate list and re-sort by descending rank
        candidateList.push([currentCandidate, 1]);
        candidateList.sort(function(a,b){
                              if(a[1] < b[1]){return 1;} 
                              else if(a[1] > b[1]){return -1;}
                              else{return 0;}}); 
      }
    }
  }
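
For comparison, here is a hedged sketch of the same ranking idea using a lookup table keyed by screen name instead of searched lists. It is not the method used above, just an alternative bookkeeping scheme; countIncoming and rankedCandidates are hypothetical names, and doneNames stands in for the names held in doneList.

```javascript
// Count incoming links per friend from a list of [user, friend] pairs.
// (hypothetical helper; a prototype-free object is used as the table so
//  that screen names cannot collide with built-in property names)
function countIncoming(relationshipList) {
  var ranks = Object.create(null);
  for (var i = 0; i < relationshipList.length; i++) {
    var friend = relationshipList[i][1];
    ranks[friend] = (ranks[friend] || 0) + 1;
  }
  return ranks;
}

// Return [name, rank] pairs sorted by descending rank (ties broken by name),
// leaving out any name that has already been used as a seed.
function rankedCandidates(ranks, doneNames) {
  var done = Object.create(null);
  doneNames.forEach(function (name) { done[name] = true; });
  return Object.keys(ranks)
    .filter(function (name) { return !done[name]; })
    .map(function (name) { return [name, ranks[name]]; })
    .sort(function (a, b) {
      if (b[1] !== a[1]) { return b[1] - a[1]; }
      return a[0] < b[0] ? -1 : (a[0] > b[0] ? 1 : 0);
    });
}

// Three seeds (a, b, c): all three follow x, only a follows y, and c follows a.
var relationships = [["a", "x"], ["a", "y"], ["b", "x"], ["c", "x"], ["c", "a"]];
var ranked = rankedCandidates(countIncoming(relationships), ["a", "b", "c"]);
```

In this toy data, x ends up ranked above y because all three seeds point to it, which is exactly the signal described in the text.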

In the next round, we replace the seeds with their friends of highest rank. We poll the new seeds for their friends, and on the left we see links branching both to 'new' friends and to the remaining candidates from the last round. Incoming arrows increase the ranking of the expanding batch of candidates.
  // sort candidate list by rank
  candidateList.sort(function(a,b){return b[1]-a[1];});
  
  // get rank of highest candidate (0 if there are no candidates left)
  var maxrank = candidateList.length ? candidateList[0][1] : 0;
  
  // get the friends of all the highest ranking candidates
  var relationshipList = [];
  while(candidateList.length && candidateList[0][1] == maxrank) 
  {
    var currentCandidate = candidateList.shift();
    var currentFriends = getUserFriends(currentCandidate[0], options);
    relationshipList = relationshipList.concat(currentFriends);
    doneList.push(currentCandidate);
  }

After the second round, we promote a new set of candidates to 'seeds' and run the process again. We continue in this manner, expanding outward from the initial seeds and focusing on the individuals most connected to the community, until we reach a desired number of connections, candidates, or cumulative seeds.
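
To make the round structure and stopping conditions concrete, here is a rough sketch of an outer loop under some assumptions of my own. expandCommunity, maxRounds and maxSeeds are hypothetical names; getFriends is any function returning [user, friend] pairs, so in Apps Script it could wrap getUserFriends(name, options), while the example below uses a small made-up graph so it runs without the API.

```javascript
// Expand outward from the seeds, polling the highest-ranked unpolled names each round.
// Stops after maxRounds rounds, once maxSeeds names have been polled,
// or when no candidates remain. (hypothetical sketch, not the original script)
function expandCommunity(seeds, getFriends, maxRounds, maxSeeds) {
  var ranks = Object.create(null);  // screen_name -> incoming link count
  var done = Object.create(null);   // screen names already polled
  var polled = [];                  // polled names, in order
  var current = seeds.slice();
  var name;

  for (var round = 0; round < maxRounds && current.length > 0; round++) {
    // poll this round's seeds and count the links they produce
    for (var s = 0; s < current.length && polled.length < maxSeeds; s++) {
      if (done[current[s]]) { continue; }
      done[current[s]] = true;
      polled.push(current[s]);
      var links = getFriends(current[s]);
      for (var i = 0; i < links.length; i++) {
        ranks[links[i][1]] = (ranks[links[i][1]] || 0) + 1;
      }
    }
    if (polled.length >= maxSeeds) { break; }

    // promote every unpolled name that shares the highest rank
    var maxRank = 0;
    for (name in ranks) {
      if (!done[name] && ranks[name] > maxRank) { maxRank = ranks[name]; }
    }
    current = [];
    for (name in ranks) {
      if (!done[name] && ranks[name] === maxRank) { current.push(name); }
    }
  }
  return { polled: polled, ranks: ranks };
}

// A made-up friend graph standing in for the Twitter API.
var graph = { a: ["x", "y"], b: ["x", "z"], x: ["a", "w"] };
function fakeGetFriends(user) {
  return (graph[user] || []).map(function (friend) { return [user, friend]; });
}

// Two rounds from seeds a and b: x is followed by both, so it is polled in round two.
var result = expandCommunity(["a", "b"], fakeGetFriends, 2, 10);
```

Passing the fetch function in as a parameter keeps the loop testable without credentials, and makes it easy to count API calls against the rate limit before running it for real.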

Here's a diagram of what a network structure looks like after a few iterations. You can clearly see which elements have become seeds and which haven't. I'll have to see if I can scale this a bit so that the effect is less pronounced over the community. 



Some interesting other Twitter API projects are:
Twitter App for Gmail - uses Google Scripts, so it's a good example for our environment
Creating Twitter Lists from Hashtag Users with Apps Script



© 2016 James P. Houghton