13 September 2010: Since the scheduled database migration keeps getting pushed back in favor of other projects, a brief summary of the project is in order. This is that summary.

In the summer of 2009, Clay Shirky was bouncing around the idea of treating Wikipedia as a social network of knowledge production in which social ties could be determined by examining patterns of co-editorship across multiple articles. Fortunately, Wikipedia is not only incredibly public but also provides its data in dozens of useful formats, so finding precisely what you need is easy.

Using the stub-meta-history dump for en-wiki from May of 2009 (a 70+ GB XML file listing the metadata for every edit: editor, timestamp, and article edited), we generated a bipartite mapping of editors to articles. Once this was done, it became possible to analyze clusters of articles linked by a single editor, and to determine how strongly each editor was tied to each article by counting the number of edits that editor made to it. While not infallible, the edit count provided a good approximation of a user's commitment to the article in question. Overlaying these clusters of articles made it possible to discern patterns of editorship (people who edit Pluto are likely to edit Neptune). It also let us identify clusters of users who were likely to have a history of working together across multiple articles, thus forming a sort of unofficial collective group.
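
As a rough illustration of the approach described above, here is a minimal Python sketch. It is not the code we actually ran: it assumes the dump has already been reduced to a list of (editor, article) pairs, one per edit, and the function names, the `min_edits` threshold, and the sample data are all invented for the example.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_bipartite(edits):
    """Count edits per (editor, article) pair; the count is the link strength."""
    weights = Counter()
    for editor, article in edits:
        weights[(editor, article)] += 1
    return weights


def shared_articles(weights, min_edits=1):
    """Map each pair of editors to the set of articles both have edited.

    min_edits is a hypothetical cutoff: links weaker than this are ignored.
    """
    editors_by_article = defaultdict(set)
    for (editor, article), count in weights.items():
        if count >= min_edits:
            editors_by_article[article].add(editor)

    shared = defaultdict(set)
    for article, editors in editors_by_article.items():
        # Every pair of editors on the same article gets a co-editing link.
        for a, b in combinations(sorted(editors), 2):
            shared[(a, b)].add(article)
    return shared


# Made-up sample data: one tuple per edit.
edits = [
    ("alice", "Pluto"), ("alice", "Pluto"), ("alice", "Neptune"),
    ("bob", "Pluto"), ("bob", "Neptune"), ("bob", "Neptune"),
    ("carol", "Jazz"),
]

weights = build_bipartite(edits)
shared = shared_articles(weights)

for (a, b), articles in sorted(shared.items()):
    print(a, b, sorted(articles))
```

Editor pairs that share many articles are candidates for the unofficial working groups mentioned above. On the full dump, a pairwise pass like this would need far more care with memory and with very heavily edited articles than this sketch shows.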

That's about as far as we could get with the computing power (and programmer time) we had for the project. Initial findings are interesting, but not conclusive. All of our data will hopefully soon be back online for anyone who wants to play around with it themselves.


9 July 2010: We're working on migrating our data to new servers. We'll be back (hopefully) shortly!