Virgil Griffith :: WikiScanner FAQ

WikiScanner FAQ

Answers to common questions about WikiScanner

How should I link to your homepage?

Link to me with <a href="http://virgil.gr">Virgil</a>. Thanks.

I represent the media. Will you talk to me?

Yes, I will talk to you. But as I'm on a quest to become the #1 hit on Google for the query 'virgil', I ask that you put a link to virgil.gr on your website with the anchor text 'virgil', as described above.

Do you have press photos?
No, I don't. However, I do have some photos that I like. Choose whichever you like.
Credit to the California Tech (Caltech's student newspaper).
Credit to Jake Appelbaum.
Credit to Jake Appelbaum. Above photo with pastry.
Credit to Meng Weng Wong.
Credit to Jake Appelbaum.
Credit to Henry Strickland.

What is some notable fallout/entertainment from WikiScanner?
Off the top of my head, these come to mind:
  1. A lawsuit was filed.
  2. The Japanese Imperial Household can't edit Wikipedia.
  3. The Australian Ministry of Defence blocks editing Wikipedia.
  4. A New York Times front-page story discusses ejaculating on George Bush's page and Condoleezza Rice being a concert penis.
  5. A royal family looked silly.
  6. WikiScanner was repurposed for social science research.
  7. Diebold engages in ethically notable conduct. Twice. Three times.
  8. I'm officially a "disruptive technologist". The Washington Post says so.
  9. The Minnesota Republican National Committee spoils Harry Potter.
  10. The ACLU informs us that pederasty is one of the Pope's many important functions.
  11. Wikileaks uses WikiScanner to expose German Intelligence.

I know there is more; if you have others, please send them to me!

Why did you create WikiScanner?

To create a fireworks display of public relations disasters in which everyone brings their own fireworks and enjoys the show.

To improve virgil.gr's Google ranking for the query 'virgil'. (Update: I am fluctuating between #1 and #2.)

To see what "interesting organizations" are up to.

Every time I hear about a new security vulnerability, I look to see if it can be done on a massive scale and indexed.

Why are there no edits from 2008? Have people stopped editing Wikipedia out of fear?

No, they haven't stopped editing Wikipedia. I simply haven't updated the database since August 2007. WikiScanner will receive an overhaul and database update in Summer 2008.

Don't I know you from somewhere?

Well, I did get sued a few years ago. By happenstance, Wikipedia even has an article about it.

Do we *really* know these edits came from <insert company here>'s executives or their lackeys?

Technically, we don't know whether it came from an agent of that company. However, we do know that the edit came from someone with access to their network. If the edit occurred during working hours, then we can reasonably assume that the person is either an employee of that company or a guest who was allowed access to their network.

Do you have examples of blatant misinformation being injected into Wikipedia?

As someone who has experienced the legal system being turned against me in unexpected ways, for legal reasons I am not providing any commentary or lists of particularly juicy edits. All of the tools are there for you to find them yourself without much trouble. I've provided the tool as well as an "Editor's Picks" list of interesting organizations to get you thinking. Furthermore, Wired already has a clearinghouse for juicy edits on its 27bstroke6 blog. There's also a nice list of edits at MaltaStar.

What kind of vandalism and disinformation have you found?
Without naming any names, I've found three common kinds of vandalism.
  1. Wholesale removal of entire paragraphs of critical information. (common for both political figures and corporations)
  2. White-washing -- replacing negative/neutral adjectives with positive adjectives that mean something similar. (common for political figures)
  3. Adding negative information to a competitor's page. (common for corporations)

Was the reaction what you expected?

Yes.

Can you do this for other languages (e.g. de.wikipedia.org)?

Yes. My interest in other languages is roughly proportional to their rank on http://meta.wikimedia.org/wiki/List_of_Wikipedias.

How old are you?

24.

Do you think Wikipedia is reliable?

Overall--especially for non-controversial topics--Wikipedia seems to work. For controversial topics, Wikipedia can be made more reliable through techniques like this one. As for other approaches, I think colored text is a promising direction for combating disinformation in Wikipedia.

What does Wikipedia have to say about WikiScanner?

As far as I know, the reaction from the Wikimedia Foundation has been wholly positive. The Wikipedians are good people -- they don't mind the light of day. Also see the Wikipedia entry on WikiScanner.

How did you come up with this idea? How long have you been at it?

I came up with the idea when I heard about Congressmen getting caught white-washing their Wikipedia pages. Every time I hear about a new security vulnerability, I think about whether it could be done on a massive scale and indexed. I had the idea back then, but I was busy with scientific work, so I sat on it until a few weeks ago, when I started working on WikiScanner.

Won't these edits just make people be more sneaky about how they edit Wikipedia (using 3rd parties, etc.)?

Unlikely. Even though dusting for fingerprints is well-known and has been around for decades, police still dust for prints at crime scenes, and it often works!

Is something like WikiScanner required to keep people honest? Is anonymous editing bad for Wikipedia?

The low barrier to entry for adding new content is vital to Wikipedia's rapid growth. Drastically increasing the effort and commitment required to add new information would be disastrous. Instead of stopping anonymous contributions, Wikipedia should (continue to) use various back-end analyses (such as WikiScanner) that will help counteract disinformation while keeping the low barrier to contribution. Overall--especially for non-controversial topics--Wikipedia already works. For controversial topics, Wikipedia can be made more reliable through techniques like this one.

Can you confirm some numbers related to WikiScanner?

Sure.

The WikiScanner database was made by extracting all anonymous edits from the publicly available Wikipedia database dump (which is released about once a month). The database I use to map IP addresses to organization names contains 2,668,095 distinct organizations.
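The extraction step can be pictured roughly as follows. This is a minimal Python sketch, not WikiScanner's actual code: it assumes the MediaWiki XML export format, in which a registered editor's revision carries `<username>` and `<id>` inside `<contributor>` while an anonymous edit carries `<ip>` instead. The names `AnonEdit`, `anonymous_edits`, `local` and the tiny `SAMPLE` dump are hypothetical, and the namespace URL in the sample is only illustrative.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator, NamedTuple


class AnonEdit(NamedTuple):
    title: str      # article the edit was made to
    timestamp: str  # timestamp recorded in the dump
    ip: str         # IP address recorded in place of a username


def local(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree adds to namespaced tags."""
    return tag.rsplit("}", 1)[-1]


def anonymous_edits(source: BinaryIO) -> Iterator[AnonEdit]:
    """Stream a MediaWiki XML dump and yield only revisions made from an IP address."""
    title = ""
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "revision":
            timestamp, ip = "", None
            for child in elem:
                if local(child.tag) == "timestamp":
                    timestamp = child.text or ""
                elif local(child.tag) == "contributor":
                    # Registered users get <username>/<id>; anonymous editors get <ip>.
                    for sub in child:
                        if local(sub.tag) == "ip":
                            ip = sub.text
            if ip:
                yield AnonEdit(title, timestamp, ip)
        elif name == "page":
            elem.clear()  # drop the finished page's children; real dumps are huge


# Hypothetical two-revision dump: one anonymous edit, one by a registered user.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page>
    <title>Example</title>
    <revision>
      <timestamp>2007-08-01T12:00:00Z</timestamp>
      <contributor><ip>192.0.2.17</ip></contributor>
    </revision>
    <revision>
      <timestamp>2007-08-02T09:30:00Z</timestamp>
      <contributor><username>Alice</username><id>42</id></contributor>
    </revision>
  </page>
</mediawiki>"""

if __name__ == "__main__":
    for edit in anonymous_edits(io.BytesIO(SAMPLE)):
        print(edit)
```

Streaming with `iterparse` rather than loading the whole file is the natural choice here, since a full English dump is far larger than memory.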

# edits in English WikiScanner database: 34,417,493 from February 7th, 2002 to August 4th, 2007
# distinct organizations who have made edits to English Wikipedia: 187,529

---------
# edits in German WikiScanner database: 7,754,709 from March 15th, 2002 to July 28th, 2007
# distinct organizations who have made edits to German Wikipedia: 51,518

---------
# edits in Japanese WikiScanner database: 5,963,811 from September 1st, 2002 to August 17th, 2007
# distinct organizations who have made edits to Japanese Wikipedia: 21,201

---------
# edits in French WikiScanner database: 2,964,888 from October 31st, 2002 to August 23rd, 2007
# distinct organizations who have made edits to French Wikipedia: 28,878

How long did it take you to create WikiScanner?

After I started working on it, about 2.5 weeks. This time was split about evenly between creating WikiScanner and playing Zelda: Twilight Princess. Both were lots of fun.

How does WikiScanner work?

When you edit Wikipedia, you have two choices: you can register and edit under your username, or you can edit anonymously. When you edit anonymously, Wikipedia records your IP address, a number that identifies which computer network you are on, in lieu of a username. It does this so that your anonymous edits can be distinguished from someone else's anonymous edits. In essence, WikiScanner combines two databases: (1) the list of all IP addresses that have made edits to Wikipedia, and (2) which IP addresses belong to which companies. So with WikiScanner you can type in a company name, and it shows you which edits came from IP addresses owned by that company.
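The two-database join described above can be sketched like this. It is an illustrative Python sketch under stated assumptions, not WikiScanner's implementation: it handles IPv4 only, assumes the ownership ranges don't overlap, and uses a binary search over sorted range starts. `RANGES`, `OrgLookup`, `edits_by_org` and `search` are hypothetical names, and the ranges are reserved documentation blocks with made-up owners, not real allocations.

```python
import bisect
import ipaddress
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical ownership table: (first address, last address, organization).
# These are reserved documentation ranges, not real company allocations.
RANGES = [
    ("198.51.100.0", "198.51.100.127", "Sample University"),
    ("192.0.2.0", "192.0.2.255", "Example Corp"),
    ("203.0.113.64", "203.0.113.95", "Demo Ministry"),
]

Index = Dict[str, List[Tuple[str, str]]]


class OrgLookup:
    """Map an IPv4 address to the organization owning its range (ranges must not overlap)."""

    def __init__(self, ranges: List[Tuple[str, str, str]]) -> None:
        rows = sorted(
            (int(ipaddress.IPv4Address(lo)), int(ipaddress.IPv4Address(hi)), org)
            for lo, hi, org in ranges
        )
        self.starts = [start for start, _end, _org in rows]
        self.rows = rows

    def org_for(self, ip: str) -> Optional[str]:
        n = int(ipaddress.IPv4Address(ip))
        # Find the last range starting at or before n, then check n falls inside it.
        i = bisect.bisect_right(self.starts, n) - 1
        if i >= 0 and n <= self.rows[i][1]:
            return self.rows[i][2]
        return None


def edits_by_org(edits: List[Tuple[str, str]], lookup: OrgLookup) -> Index:
    """Group anonymous (page, ip) edits by the organization that owns each IP."""
    index: Index = defaultdict(list)
    for page, ip in edits:
        org = lookup.org_for(ip)
        if org is not None:
            index[org].append((page, ip))
    return dict(index)


def search(index: Index, query: str) -> Index:
    """Case-insensitive substring match on organization names."""
    q = query.lower()
    return {org: hits for org, hits in index.items() if q in org.lower()}


if __name__ == "__main__":
    lookup = OrgLookup(RANGES)
    edits = [("Widget", "192.0.2.17"), ("Campus", "198.51.100.200"), ("Gadget", "192.0.2.99")]
    print(search(edits_by_org(edits, lookup), "example"))
```

Here the edit from 198.51.100.200 is dropped because it falls outside every listed range, which mirrors the fact that only edits from IP addresses with a known owner can be attributed to an organization.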

What is this "at santafe dot edu" in your email address?

I am a Visiting Researcher at the greatest place on Earth for doing interdisciplinary scientific research, the Santa Fe Institute. They gave me great personal freedom to think out of the box while creating WikiScanner.

What is your official affiliation with the California Institute of Technology?

As of this September I am a graduate student in the Computation and Neural Systems department at Caltech.

Where do I send a donation?

Aw, thanks. But aside from increasing my Google rank, all motives here are pure and 100% non-commercial. Glory be!

Could a nefarious person do IP Spoofing to make it look like edits came from a company they didn't come from?

Hypothetically, yes. But it would be very, very, very hard -- hard enough that it's more likely that Saddam's Weapons of Mass Destruction were airlifted to Iran. Furthermore, the nefarious IP spoofer would have to accomplish this for every spoofed edit -- thus decreasing the probability even further if there are multiple salacious edits.
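The compounding effect of multiple edits can be made explicit under a simplifying assumption: if each spoofed edit succeeds independently with the same probability $p < 1$, then the chance of pulling off all $n$ of them is

$$
P(\text{all } n \text{ edits spoofed}) = p^{n}, \qquad 0 < p < 1 \;\Rightarrow\; p^{n} < p \ \text{ for } n \ge 2 .
$$

For instance, a hypothetical $p = 0.01$ gives $0.01^{3} = 10^{-6}$ for three edits.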

What about the Tor IP-anonymizing network (http://tor.eff.org)?

Silly goose. Tor has been blocked from editing Wikipedia for a while now. Try it.

Did the Wikimedia Foundation ever hire you (for WikiScanner or otherwise)?

No. I've never been hired by the Foundation or done work for Wikimedia. Everything related to WikiScanner is 100% noncommercial -- it's better that way, no?

What's next for you?

Well, since September 2007 I have been a graduate student studying theoretical neurobiology and artificial life under Christof Koch and Chris Adami, so I'll be spending 90% of my time on that. However, I do have some other data-mining projects in the works (some Wikipedia-related, some not). They'll be out later, probably around December after classes end. Grad school workload is a pain, I know. And although people keep asking, sorry, no word on these projects until they're done. The best works are unleashed onto the world by surprise. :) I think you'll like them, though.