The inexorable data-deluge is here. Learn to swim. I have a few projects in this area.
Deriving Mother's Maiden Names (2006)
This was my first academic paper, which I wrote during my year as pseudo-faculty at the Indiana University School of Informatics between my undergrad years. It's a fun paper that was cited around the media and was included as part of the RSA Cryptobytes Journal. Years later, when I formally studied information theory, I discovered that I had made a subtle mistake in this paper's analysis. (In short, information theory isn't the right thing to use here.) Fortunately, the mistake didn't affect the conclusions. This paper required scrubbing a lot of ugly data. I uploaded my scrubbed data of marriage records, divorce records, birth records, and death records to archive.org.
This was shortly after I released the WikiScanner, and everyone was asking me how to come up with ideas like that. I wanted to illustrate how fun things can be made simply by combining two or more "boring old datasets", and "Booksthatmakeyoudumb" was born. Back in the day (pre-2010), Facebook was split into "networks" based on email address. For example, if you registered with a @caltech.edu address, you would automatically be placed into the "Caltech" network. And although they weren't well known, Facebook offered "Network stats" pages that listed the ten most popular Favorites for every network. For example, it would list the 10 books that appeared most often in people's "Favorite Books" lists. So I downloaded the top-10 list for each college around the United States and merged them with the average SAT scores from CollegeBoard. The result was hilarity incarnate. I called it "Booksthatmakeyoudumb" (all one word) largely as a troll. I learned that the Internet does not understand the juxtaposition of scientific graphs and trolling. Favorite line, "Harry Potter is the most popular book. The Bible is the second most popular book. Among college students, Harry Potter is, like the Beatles, bigger than Jesus."
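The merge described above can be sketched in a few lines of Python. This is only an illustration, not the original code: the college names, book titles, and SAT numbers are made up, the function name `average_sat_by_book` is hypothetical, and it assumes each book is scored by averaging the SAT figures of the colleges whose top-10 list includes it.

```python
from collections import defaultdict
from statistics import mean

# Made-up sample data, NOT the real Facebook or CollegeBoard figures.
# Dataset 1: the most popular "Favorite Books" per college network.
top_books = {
    "College A": ["Book X", "Book Y"],
    "College B": ["Book Y", "Book Z"],
    "College C": ["Book X", "Book Y", "Book Z"],
}
# Dataset 2: average SAT score per college.
avg_sat = {"College A": 1400, "College B": 1000, "College C": 1200}


def average_sat_by_book(top_books, avg_sat):
    """Join the two datasets on college name and average the SAT per book.

    Assumption: a book's score is the plain mean of the average SAT
    of every college that lists it among its favorites.
    """
    scores = defaultdict(list)
    for college, books in top_books.items():
        if college not in avg_sat:
            continue  # skip colleges with no SAT figure to join against
        for book in books:
            scores[book].append(avg_sat[college])
    return {book: mean(values) for book, values in scores.items()}


# Rank books from highest to lowest associated SAT score.
ranking = sorted(
    average_sat_by_book(top_books, avg_sat).items(),
    key=lambda pair: pair[1],
    reverse=True,
)
for book, score in ranking:
    print(f"{book}: {score}")
```

The point of the sketch is how little work the join is: neither dataset is interesting alone, and the college name is the only key needed to combine them.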
This project earned me my first ever ambush interview. A BBC talk-show host invited me on without really saying what he wanted to talk about, and once I was on the air he accused me of being a racist and a terrible person. I basically deflected the attacks by saying that if he saw racism here, then his beef was with the SAT, not me. He seemed annoyed that I didn't bite back, and I was off the air after only a few minutes.
After I released Booksthatmakeyoudumb, people asked for a music one. I didn't think music would be as interesting as books, but since the code was 95% the same as for the books, I acquiesced to popular demand. To my surprise, the music one was WAAAAAAY more popular than the books one. My favorite line, "I don't want to be smart if I have to listen to Counting Crows."
Freefood at Caltech/MIT (2008)
During my undergraduate days at Indiana University I had a friend who would find a campus event with free food almost every day. I was so inspired by this that I made a tool so others could do the same. I subscribed to every Caltech public calendar I could find and then scanned the events for keywords relating to free food. All matching events were then displayed on a Google Calendar that anyone could subscribe to. I later did the same for MIT.
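The keyword scan is simple enough to sketch. The version below is a guess at the general shape, not the original tool: the keyword list, the event dictionaries with `title` and `description` fields, and the function names are all assumptions, and fetching the calendars and publishing to Google Calendar are left out entirely.

```python
import re

# Assumed keyword list; the keywords the original tool used are not known.
FOOD_KEYWORDS = ["free food", "pizza", "refreshments", "lunch provided", "snacks"]

# One case-insensitive pattern that matches any of the keywords.
FOOD_PATTERN = re.compile(
    "|".join(re.escape(keyword) for keyword in FOOD_KEYWORDS),
    re.IGNORECASE,
)


def has_free_food(event):
    """Return True if the event's title or description mentions a keyword."""
    text = f"{event.get('title', '')} {event.get('description', '')}"
    return FOOD_PATTERN.search(text) is not None


def filter_free_food(events):
    """Keep only the events that look like they come with free food."""
    return [event for event in events if has_free_food(event)]


# Hypothetical events, standing in for entries pulled from campus calendars.
events = [
    {"title": "Physics colloquium", "description": "Refreshments served at 3:45."},
    {"title": "Library hours change", "description": "New schedule starts Monday."},
    {"title": "Club fair", "description": "Free pizza while it lasts!"},
]

for event in filter_free_food(events):
    print(event["title"])
```

Plain substring matching like this will produce some false positives (a talk about pizza economics, say), which is probably an acceptable price for never missing a free lunch.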
This was fun to do and it worked for a while, but then the API for querying Caltech and MIT calendars changed, the service stopped working, and I didn't have time to resurrect it. I'd like to get this working again. Send me an email if you're interested in captaining its rebirth.