We want to make a map with a point for every .edu domain. The size of each point will be proportional to the number of academic papers on the subject TOPIC.
Here's what you do.
- Get a list of all .edu domains and their latitude/longitude coordinates. The easiest way to do this is to resolve each domain to an IP address and look it up in an IP geolocation database such as ip2location.com or maxmind.com. You'll find there are about 2,000 different .edu domains.
- We now have a list of all .edu domains and their locations on a map. Next, determine the size of the points: use the Google API to run a Google Scholar query for every .edu:
http://scholar.google.com/scholar?q=site%3ASOMEEDU.edu+TOPIC
If you are interested in the overall distribution of academic PDFs regardless of topic, just leave the TOPIC field blank, for example:
http://scholar.google.com/scholar?q=site%3ASOMEEDU.edu
- Count the number of hits you get; this determines the size of the data point.
- Plot using Google Maps. The geo coordinates determine the location of each point, and the number of Google Scholar hits determines its size/color. The Google Maps API supports both out of the box.
- Done.
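The first step hides one detail: ip2location and MaxMind databases are keyed by IP address, not by domain name, so each .edu has to be resolved to an IP before it can be placed on the map. Below is a minimal Python sketch of that step. It assumes you supply a `geolocate` function (hypothetical, not part of either vendor's API) that wraps whichever database you use and returns `(lat, lon)` for an IP string, or `None` if the IP is unknown.

```python
import socket
from typing import Callable, Dict, Iterable, Optional, Tuple

LatLon = Tuple[float, float]


def locate_domains(
    domains: Iterable[str],
    geolocate: Callable[[str], Optional[LatLon]],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, LatLon]:
    """Map each .edu domain to (lat, lon) via DNS resolution plus an IP lookup.

    `geolocate` is a hypothetical wrapper around an IP geolocation database;
    it should return None for IPs it cannot place.
    """
    coords: Dict[str, LatLon] = {}
    for domain in domains:
        try:
            ip = resolve(domain)
        except OSError:
            # socket.gaierror subclasses OSError; skip domains that don't resolve.
            continue
        location = geolocate(ip)
        if location is not None:
            coords[domain] = location
    return coords
```

The resolver is a parameter so it can be swapped or stubbed out. Keep in mind that a domain's IP may belong to a hosting or CDN provider, so a few points may land away from the actual campus.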
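The Scholar URLs in the second step follow a simple pattern: the query `site:SOMEEDU.edu TOPIC` is URL-encoded, which turns `:` into `%3A` and spaces into `+`. A small Python sketch that builds them, with an empty topic producing the topic-free variant (the function name `scholar_query_url` is just for illustration):

```python
from urllib.parse import quote_plus

SCHOLAR_URL = "http://scholar.google.com/scholar?q="


def scholar_query_url(domain: str, topic: str = "") -> str:
    """Build the Google Scholar search URL for one domain, optionally with a topic."""
    query = f"site:{domain}"
    if topic:
        query += f" {topic}"
    # quote_plus encodes ':' as %3A and spaces as '+', matching the URLs above.
    return SCHOLAR_URL + quote_plus(query)
```

For example, `scholar_query_url("mit.edu", "graph theory")` gives `http://scholar.google.com/scholar?q=site%3Amit.edu+graph+theory`.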
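"Proportional size" can mean proportional radius or proportional area. The sketch below assumes area, so the radius grows with the square root of the hit count and the domain with the most hits gets a hypothetical `max_radius`. To make the radius itself proportional, drop the `math.sqrt`.

```python
import math
from typing import Dict


def marker_radii(hits: Dict[str, int], max_radius: float = 20.0) -> Dict[str, float]:
    """Scale hit counts so each marker's area is proportional to its count."""
    top = max(hits.values(), default=0)
    if top <= 0:
        # No hits anywhere: every marker collapses to zero size.
        return {domain: 0.0 for domain in hits}
    return {
        domain: max_radius * math.sqrt(count / top)
        for domain, count in hits.items()
    }
```

With hits of 400, 100 and 0, the radii come out as 20.0, 10.0 and 0.0: a quarter of the hits gives half the radius, which is a quarter of the area.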
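One way to hand the coordinates and hit counts to Google Maps is as GeoJSON, which the Maps JavaScript API's Data layer can load. This Python sketch assumes the coordinates are a dict of domain to `(lat, lon)` and the hits are a dict of domain to count; both shapes are illustrative choices, not something the steps above prescribe.

```python
import json
from typing import Dict, Tuple


def to_geojson(coords: Dict[str, Tuple[float, float]], hits: Dict[str, int]) -> str:
    """Combine coordinates and hit counts into a GeoJSON FeatureCollection string."""
    features = []
    for domain, (lat, lon) in coords.items():
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"domain": domain, "hits": hits.get(domain, 0)},
        })
    return json.dumps({"type": "FeatureCollection", "features": features})
```

On the map side, the parsed object can be passed to `map.data.addGeoJson(...)`, and `map.data.setStyle(...)` accepts a function that sizes or colors each point from its `hits` property.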