Booksthatmakeyoudumb FAQ

How should I link to your homepage to maximize your Google linkage-love?

Thanks for asking! Link to me with: Virgil

I really, really, really hate your genre categorizations.

Don't blame me. I didn't make them. To defend myself from precisely this criticism, the genres were automatically generated from the most frequent tags for each book on LibraryThing. So go blame LibraryThing. Furthermore, if you really hate the genres, you can safely ignore them. They are only used for the color-coding on the graphs and for rankings within a genre; they don't affect the raw data.

What are some notable things about the data?

* Harry Potter is the most popular book. The Bible is the second most popular book. At least among college students, Harry Potter is, like the Beatles, indeed bigger than Jesus. Harry Potter still wins even if you add "The Bible" and "The Holy Bible" together.
* Although I had no idea at the beginning of this project, I was ever so pleased to discover that Caltech is the smartest school in the country (on average).
* The smartest religious book is "The Book of Mormon". The dumbest religious book is "The Holy Bible". I'm sure this pleases the Mormons immensely.
* The dumbest philosophy book is "The Five People You Meet In Heaven" and the smartest philosophy book is "Atlas Shrugged".
* "Lolita" is the smartest book.
* The top/bottom 20 books are remarkably stable. I tried 5 different weighting algorithms and their only variation was in the middle. The dumbest books were always at the bottom, and the smartest books were always on top. This is even further corroborated by the fact that the extremes change remarkably little with increasing m.
* This is slightly specious, but if you wanted to, you could consider "I Don't Read" as a control variable. Thus, if "I Don't Read" is smarter than 13 books, you'd think those bottom thirteen books could, in fact, make you dumber than not reading at all.

Why isn't school _______ in your database?

I tried to include everyone. Really, I did. But if your school isn't listed, it's either because A) I couldn't find your school's network on Facebook, or B) Collegeboard didn't have reliable SAT/ACT scores for your school.

Shouldn't "The Bible" and "The Holy Bible" be grouped into the same book?

No, they shouldn't.
Just glancing at the distributions, they are substantially different. Suppose you do a Student's t-test (a standard statistical test for determining whether two samples come from the same population) between "The Bible" and "The Holy Bible". The two books have, respectively, means of 1046.65 and 909.22, variances of 11835.86 and 3827.74, and sample sizes of 800 and 24. After some computation, you get a t value of ~6.10 with ~822 degrees of freedom, which puts the probability of both coming from the same population at around 1.37*10^-9. You get a similar result if you use a normal Z-test instead of a t-test. This really blows any measure of confidence out of the water. Ergo, these two books have WAAAAAAAY different distributions. End of story. How is this possible? Well, look at the bookdetails page for each book -- they are completely different. Not a single school above the mean (1071) has "The Holy Bible" on its list.

It's just how Facebook groups them. For example, Sam Houston University has "The Bible" at #1, and "The Holy Bible" at #3. This is true of most schools that have "The Holy Bible" on their list. Second-guessing all of Facebook's groupings is an exercise in meticulous data-grooming.
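If you want to check the t-test arithmetic yourself, here is a minimal sketch. It assumes the classic pooled-variance (equal-variance) form of Student's t-test, which is my guess at the variant that matches the ~822 degrees of freedom quoted above; the function name `pooled_t` is made up for illustration. The p-value at the end uses a normal approximation rather than the exact t distribution, so treat it as an order-of-magnitude check only.

```python
import math

def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Two-sample Student's t statistic with pooled variance (assumed variant)."""
    df = n1 + n2 - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    # Standard error of the difference between the two means.
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df

# Figures quoted in the answer above: "The Bible" vs. "The Holy Bible".
t, df = pooled_t(1046.65, 11835.86, 800, 909.22, 3827.74, 24)

# Two-sided p-value under a normal approximation (reasonable for large df).
p_normal = math.erfc(abs(t) / math.sqrt(2))

print(f"t = {t:.2f}, df = {df}, approx. p = {p_normal:.2e}")
```

Under these assumptions the statistic comes out slightly above 6 with 822 degrees of freedom, in line with the figures quoted above; small differences from ~6.10 are to be expected from rounding in the published means and variances.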

Lolita is not erotica, instead it should be in genre "Classics".

Yes. I know. It was a bloody joke to list Lolita as erotica. Sheesh. I got so much annoying e-mail about this that I switched it to Classics. Happy now? Pish.

Could you do this for international schools?

This could be done internationally, but I don't know of any central repository for college entrance exam scores other than Collegeboard, which lists only the ACT and SAT for colleges in the United States. If Europe or Asia has a website listing the typical entrance exam scores for its schools, you could do the same analysis for those as well.

What is the SAT, ACT?

The SAT and ACT are, respectively, the most and second most common college entrance exams in the United States. The ACT is scored from 1 to 36. The SAT was until very recently scored from 400 to 1600. A few years ago this was changed so that the SAT is now scored from 600 to 2400; however, most schools on Collegeboard don't yet list statistics on the new 600-2400 scale, so I used the older 400-1600 scale.

Is the Facebook data from each person's profile? Isn't that a selection bias towards people that have their profiles visible to everyone?

No, it is not. I collect the data from Facebook's "Network Stats" page for each college. This creates a selection bias towards people that list their favorite books, but there is no selection bias towards people that have their profiles publicly viewable. If you list it on Facebook, Facebook counts it. And if your books make the top 10, then I can (and do) count it.

Do people with SAT >1400 just not listen to music?

They do listen to music -- just look at those high-tier schools at The Schools. The graphs plot averages, so even though high-scoring schools are included in each average, the average is pulled down by the less selective schools.

Do you have a press photo?

No, I don't. But I do like this one.

How is the weight of a book calculated?

Every school moves the average SAT of a book in its top 10 closer to its own SAT score. How much does a school move the average? It depends on: 1) the number of undergrads at that school (big schools count more than small schools) and 2) where in the top-10 list the book appeared (being #1 at a school is worth more than being #10). If you want to know the full formula, it is:

schoolweight_i = #ugrads * (11-bookrank)/10
totalweight = sum{ schoolweights }

This is a regular linear falloff and is the most reasonable one I could think of. However, I tried four other functions, including exponential and logarithmic falloff, and only the books in the middle changed much at all. The same 20 books were always the smartest, and the same 20 books were always the dumbest.
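As a rough illustration of the formula above, here is a small sketch of the weighting. The function names and the two example schools are made up, and computing the book's average SAT as the sum of weight times school SAT divided by totalweight is an assumption consistent with the description, not a copy of the site's actual code.

```python
def school_weight(undergrads, book_rank):
    """schoolweight_i = #ugrads * (11 - bookrank) / 10, for ranks 1 through 10."""
    if not 1 <= book_rank <= 10:
        raise ValueError("book_rank must be between 1 and 10")
    return undergrads * (11 - book_rank) / 10

def weighted_average_sat(listings):
    """listings: list of (school_sat, undergrads, book_rank) tuples for one book.

    Returns (weighted average SAT, totalweight).
    """
    weights = [school_weight(ugrads, rank) for _, ugrads, rank in listings]
    total_weight = sum(weights)
    weighted_sum = sum(sat * w for (sat, _, _), w in zip(listings, weights))
    return weighted_sum / total_weight, total_weight

# Hypothetical data: a small school listing the book at #1
# and a large school listing it at #10.
listings = [(1500, 1000, 1), (1000, 4000, 10)]
avg_sat, total_weight = weighted_average_sat(listings)
print(round(avg_sat, 2), total_weight)
```

In this made-up case the small school gets a weight of 1000 and the large school only 400, so the book's average lands closer to the small school's score: rank can outweigh enrollment.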

How is the Adjusted Average SAT calculated? What is m ?

The Adjusted Average SAT is a True Bayesian Estimate -- the same method IMDB uses to calculate its Top 250 movies. In short, the true Bayesian estimate is the weighted average with an additional term 'm'. Increasing m takes books with a small number of samples (weight) and moves them towards the mean. The justification is that if a book doesn't have very many samples, we can't trust its normal mean. However, there's a problem with this -- what value should m be? You never know. IMDB arbitrarily sets m=1300. With Booksthatmakeyoudumb you can set m to be whatever you want and see the new rankings. However, since we are only looking at books with a high sample size (the 100 most popular books), the raw weighted average is representative, and leaving m=0 is probably the right thing to do. Not that it matters much, though: if you look at the bookdetails page, you'll see that the rankings change very little with high m.
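Here is a minimal sketch of that kind of estimate, using the shrinkage form IMDB has described for its Top 250: (v*R + m*C) / (v + m). Mapping v to a book's total weight, R to its raw weighted average SAT, and C to the mean across all books is my assumption for illustration; the function name and the example numbers are hypothetical.

```python
def adjusted_average(raw_avg, total_weight, overall_mean, m):
    """Shrink a book's raw weighted average toward the overall mean.

    With m = 0 the raw average is returned unchanged; larger m pulls
    low-weight books closer to overall_mean.
    """
    if total_weight + m == 0:
        raise ValueError("total_weight + m must be non-zero")
    return (total_weight * raw_avg + m * overall_mean) / (total_weight + m)

# Hypothetical book: raw average 900, total weight 500, overall mean 1071.
for m in (0, 500, 5000):
    print(m, round(adjusted_average(900, 500, 1071, m), 1))
```

A book with a large total weight barely moves as m grows, while a book with a small total weight drifts toward the overall mean.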

Doesn't this violate Facebook's Terms of Service?

No, it doesn't. Facebook prohibits the use of "automated scripts to collect information from or otherwise interact with the Service or the Site". A friend of mine manually collected our book data.

Correlation is not causation blah blah.

That's true. However, in this case correlation is enough -- the results are provocative regardless of whether A causes B, B causes A, or an unknown C causes both A and B.

The data you scrape from Collegeboard doesn't give you the mean SAT/ACT.

That's true. Technically it's the average of the 1st and 3rd quartiles. Unfortunately, there was no data available on the raw means, and the average of the 1st and 3rd quartiles isn't going to be the mean or median unless the distribution is symmetric (which may be true for most schools, but probably won't be true for high-scoring schools). However, even in the worst cases the average of the 1st and 3rd quartiles is still decently representative of the center of the distribution. And even if it weren't, there's not much to worry about -- the top/bottom of the rankings (which are pretty much all people care about anyway) are exceedingly robust (see above).
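To see the symmetry point concretely, here is a tiny sketch comparing the quartile midpoint with the mean and median. The two score lists are invented, and the quartiles come from Python's `statistics.quantiles` with its default method, which is only one of several common quartile conventions.

```python
import statistics

def quartile_midpoint(scores):
    """Average of the 1st and 3rd quartiles of a list of scores."""
    q1, _, q3 = statistics.quantiles(scores, n=4)
    return (q1 + q3) / 2

# Invented samples: one symmetric, one bunched up against the 1600 ceiling.
symmetric = [1000, 1100, 1200, 1300, 1400]
skewed = [1300, 1500, 1550, 1580, 1600]

for name, scores in (("symmetric", symmetric), ("skewed", skewed)):
    print(name,
          quartile_midpoint(scores),
          statistics.mean(scores),
          statistics.median(scores))
```

For the symmetric sample all three numbers coincide; for the skewed one the quartile midpoint differs from both the mean and the median, though it stays in the same neighborhood.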