What are some notable things about the data?
* Indie is the smartest genre overall. Those indie kids are, on average, pretty clever!
* For the generic types of music people list as their "favorite music," the ranking by average SAT is:
Soca < Gospel < Jazz < Hip Hop < Pop < Oldies < Reggae < Alternative < Classical < R&B < Rap < Rock < Country < Classic Rock < Techno
* Counting Crows is an unusual one near the top of the list. However, when UPenn, Princeton, Duke, and Williams make up ~20% of its weight, you'd expect a high score.
* Beethoven wins by a LANDSLIDE of about 120 points. Further, the gap between "Beethoven" and "Classical" is about 400 points.
* Tool is a little above average. Pink Floyd enjoys massive popularity across the board but ultimately comes out only slightly above average.
* Most musical artists are popular across wide SAT ranges. Jars of Clay is not: their fans cluster consistently around the mean. Oddly enough, the same goes for Casting Crowns.
How is the weight of a "favorite music" calculated?
Every school moves the average SAT of each "favorite music" in its top 10 closer to its own SAT score. How much does a school move the average? It depends on 1) the number of undergrads at that school (big schools count more than small schools) and 2) where in the school's top-10 list the music appears (being #1 at a school is worth more than being #10). The full formula is:
schoolweight_i = #ugrads_i * (11 - rank_i) / 10
totalweight = sum over schools i of schoolweight_i
This is a simple linear falloff and the most reasonable one I could think of. I also tried 4 other functions, including exponential and logarithmic falloff, and only the entries in the middle of the rankings changed much. The same 20 "favorite music" entries were always the smartest, and the same 20 were always the dumbest.
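A minimal Python sketch of the weighting, assuming the reported average is an ordinary weighted mean of each listing school's average SAT (the text says each school pulls the average toward its own score but doesn't spell out that step). The `Listing` record and all function names are hypothetical. The text names exponential and logarithmic falloff but not their exact forms, so the base of 0.8 and the 1/log2(rank + 1) shape below are illustrative choices only.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass
class Listing:
    """Hypothetical record: one school listing a favorite music in its top 10."""
    ugrads: int      # number of undergraduates at the school
    avg_sat: float   # the school's average SAT score
    rank: int        # position in that school's top-10 list (1 = top)


# Rank falloff functions: each maps rank 1..10 to a multiplier that is 1.0 at rank 1.
def linear(rank: int) -> float:
    return (11 - rank) / 10          # the linear formula from the text


def exponential(rank: int, base: float = 0.8) -> float:
    return base ** (rank - 1)        # hypothetical alternative; base chosen arbitrarily


def logarithmic(rank: int) -> float:
    return 1 / math.log2(rank + 1)   # hypothetical alternative


def weighted_average_sat(listings: list[Listing],
                         falloff: Callable[[int], float] = linear) -> tuple[float, float]:
    """Return (weighted average SAT, totalweight) for one favorite music."""
    # schoolweight_i = #ugrads_i * falloff(rank_i)
    weights = [item.ugrads * falloff(item.rank) for item in listings]
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("need at least one listing with positive weight")
    # Each school pulls the average toward its own SAT in proportion to its weight.
    avg = sum(w * item.avg_sat for w, item in zip(weights, listings)) / total_weight
    return avg, total_weight


# Made-up numbers: a big school ranks it #1, a small school ranks it #10.
listings = [Listing(10000, 1300, 1), Listing(2000, 1450, 10)]
for f in (linear, exponential, logarithmic):
    avg, total = weighted_average_sat(listings, f)
    print(f.__name__, round(avg, 1), round(total))
```

Passing a different `falloff` is one way to rerun the comparison described above: compute every favorite's average under each curve and check whether the top and bottom 20 stay the same.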
How is the Adjusted Average SAT calculated? / What is m?
The Adjusted Average SAT is a True Bayesian Estimate, the same method IMDb uses to calculate its Top 250 movies. In short, the true Bayesian estimate is the weighted average with an additional term, m. Increasing m pulls every favorite toward the mean, and those with few samples (low weight) move the most. The justification is that if a favorite doesn't have many samples, we can't trust its mean as much as that of a favorite with many samples. The problem is choosing m: there is no obviously correct value. IMDb arbitrarily sets m=1300. With Musicthatmakesyoudumb you can set m to whatever you want and see the new rankings. However, since we only look at music favorites with a large sample size (>=10 samples and totalweight >=10,000), the raw weighted average is representative enough, and leaving m=0 is probably the right thing to do. Not that it matters much: the musicdetails page shows that the rankings change very little even with a high m.
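A minimal sketch of the adjustment, assuming it follows IMDb's published weighted-rating formula WR = (v / (v + m)) * R + (m / (v + m)) * C, with v taken to be a favorite's totalweight and R its raw weighted average SAT. The text only says favorites move "towards the mean," so C is assumed here to be the overall mean SAT across all favorites. The function name and parameters are hypothetical.

```python
def adjusted_average_sat(avg_sat: float, total_weight: float,
                         global_mean: float, m: float = 0.0) -> float:
    """IMDb-style true Bayesian estimate.

    WR = (v / (v + m)) * R + (m / (v + m)) * C
      v = total_weight  (the favorite's totalweight)
      R = avg_sat       (its raw weighted average SAT)
      C = global_mean   (mean SAT to shrink toward; an assumption here)
      m = extra prior weight; m = 0 returns R unchanged.
    """
    if m < 0:
        raise ValueError("m must be non-negative")
    if total_weight + m <= 0:
        raise ValueError("total_weight + m must be positive")
    v = total_weight
    # The two coefficients sum to 1, so the result lies between R and C.
    return (v / (v + m)) * avg_sat + (m / (v + m)) * global_mean


# Made-up numbers: two favorites with the same raw average but different weight.
for weight in (10_000, 500_000):
    print(weight, round(adjusted_average_sat(1300, weight, 1100, m=1300), 1))
```

With the same m, the low-weight favorite is pulled noticeably further toward C than the high-weight one. This is why, once every listed favorite already has a totalweight of at least 10,000, a moderate m barely reorders the rankings.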
Correlation is not causation, blah blah.
That's true. However, in this case correlation is enough: the results are provocative regardless of whether A causes B, B causes A, or an unknown C causes both A and B.