Spurious relationships are a serious problem in social scientific research. But… they’re also fun! For example, did you know that ice cream causes crime? Also, global temperature is inversely related to the number of pirates.
Unfortunately, pundits and journalists love them some spurious relationships. For example, an old post of mine examined the claim that the 2012 presidential election would be decided by “Cracker Barrel voters” vs. “Whole Foods voters.” I showed that the predictive power of “Chick-fil-A” and “Ben and Jerry’s” is actually better (you know, two food brands that are actually political). Again, it’s all silly from a social science standpoint, but consider how prevalent the “NASCAR Dads” and “Soccer Moms” labels are as explanations for routine political behaviors.
Anyway, Seth Masket provides an excellent example of a spurious relationship by asking: Are Democrats perverts? (See also Dylan Matthews at Vox and Chris Ingraham at the Washington Post.) As he shows, there is a strong relationship between porn use and Obama’s vote share in 2012! But as Seth aptly explains:
Of course, this probably isn’t explaining anything real. For one thing, there’s a big potential ecological inference problem here. We’re making assumptions about individual level behavior by examining data aggregated at the state level… Second, chances are that even if there is an individual relationship here, it’s not a direct one. Porn usage may correlate with something else that also correlates with partisan voting patterns. This could be poverty, internet speed and availability, age, marriage rates, etc.
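To make Seth's second point concrete, here is a minimal simulation sketch in Python. Everything in it is made up for illustration: `z` stands in for some lurking variable (poverty, internet availability, whatever), and `x` and `y` are two outcomes that each depend on `z` but have no direct connection to each other. None of this uses the actual porn or vote-share data.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def residuals(ys, zs):
    """Residuals from a simple least-squares regression of ys on zs."""
    n = len(ys)
    my, mz = sum(ys) / n, sum(zs) / n
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    return [(y - my) - slope * (z - mz) for y, z in zip(ys, zs)]


def simulate(n=5000, seed=1):
    """Return (raw correlation, correlation after controlling for z)."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]   # hypothetical confounder
    x = [zi + rng.gauss(0, 1) for zi in z]    # driven by z, not by y
    y = [zi + rng.gauss(0, 1) for zi in z]    # driven by z, not by x
    raw = pearson_r(x, y)
    partial = pearson_r(residuals(x, z), residuals(y, z))
    return raw, partial


raw, partial = simulate()
print(f"raw correlation between x and y: {raw:.2f}")
print(f"correlation after controlling for z: {partial:.2f}")
```

With these assumptions the raw correlation should hover around 0.5 (in theory it is exactly one half), while the correlation between the residuals should sit near zero. The relationship between `x` and `y` is real in the data and entirely absent in the causal story.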
Seth notes that porn pageviews explain 16% of the variation in Obama’s vote share. In an attempt to outdo Seth in the spurious-relationship-a-thon, I examined the predictive power of Google search traffic at the state level for: (1) the rock band “Nickelback” and (2) “cure for herpes.” Here are the results.*
According to the first scatterplot, “cure for herpes” has a negative relationship with Obama’s vote share in 2012, supposedly indicating that herpes caused people to vote for Mitt Romney in 2012. However, the R² is just 0.08, which indicates that herpes has little predictive power (in fact, the relationship is not significant: p = .12, n = 31). Nickelback also has a negative effect, supposedly indicating that fans of the Canadian rock band were more likely (!) to vote for Romney in 2012. Notably, in the second scatterplot, the R² is 0.25, indicating that fully a quarter of the variation in the 2012 election outcome is explained by the state-level variation in Nickelback fans.
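For the curious, the herpes numbers hang together arithmetically. In a regression with a single predictor, the t-statistic on the slope can be recovered from R² and n alone: t = sqrt(R² · (n − 2) / (1 − R²)), with n − 2 degrees of freedom. The sketch below assumes the reported p-value comes from the usual two-sided t-test on the slope, which is my assumption and not something stated above. It sticks to the standard library, so the p-value comes from numerically integrating the Student t density; the function names are just for illustration.

```python
import math


def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=10_000):
    """Two-sided p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    t = abs(t)
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # probability mass between 0 and |t|
    return 1 - 2 * area           # mass in both tails


def slope_test_from_r2(r2, n):
    """t-statistic and two-sided p-value for a one-predictor regression."""
    df = n - 2
    t = math.sqrt(r2 * df / (1 - r2))
    return t, two_sided_p(t, df)


t_stat, p_value = slope_test_from_r2(0.08, 31)
print(f"t = {t_stat:.2f}, p = {p_value:.2f}")
```

Run with R² = 0.08 and n = 31, this should land close to the p = .12 reported above. I have not repeated the exercise for Nickelback because the sample size for that scatterplot is not given here.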
While it’s difficult to take the above “findings” seriously, social scientists dedicate their lives to distinguishing between correlation and causation. In practice, it’s much harder than most people appreciate. What the world gives us is correlation, so it’s easy to draw faulty conclusions from observational data. In an undergraduate research methods class of mine, we cover countless examples of correlation and causation. For example, congressional candidates who spend large amounts of their personal wealth on their campaigns actually receive fewer votes. Also, humanitarian aid is associated with various negative consequences such as higher rates of infant mortality. Are these causal relationships? Of course not. Always remember: correlation is not causation.
* Don’t tell my department chair I spent 30 minutes working on this…