In a recent post, Gabriel Rossman comes up with an easy example of why statistics are hard to do correctly.

- If good looks and smarts are distributed normally, and
- If good looks and smarts **have nothing to do with each other**, and
- If movie producers want both smarts and looks,
- Then, by observing **employed actors**, we'll assume that looks and smarts have a **negative correlation**,
- Even though we constructed this experiment with **no correlation**.

Here’s a graph of 250 randomly generated points (with no correlation). The red circles represent “actors who are smart and good-looking enough to get a job” (looks + smarts > 2), and the lighter blue x’s represent “people who wanted to be actors”:

If we look only at actors with jobs, we’ll see a clearly negative correlation between smarts and good looks. In fact, some brilliant actors are less attractive than the average person, and some gorgeous actors are dumber than the average person. Even more interesting, though, is that if we try to rule out bias by also looking at aspiring but unsuccessful actors, we’ll find that they exhibit a similar correlation. Here are the lines of best fit for both:

This effect is particularly nefarious in that it’s distribution-agnostic. For instance, assume that for mathematicians:

- Experience and brilliance are uniformly distributed,
- With experience comes somewhat more brilliance (I’ve introduced a **positive** 20% correlation), and
- Only the top fifth of mathematicians (as measured by experience + brilliance) ever get anywhere, while the rest drop out to do something easier.
- Then it’s very easy to conclude that experience kills brilliance, and that a mathematician’s best work will be done by 40: a **phantom negative correlation**.

In a general sense (the proof being left as an exercise for the reader):

- Given two measurements $x_i \in X$ and $y_i \in Y$ on a set of points $p_1, \dots, p_n \in P$, if a larger value of $x_i + y_i$ increases the chance that $p_i$ will be sampled, it will introduce a phantom correlation between $X$ and $-Y$.
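The general statement is left as an exercise, but the actor example’s special case has a short argument. Assume (as an illustration, not the general case) that $x_i$ and $y_i$ are independent draws from $\mathcal{N}(0, \sigma^2)$ and that $p_i$ is sampled exactly when $x_i + y_i > c$. Write $s = x + y$ and $d = x - y$; for independent normals with equal variance, $s$ and $d$ are jointly normal and uncorrelated, hence independent, so selecting on $s$ leaves the distribution of $d$ untouched:

$$
\begin{aligned}
x &= \tfrac{1}{2}(s + d), \qquad y = \tfrac{1}{2}(s - d) \\
\operatorname{Cov}(x, y \mid s > c) &= \tfrac{1}{4}\bigl[\operatorname{Var}(s \mid s > c) - \operatorname{Var}(d \mid s > c)\bigr] \\
&= \tfrac{1}{4}\bigl[\operatorname{Var}(s \mid s > c) - 2\sigma^2\bigr] \\
&< 0
\end{aligned}
$$

The last step uses the fact that truncating a normal distribution strictly shrinks its variance, so $\operatorname{Var}(s \mid s > c) < \operatorname{Var}(s) = 2\sigma^2$. A negative covariance between $X$ and $Y$ is the same thing as a positive one between $X$ and $-Y$.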

Kind of scary, eh?

—

*Disclaimer: Although the author is ostensibly a mathematician, he has never been a very good one (he did the graphs in Excel, what’s that about?). All theorems should be proven from first principles before attempting to use at home. Vote no on the axiom of choice.*