Friday 10 March 2017

Ten Things You Didn't Know About Pearson's r (or Knew but are Worth Thinking About Again)



From the Sheffield Evening Telegraph, August 1912


1. You'd need an N of 38,416 for r = .01 to be statistically significant at p < .05. So get recruiting.

2. You'd need an N of just 5 for r = .90 to be statistically significant at p < .05. So pick a strong effect.
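You can check both of these claims yourself in a couple of lines of R. This is a minimal sketch using the standard t transformation of r (the same test cor.test applies); the function name p_for_r is just something I've made up for illustration.

```r
# Two-tailed p-value for a Pearson's r of a given size at a given N,
# via the usual transformation t = r * sqrt(N - 2) / sqrt(1 - r^2)
p_for_r <- function(r, n) {
  t <- r * sqrt(n - 2) / sqrt(1 - r^2)
  2 * pt(-abs(t), df = n - 2)
}

p_for_r(.01, 38416)  # just squeaks under .05
p_for_r(.90, 5)      # about .037
```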

3. Its co-developer, Karl Pearson, was born in Islington, North London. He was a polymath (which is a posh word for bloody brilliant at everything) who made important contributions to statistics, philosophy, meteorology and evolutionary theory. It's not all good, though. He was described as "probably the most dangerous man in England outside parliament" in the Sheffield Evening Telegraph in August 1912. Not for his work on correlation, I should add, but for his attitude toward eugenics. Anyway, moving swiftly on...

4. According to Rogers and Nicewander (1988), there are thirteen ways to look at the correlation coefficient.

5. There are at least five methods of constructing confidence intervals for meta-analysed Pearson's rs.

6. There are a number of ways to test the difference between independent correlations (coefficients from separate samples) and dependent correlations (coefficients from the same sample). The test of dependent correlations implemented in the "psych" R package is the T2 formula appearing in Steiger (1980; Equation 7). Other methods of testing the difference and heterogeneity of dependent correlations include that of Meng, Rosenthal, and Rubin (1992). And see Zou (2007) for methods of comparing correlations based on confidence intervals, in the spirit of the new statistics.
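As a sketch of what that looks like in practice, here is psych's r.test function applied to two dependent correlations that share a variable. The sample size and correlations below are made up purely for illustration.

```r
library(psych)

# Hypothetical sample: n = 100, with r(x,y) = .50, r(x,z) = .30 and
# r(y,z) = .40. Does x correlate more strongly with y than with z?
# Given r12, r13 and r23, r.test applies Steiger's (1980) procedure
# for correlated correlations.
r.test(n = 100, r12 = .50, r13 = .30, r23 = .40)
```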

7. SPSS is not good (which is a polite way of saying bloody awful) for the comparison of correlations. To the best of my knowledge, no test of the difference between correlations is implemented in the software. What's more, SPSS does not as standard provide confidence intervals on r (although it can provide you with bootstrap confidence intervals). The cor.test function of the stats package in R does. SPSS's deficiencies in this respect are probably why you see contrasts of r only rarely in psychological research. So that's another reason to switch to R.
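For instance, with some toy data:

```r
# cor.test reports r along with the t-test and a 95% confidence interval
set.seed(1)
x <- rnorm(50)
y <- 0.5 * x + rnorm(50)
cor.test(x, y)
```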

8. When you have the Pearson's r and the standard deviations of the two variables, you can find the covariance with the following simple formula: r*sd1*sd2. Take, for example, Donnellan, Ackerman, and Brecheen (2016), where the first and second items of the Rosenberg Self-Esteem Scale have an inter-item Pearson's r of .701 (Table 1). The first item has a standard deviation of .823 and the second a standard deviation of .736. All we have to do to find the covariance is .701 x .823 x .736, which is .425 (3 dp). To work out the Pearson's r when you only have the covariance and SDs, use this formula: cov/(sd1*sd2). So: .425/(.823 x .736) = .702 (3 dp), which deviates from the original r only because the covariance was rounded from .4246153 to .425.
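The same conversion in R, using the values from Donnellan et al.'s Table 1 (keeping full precision recovers the original r exactly):

```r
r   <- .701
sd1 <- .823
sd2 <- .736

cov12 <- r * sd1 * sd2   # covariance from r and the two SDs
round(cov12, 3)          # .425

cov12 / (sd1 * sd2)      # back to r: .701, no rounding loss this way
```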

9. You can create datasets of continuous variables with pre-specified inter-variable Pearson's rs really easily by using the Cholesky transformation in R, which is fantastic for a dry run of your analysis before data collection. Check this blog out to learn how.
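Here is a minimal sketch of the idea for two variables with a target population correlation of .5 (the sample r will be close to, but not exactly, .5):

```r
set.seed(42)
n <- 1000
R <- matrix(c(1, .5,
              .5, 1), nrow = 2)     # target correlation matrix
U <- chol(R)                        # upper-triangular factor: R = t(U) %*% U
Z <- matrix(rnorm(n * 2), ncol = 2) # uncorrelated standard normals
X <- Z %*% U                        # rows of X now have correlation matrix R
cor(X)                              # sample r close to .5
```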

10. There is a society for the suppression of the correlation coefficient, "whose guiding principle is most correlation coefficients should never be calculated". The website for this society contains this great bibliography of articles related to their cause.
