Saturday 18 March 2017

Joreskog (1971): Play Along with Example 1 in AMOS


Joreskog's paper "Statistical Analysis of Sets of Congeneric Tests" was, it is probably fair to say, a watershed moment in psychometrics.

In it, Joreskog reported on how he had been able to employ the "basic equations of the factor analytic model with one common factor" and combine them with the "maximum likelihood method" of estimation, to allow for a "statistical test of the assumption that the tests [indicators] are congeneric" (p. 112), based on "the testing of goodness of fit by the likelihood ratio technique" (p. 109).

Forget about that strange word "congeneric" for a moment. What is important to know is that this is the moment that factor analysis meets statistical significance testing. And if there were two things that psychologists liked at the time, they were factor analysis and significance testing. It was always going to be a winning combination, and it went on to be called confirmatory factor analysis - you've probably heard of it.

Anyway. Joreskog goes on to apply his technique to some data pertaining to marks for four essays. The following table appears on page 114:

[Table 1 from Joreskog (1971), p. 114]

The first table (1a) is the variance-covariance matrix for the scores; the second (1b) gives the model fit for the parallel, tau-equivalent and congeneric models. The third I'm not too worried about.

The congeneric model is just the common factor model. The tau-equivalent model is the common factor model with the factor loadings constrained to be equal. The parallel model is the common factor model with the factor loadings and the error variances constrained to be equal.

Looking at it from the perspective of the variance-covariance matrix, the tau-equivalent model dictates that the covariances are equal (p. 113), and the parallel model dictates that the covariances and the variances are equal (p. 113). These are strict demands. The congeneric model involves neither of these strict demands.
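
If it helps to see those constraints written out, here is how the three models might be specified in R with the lavaan package (my sketch, not Joreskog's notation; x1 to x4 are placeholder names for the four essay scores):

    # Three nested measurement models in lavaan syntax
    congeneric <- 'F =~ x1 + x2 + x3 + x4'              # loadings and error variances free

    tau_equivalent <- 'F =~ a*x1 + a*x2 + a*x3 + a*x4'  # loadings constrained equal

    parallel <- '
      F =~ a*x1 + a*x2 + a*x3 + a*x4   # loadings constrained equal
      x1 ~~ e*x1                       # error variances constrained equal
      x2 ~~ e*x2
      x3 ~~ e*x3
      x4 ~~ e*x4
    '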

I have converted the variance-covariance matrix from Table 1a to a correlation matrix, so that you can run the analysis in AMOS. The correlation matrix looks like this:

[Correlation matrix for the four essay scores]

And download the .sav file here from my dropbox.
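
If you would rather do the covariance-to-correlation conversion yourself, base R's cov2cor() does it in one line. A minimal sketch, with a made-up covariance matrix standing in for Table 1a (these are NOT Joreskog's values):

    # Made-up 2 x 2 covariance matrix, just to show the mechanics
    V <- matrix(c(1.20, 0.45,
                  0.45, 0.90), nrow = 2, byrow = TRUE)

    # Each covariance is rescaled to cov / (sd1 * sd2), i.e. Pearson's r
    cov2cor(V)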

What you need to do in AMOS is to create three models: (1) the normal single-factor model (test of congeneric-ness), (2) a single-factor model where the factor loadings are constrained to be equal (test of tau-equivalence), and (3) a single-factor model where the factor loadings and error variances are constrained to be equal (test of parallel-ness).

The first model is no problem - just draw a single-factor model the way you usually do. For the second model (tau-equivalence), you first need to name the parameters and then go to Analyze -> Manage Models. Set the factor loadings to be equal like so:

[Screenshot: Manage Models dialog with the factor loadings constrained to be equal]

For the third model (parallel), you need to set the factor loadings and error variances to be equal like so:

[Screenshot: Manage Models dialog with the factor loadings and error variances constrained to be equal]

Now run the analysis with a standard maximum likelihood estimation procedure. You will get the same (well, almost the same) results as Joreskog did back in the 70s; I imagine the figures differ subtly due to rounding in the original paper. We can now also get fit indices (e.g. CFI, RMSEA) that have been developed since Joreskog's paper was published.
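
If you want to play along in R instead of AMOS, here is a sketch of fitting and comparing the three models above with lavaan, assuming a hypothetical data frame called essays with columns x1 to x4:

    library(lavaan)

    # Fit each model by maximum likelihood (lavaan's default estimator)
    fit_congeneric <- cfa(congeneric, data = essays)
    fit_tau        <- cfa(tau_equivalent, data = essays)
    fit_parallel   <- cfa(parallel, data = essays)

    # Likelihood ratio (chi-square difference) tests of the nested models
    anova(fit_congeneric, fit_tau, fit_parallel)

    # Some of the fit indices that postdate Joreskog's paper
    fitMeasures(fit_congeneric, c("chisq", "df", "pvalue", "cfi", "rmsea"))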

Friday 10 March 2017

Ten Things You Didn't Know About Pearson's r (or Knew but are Worth Thinking About Again)

From the Sheffield Evening Telegraph, August 1912


You'd need an N of 38,416 for r = .01 to be statistically significant at p < .05. So get recruiting.

You'd need an N of 5 for r = .90 to be statistically significant, p < .05. So pick a strong effect.
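
If you want to check these figures, or find the minimum N for any other r, here is a small sketch in R (the function name is mine) using the standard t test of a correlation:

    # Smallest N at which r is significant two-tailed at alpha,
    # using t = r * sqrt(n - 2) / sqrt(1 - r^2) with df = n - 2
    min_n_for_r <- function(r, alpha = .05) {
      n <- 4
      repeat {
        t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
        p <- 2 * pt(-abs(t_stat), df = n - 2)
        if (p < alpha) return(n)
        n <- n + 1
      }
    }

    min_n_for_r(.90)  # 5
    min_n_for_r(.01)  # about 38,000, in line with the figure above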

Its (co)developer, Karl Pearson, was born in Islington, North London. He was a polymath (which is a posh word for bloody brilliant at everything), who made important contributions in statistics, philosophy, meteorology and evolutionary theory. It's not all good though. He was described as "probably the most dangerous man in England outside parliament" in the Sheffield Evening Telegraph in August 1912. Not for his work on correlation, I should add, but for his attitude toward eugenics. Anyway, moving swiftly on...

According to Rogers and Nicewander (1988), there are 13 ways to look at it.

There are at least five methods of creating confidence intervals on meta-analysed Pearson's rs.

There are a number of ways to test the difference of independent correlations (coefficients from separate samples) and dependent correlations (coefficients from the same sample). The test of dependent correlations implemented in the "psych" R package is the T2 formula appearing in Steiger (1980; Equation 7). Other methods of testing the difference and heterogeneity of dependent correlations include that of Meng, Rosenthal, & Rubin (1992). And see Zhou (2007) for methods of comparing correlations based on confidence intervals, in the spirit of the new statistics.
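
To give one concrete example, the r.test() function in the psych package will compare two dependent correlations that share a variable (the numbers below are made up):

    library(psych)

    # Do r(1,2) = .50 and r(1,3) = .30 differ, given r(2,3) = .40 and N = 100?
    # This is the dependent-correlations case, using Steiger's procedure
    r.test(n = 100, r12 = .50, r13 = .30, r23 = .40)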

SPSS is not good (which is a polite way of saying bloody awful) for the comparison of correlations. To the best of my knowledge, no tests of the difference between correlations are implemented in the software. What's more, SPSS does not as standard provide confidence intervals on r (although it can provide you with bootstrap confidence intervals). The cor.test function of the stats package in R does. SPSS's deficiencies in this respect are probably why you see contrasts of r only rarely in psychological research. So that's another reason to switch to R.
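
A quick sketch of what you get for free in R, with simulated data:

    # Pearson's r plus a 95% confidence interval from base R's stats package
    set.seed(1)
    x <- rnorm(50)
    y <- 0.5 * x + rnorm(50)
    cor.test(x, y)  # prints r, the t test, and the 95% CI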

When you have the Pearson's r and the standard deviations of the two variables, you can find the covariance with the following simple formula: r*sd1*sd2. Take, for example, the paper by Donnellan, Ackerman & Brecheen (2016), where the first and second items of the Rosenberg Self-Esteem Scale have an inter-item Pearson's r of .701 (Table 1). The first item has a standard deviation of .823. The second item has a standard deviation of .736. All we have to do to find the covariance is .701 x .823 x .736, which is .425 (3dp). To work out the Pearson's r when you only have covariances and SDs, use this formula: cov/(sd1*sd2). So: .425/(.823 x .736) = .702 (3dp), which deviates from the original r only due to the rounding of the covariance from .4246153 to .425.
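
Here is the same arithmetic in R, using the values above:

    # Covariance from Pearson's r and two SDs, then back to r
    r   <- .701
    sd1 <- .823
    sd2 <- .736

    cov12 <- r * sd1 * sd2   # 0.4246153
    cov12 / (sd1 * sd2)      # recovers r = .701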

You can create datasets of continuous variables with pre-specified inter-variable Pearson's r really easily by using the Cholesky transformation in R. Which is fantastic for a dry run of your analysis before data collection. Check this blog out to learn how.
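
In the meantime, here is a minimal version of the trick for two variables with a target population r of .50:

    # Impose a pre-specified correlation on uncorrelated normals
    # via the Cholesky factor of the target correlation matrix
    set.seed(42)
    n      <- 10000
    target <- matrix(c(1, .5,
                       .5, 1), nrow = 2)

    raw <- matrix(rnorm(n * 2), ncol = 2)  # uncorrelated standard normals
    sim <- raw %*% chol(target)            # chol() gives the upper-triangular factor
    cor(sim)                               # off-diagonal will be close to .50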

There is a society for the suppression of the correlation coefficient, "whose guiding principle is most correlation coefficients should never be calculated". The website for this society contains this great bibliography of articles related to their cause.