Wednesday, 22 February 2017

Behind the Scenes at the Cronbach's Alpha: Variance and Covariance Methods in R

If you can't be bothered to read all of this: alpha is just total inter-item covariance divided by total score variance, multiplied by the number of items in the scale divided by the number of items in the scale minus one. So don't hang your hat on it.

Cronbach's alpha (Cronbach, 1951) needs little introduction. It's ubiquitous in personality and social psychology, as well as in neighboring disciplines. You'll see a Cronbach's alpha on most of the occasions that a multi-item self-report scale is used.

Now Cronbach's alpha first came on the scene in the early 1950s (well, technically it's a modification of pre-existing formulas, so perhaps that's a little imprecise, but whatever). Which is an era before the widespread availability of computers. Which means that the math(s) is fairly simple. Which means it can be done by hand (or by a string of simple transparent steps in R) quite easily. Because we're so used to SPSS or SAS or some other statistical program doing it for us, some might not be aware of that. I certainly wasn't until quite recently.

Alpha was designed as an index of internal consistency reliability, which is not the same thing as internal consistency or reliability per se (depending, of course, on how you're defining those terms), although it's often interpreted as one of the two or both (which it may well be, depending on how you're defining those terms). Although it is described in a number of ways in the psychometric literature, the mathematical operation is unequivocal. In short, it tells us about the ratio of inter-item covariance (the extent to which responses to statements or questions contained in a scale are linearly related in a sample) to total score variance (the extent to which total scores are dissimilar in a sample), taking into account the scale length.

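Because variances, covariances, and even Pearson's rs are tied together by simple relationships in psychometric data, Cronbach's alpha can be expressed in a variety of different ways. You get exactly the same result with both of the following equations, as the code at the bottom of this blog testifies:

$$\alpha = \frac{n}{n-1}\left(1 - \frac{\sum_{i=1}^{n} V_{\text{item } i}}{V_{\text{total score}}}\right) \qquad (1)$$

$$\alpha = \frac{n}{n-1}\cdot\frac{\sum_{i \neq j} C_{\text{items } i,j}}{V_{\text{total score}}} \qquad (2)$$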

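Where V is variance, C is covariance, and n is the number of items in the scale (not sample size; alpha has little directly to do with sample size). Note that this is the population variance, not the sample variance, although because alpha is a ratio of variances you get the same coefficient either way, as long as you use the same kind throughout.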
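To make the equivalence concrete, here is a minimal sketch in R. It is not the script described in this post: the data are simulated, and every object name (`items`, `cov_mat`, `alpha_var`, `alpha_cov`, `alpha_pop`) is made up for illustration. It computes alpha once from the summed item variances and once from the summed inter-item covariances, and relies on the fact that total score variance is simply the sum of every cell in the item variance-covariance matrix.

```r
# Hypothetical data: 200 simulated respondents answering 5 items that share
# a common factor (values are invented purely for illustration).
set.seed(1)
n_resp <- 200
common <- rnorm(n_resp)
items <- sapply(1:5, function(i) common + rnorm(n_resp))
colnames(items) <- paste0("item", 1:5)

n <- ncol(items)                   # number of items, not sample size
cov_mat <- cov(items)              # item variance-covariance matrix
item_vars <- diag(cov_mat)         # variance of each item (the diagonal)
total_var <- var(rowSums(items))   # variance of the total scores

# Total score variance equals the sum of all cells in the matrix
stopifnot(isTRUE(all.equal(total_var, sum(cov_mat))))

# Summed inter-item covariances: every off-diagonal cell
sum_cov <- sum(cov_mat) - sum(item_vars)

# Alpha from summed item variances
alpha_var <- (n / (n - 1)) * (1 - sum(item_vars) / total_var)

# Alpha from summed inter-item covariances
alpha_cov <- (n / (n - 1)) * (sum_cov / total_var)

# var() and cov() in R divide by N - 1; rescaling to population variances
# (dividing by N) changes numerator and denominator by the same factor
pop <- (n_resp - 1) / n_resp
alpha_pop <- (n / (n - 1)) * (1 - sum(item_vars * pop) / (total_var * pop))

print(c(variance_form = alpha_var, covariance_form = alpha_cov,
        population_form = alpha_pop))
```

All three printed values should agree to within floating-point error, which is the whole point: the two expressions are just different ways of slicing the same matrix.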

I have created an R script that lets you see behind the scenes when Cronbach's alpha is calculated using both formulas. The script is set up for an analysis of a hypothetical five-item scale with hypothetical data (download the .csv here) but can be easily modified for other scales. All you need to do is enter your data as a .csv file and enter the number of items in the scale.
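If you just want the gist of that workflow, the sketch below shows roughly what it looks like. It is an assumption-laden stand-in rather than the actual script: the data set is a tiny made-up one written to a temporary file so the example runs on its own, and the names `csv_path`, `toy`, `scale_data` and `n_items` are hypothetical. With your own data you would point `read.csv()` at your file instead.

```r
# Stand-in for a real data file: a small invented five-item data set
# written to a temporary .csv so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
toy <- data.frame(
  q1 = c(4, 2, 5, 3, 1, 4),
  q2 = c(5, 2, 4, 3, 2, 4),
  q3 = c(4, 1, 5, 2, 1, 5),
  q4 = c(3, 2, 4, 3, 2, 4),
  q5 = c(4, 3, 5, 2, 1, 3)
)
write.csv(toy, csv_path, row.names = FALSE)

# Step 1: read the data (one column per item, one row per respondent)
scale_data <- read.csv(csv_path)

# Step 2: enter the number of items in the scale by hand
n_items <- 5

# Step 3: item variances, total score variance, and alpha
# (summed item variances form)
item_vars <- sapply(scale_data, var)
total_var <- var(rowSums(scale_data))
alpha <- (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)

print(item_vars)
print(total_var)
print(alpha)
```

Printing the intermediate pieces is deliberate: the item variances and the total score variance are the only ingredients, and seeing them makes it obvious there is nothing mysterious going on.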

Entering the number of items in the scale manually has the added bonus of showing how alpha changes with scale length, even when the item covariance/total score variance ratio (equation 2) or the item variance/total score variance ratio (equation 1) stays the same. This demonstrates how gross features of the covariance-variance matrix can be identical while different scale lengths still produce different alpha coefficients.
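A quick way to see this for yourself is to hold the covariance ratio fixed and vary only the number of items. The snippet below is a sketch under stated assumptions: `alpha_from_ratio()` is a hypothetical helper, and the ratio of 0.6 is an arbitrary value chosen for illustration, not something estimated from data.

```r
# Hypothetical helper: alpha computed from the ratio of summed inter-item
# covariance to total score variance, for a given number of items.
alpha_from_ratio <- function(cov_ratio, n_items) {
  (n_items / (n_items - 1)) * cov_ratio
}

cov_ratio <- 0.6               # assumed ratio, held constant
n_items <- c(3, 5, 10, 20)     # different scale lengths to try

alphas <- alpha_from_ratio(cov_ratio, n_items)
print(data.frame(n_items = n_items, alpha = round(alphas, 3)))
```

The only thing moving here is the n/(n - 1) multiplier, which shrinks toward 1 as the scale gets longer. In real data the ratio itself usually shifts when items are added (it can never exceed (n - 1)/n), so this isolates just the scale-length term.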

Below is the equation in the SPSS documentation. Slightly different notation is used, but the equation corresponds to the first expression of alpha in terms of summed variances of items.

