Peg's Blog: Behind the Scenes at the Cronbach's Alpha: Variance and Covariance Methods in R

If you can't be bothered to read all of this: alpha is just total inter-item covariance divided by total score variance, multiplied by the number of items in the scale divided by the number of items in the scale minus one. So don't hang your hat on it.

Cronbach's alpha (Cronbach, 1951) needs little introduction. It's ubiquitous in personality and social psychology, as well as in neighboring disciplines. You'll see a Cronbach's alpha on most of the occasions that a multi-item self-report scale is used.

Now Cronbach's alpha first came on the scene in the early 1950s (well, technically it's a modification of pre-existing formulas, so perhaps that's a little imprecise, but whatever). Which is an era before the widespread availability of computers. Which means that the math(s) is fairly simple. Which means it can be done by hand (or by a string of simple transparent steps in R) quite easily. Because we're so used to SPSS or SAS or some other statistical program doing it for us, some might not be aware of that. I certainly wasn't until quite recently.

Alpha was designed as an index of internal consistency reliability, which is not the same thing as internal consistency or reliability per se (depending, of course, on how you're defining those terms), although it's often interpreted as one of the two or both (which it may well be, depending on how you're defining those terms). Although it is described in a number of ways in the psychometric literature, the mathematical operation is unequivocal. In short, it tells us about the ratio of inter-item covariance (the extent to which responses to statements or questions contained in a scale are linearly related in a sample) to total score variance (the extent to which total scores are dissimilar in a sample), taking into account the scale length.

Because of the existence of simple associations between variances and covariances, and actually even Pearson's rs, in psychometric data, Cronbach's alpha can be expressed in a variety of different ways. You get exactly the same result with both of the following equations, as the code at the bottom of this blog testifies:
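Written out from the formula in the script comments (a reconstruction, assuming Vi is the variance of item i, Vt is the variance of total scores, and Cij is the covariance between items i and j), the two expressions look like this. They agree because the total score variance is the sum of every entry in the covariance-variance matrix, so Vt equals the summed item variances plus the summed inter-item covariances.

```latex
\begin{align}
\alpha &= \frac{n}{n-1}\left(1 - \frac{\sum_{i=1}^{n} V_i}{V_t}\right) \tag{1} \\
\alpha &= \frac{n}{n-1}\left(\frac{\sum_{i \neq j} C_{ij}}{V_t}\right) \tag{2}
\end{align}
```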

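Where V is variance, C is covariance, and n is the number of items in the scale (not sample size; alpha has little directly to do with sample size). Note that this is the population variance, not the sample variance, although the choice does not change alpha as long as the same kind is used throughout: the two differ by a factor of (N-1)/N, where N is sample size, and that factor appears in both the numerator and the denominator of the ratio, so it cancels.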
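As a quick check on the population-versus-sample point, here is a minimal sketch using made-up data (the data frame `items` and every number in it are hypothetical, not the blog's .csv). It computes alpha by the variance expression twice, once with sample variances and once with population variances, and compares the two.

```r
# Hypothetical responses: 6 respondents, 3 items (made-up numbers)
items <- data.frame(
  X1 = c(1, 2, 3, 4, 5, 3),
  X2 = c(2, 2, 4, 3, 5, 4),
  X3 = c(1, 3, 3, 5, 4, 2)
)

n  <- ncol(items)  # number of items
nn <- nrow(items)  # sample size

# Sample variances (what var() returns)
item_var_s  <- sum(sapply(items, var))
total_var_s <- var(rowSums(items))

# Population variances: rescale by (nn - 1) / nn
item_var_p  <- item_var_s  * (nn - 1) / nn
total_var_p <- total_var_s * (nn - 1) / nn

alpha_sample <- n / (n - 1) * (1 - item_var_s / total_var_s)
alpha_pop    <- n / (n - 1) * (1 - item_var_p / total_var_p)

# The (nn - 1) / nn factor cancels in the ratio, so the two agree
all.equal(alpha_sample, alpha_pop)
```

The rescaling only matters if one variance in the ratio is converted and the other is not.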

I have created an R script that lets you see behind the scenes when Cronbach's alpha is calculated using both formulas. The script is set up for an analysis of a hypothetical five-item scale with hypothetical data (download the .csv here) but can be easily modified for others. All you need to do is enter your data as a .csv file and enter the number of items in the scale.

Entering the number of items in the scale manually has the added bonus of showing how alpha changes with scale length, even when the item covariance/total score variance (equation 2) or item variance/total score variance (equation 1) stays the same. This demonstrates how gross features of the covariance-variance matrix can be identical, but differential scale length results in different alpha coefficients.
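As a rough sketch of that point, the snippet below holds the covariance-to-total-variance ratio fixed at a made-up value and varies only n. The value 0.4 and the scale lengths are assumptions chosen for illustration. With the ratio fixed, the only thing that moves is the multiplier n/(n-1), which is 2 for a two-item scale and shrinks towards 1 as n grows.

```r
# Hypothetical ratio of summed inter-item covariance to total score variance
ratio <- 0.4

# Scale lengths to compare (arbitrary choices)
n_items <- c(2, 5, 10, 20, 50)

# alpha = n/(n-1) * ratio for each scale length
alpha_by_length <- n_items / (n_items - 1) * ratio

data.frame(n_items, alpha_by_length)
```

In real data the ratio itself usually changes with scale length, because a longer scale has many more covariances (n*(n-1)) than variances (n) in its matrix.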

Below is the equation in the SPSS documentation. Slightly different notation is used, but the equation corresponds to the first expression of alpha in terms of summed variances of items.

	#Under the Bonnet of Cronbach's alpha.

	#COVARIANCE METHOD*

	#n/(n-1)*(sumsumCij/Vt)

	#import csv
	df1 <- read.csv("Cron.csv")

	#create cov-var matrix: 5 variances on diagonal, 20 covariances in symmetric matrix (10 above, 10 below diagonal)
	cvmatrx <- cov(df1)


	#sum every entry of the cov-var matrix for the denominator
	#(which is equivalent to the sample variance of total test scores)
	totalvar <- sum(cvmatrx)


	#sum the covariances below the diagonal (lower triangle: 10 values in 5*5 matrix)
	sumrows <- sum(cvmatrx[lower.tri(cvmatrx)])

	#sum the covariances above the diagonal (upper triangle: 10 values in 5*5 matrix)
	sumcols <- sum(cvmatrx[upper.tri(cvmatrx)])

	#check both figures are the same (the matrix is symmetric)
	stopifnot(isTRUE(all.equal(sumrows, sumcols)))

	#and sum
	numerator <- sumrows+sumcols

	#find ratio of inter-item covariance to total test score variance
	ratio <- (numerator/totalvar)

	#enter number of items
	n <- 5 #enter here. Note n is number of items not sample size in Cronbach's notation
	#Change the number of items to see how alpha changes with scale length when the ratio above is held fixed:
	#the multiplier n/(n-1) shrinks towards 1 as n grows, so alpha falls unless the ratio rises too

	#create multiplier
	multiplier <- n/(n-1)

	#Find alpha ITEM COVARIANCE METHOD
	cronbachalpha <- multiplier*(numerator/totalvar)
	#with .csv given = .334821




	#VARIANCE METHOD
	#(*population variance* not sample variance - R var() in stats package is sample variance and has no
	#population variance function. Why?!)

	#get item variance
	itemvar <- var(df1$X1) + var(df1$X2) + var(df1$X3) + var(df1$X4) + var(df1$X5)

	#convert to population, differs by a factor of (n-1)/n (where n is sample size)
	#already have n, so nn will be sample size
	nn <- nrow(df1) #number of respondents (rows); 10 for the example .csv

	#get pop variance
	itemvarp <- itemvar*((nn-1)/nn)


	#get total score variance
	totalscores <- df1$X1 + df1$X2 + df1$X3 + df1$X4 + df1$X5

	#Get variance
	totalvar <- var(totalscores)

	#convert to population var
	totalvarp <- totalvar*((nn-1)/nn)


	#find item variance over total score variance, subtracted from 1
	ivtsv1 <- 1-(itemvarp/totalvarp)

	#find alpha
	cronbachalpha2 <- multiplier*ivtsv1
	#check the same
	stopifnot(isTRUE(all.equal(cronbachalpha, cronbachalpha2)))
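The same two calculations can also be sketched as a pair of small functions that work for any number of items. The names `alpha_cov` and `alpha_var` and the `toy` data are hypothetical, made up for illustration. Both functions use sample variances throughout, on the assumption that the population correction cancels in the ratio.

```r
# Covariance method: n/(n-1) * (summed inter-item covariances / total variance)
alpha_cov <- function(items) {
  cv <- cov(items)
  n <- ncol(cv)
  total_var <- sum(cv)                     # variance of total scores
  inter_item <- total_var - sum(diag(cv))  # off-diagonal covariances only
  n / (n - 1) * inter_item / total_var
}

# Variance method: n/(n-1) * (1 - summed item variances / total variance)
alpha_var <- function(items) {
  n <- ncol(items)
  item_var <- sum(sapply(items, var))
  total_var <- var(rowSums(items))
  n / (n - 1) * (1 - item_var / total_var)
}

# Made-up data: 8 respondents, 4 items
toy <- data.frame(
  X1 = c(3, 4, 2, 5, 1, 4, 3, 2),
  X2 = c(2, 5, 3, 4, 2, 4, 2, 1),
  X3 = c(4, 4, 1, 5, 2, 3, 3, 2),
  X4 = c(3, 5, 2, 4, 1, 5, 2, 3)
)

alpha_cov(toy)
alpha_var(toy)
all.equal(alpha_cov(toy), alpha_var(toy))
```

Because n is read from the data, there is nothing to enter by hand, which also means you lose the option of changing n independently of the matrix.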


Wednesday, 22 February 2017
