A Brief Review of Correlation Coefficients

Updated: Dec 1, 2023

Faisal Awartani (Ph.D.)

When analyzing the relationship between two variables (X, Y), several correlation coefficients are available, each suited to the nature of the variables under consideration. The best known is the Pearson correlation coefficient (Rp), which measures the strength of the linear relationship between two quantitative variables. Rp ranges from -1 to +1. An Rp of -1 signifies a perfect negative linear relationship, while an Rp of +1 indicates a perfect positive linear relationship. A value of 0 indicates the absence of a linear relationship, although the two variables may still be related in a non-linear way.

Moving into the realm of non-parametric correlation, the Spearman correlation coefficient (Rs) is suited to assessing the correlation between two ordinal or quantitative variables. It is computed in the same way as Pearson’s coefficient, but on the ranks of the values rather than on the values themselves. Rs also ranges from -1 to +1, and it measures how strongly the ranks of one variable are related to the ranks of the other.

Kendall’s Tau (Ta), another non-parametric measure, evaluates the correlation between two quantitative or ordinal variables. It is based on the level of concordance between pairs of observations rather than on the values themselves.
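To make the difference between Rp and Rs concrete, here is a minimal Python sketch. The function names (`pearson`, `average_ranks`, `spearman`) and the toy data are illustrative assumptions, not part of the original article, and tied values are given the average of their ranks, which is one common convention.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson applied to the ranks of x and y."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy data (assumed for illustration): y = x**3 is monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rp = pearson(x, y)   # below 1, because the relationship is not linear
rs = spearman(x, y)  # equals 1, because the ranks agree perfectly

# A symmetric parabola: clearly related, yet no linear relationship.
u = [-2, -1, 0, 1, 2]
w = [v ** 2 for v in u]
rp_parabola = pearson(u, w)  # 0.0
```

The parabola case illustrates the point that an Rp of 0 rules out a linear relationship only, not a relationship of any kind.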

For example, suppose we have a bivariate data set with two variables (X, Y), where (X1, Y1), (X2, Y2), …, (Xn, Yn) are the ordered pairs in the data set. Any two ordered pairs (Xi, Yi) and (Xj, Yj), where i < j, are said to be concordant if Xi < Xj and Yi < Yj, or if Xi > Xj and Yi > Yj. The two pairs are said to be discordant if Xi < Xj and Yi > Yj, or if Xi > Xj and Yi < Yj. If Xi = Xj or Yi = Yj, the pair is considered a tie.
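The pair classification can be sketched in a few lines of Python. The helper names `count_pairs` and `kendall_tau_a` and the sample data are assumptions made for illustration. The second function combines the counts using the tau-a ratio (Nc - Nd) / N, where N is the total number of pairs.

```python
from itertools import combinations

def count_pairs(x, y):
    """Return (concordant, discordant, tied) counts over all pairs i < j."""
    concordant = discordant = tied = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        # The sign of the product tells whether X and Y move the same way.
        product = (xj - xi) * (yj - yi)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        else:
            tied += 1  # Xi == Xj or Yi == Yj
    return concordant, discordant, tied

def kendall_tau_a(x, y):
    """Tau-a: (Nc - Nd) divided by the total number of pairs n(n-1)/2."""
    nc, nd, nt = count_pairs(x, y)
    return (nc - nd) / (nc + nd + nt)

# Toy data (assumed): two adjacent swaps in an otherwise increasing sequence.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
counts = count_pairs(x, y)   # (8, 2, 0)
tau_a = kendall_tau_a(x, y)  # (8 - 2) / 10 = 0.6
```

Enumerating every pair is quadratic in the number of observations, which is fine for small examples like this one.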

Concordant and discordant pairs play a key role in Kendall’s Tau, which is given by the formula Ta = (Nc - Nd) / N, where Nc is the number of concordant pairs, Nd is the number of discordant pairs, and N = n(n - 1)/2 is the total number of pairs. Kendall’s Tau has two further variants, Tb and Tc, which adjust for tied values in the dataset. These variants give more accurate correlation measurements when ties occur.

When assessing the strength of the relationship between a quantitative and a qualitative variable, we usually use Eta. Eta is usually produced when applying the ANOVA procedure to test the significance of the relationship between two variables, where the dependent variable is quantitative and the independent variable is qualitative. Below is an example of conducting ANOVA to test the significance of the relationship between “Respondent’s Socioeconomic Status”, which is a quantitative variable, and “Respondent’s Astrological Sign”, which is qualitative. This example is produced from the General Social Survey, conducted in the US by the National Opinion Research Center (NORC) at the University of Chicago.

Eta-square is the between-groups sum of squares (the explained variation) divided by the total sum of squares. Therefore, the Eta-square reported in the measures of association table from the SPSS output is:

Eta-Square = (Between Groups) / (Total) = 3024 / 493227 ≈ 0.006

Eta = Sqrt(Eta-Square) = Sqrt(0.00613) ≈ 0.078

When Eta is close to 0, we say the correlation between the two variables is very weak, and when Eta is close to 1, we say the correlation is very strong.
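As a rough sketch of how Eta is obtained, the Python below computes Eta-square from grouped data as the between-groups sum of squares over the total sum of squares, then reproduces the arithmetic from the figures quoted above. The function name `eta_squared` and the toy `scores` data are hypothetical; only the two sums of squares (3024 and 493227) come from the article.

```python
import math

def eta_squared(groups):
    """groups: dict mapping a category label to a list of numeric values."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)
    # Total sum of squares: spread of every value around the grand mean.
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    # Between-groups sum of squares: spread of group means around the grand mean.
    ss_between = sum(
        len(values) * (sum(values) / len(values) - grand_mean) ** 2
        for values in groups.values()
    )
    return ss_between / ss_total

# Hypothetical toy data: three categories with clearly different means.
scores = {"A": [1, 2, 3], "B": [2, 3, 4], "C": [6, 7, 8]}
toy_eta_sq = eta_squared(scores)  # 42 / 48 = 0.875
toy_eta = math.sqrt(toy_eta_sq)

# The article's figures: between-groups and total sums of squares.
article_eta_sq = 3024 / 493227
article_eta = math.sqrt(article_eta_sq)
print(round(article_eta_sq, 3), round(article_eta, 3))  # 0.006 0.078
```

Note that taking the square root of the unrounded ratio is what yields 0.078. Starting from the rounded value 0.006 would give about 0.077.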

Shifting focus to qualitative variables, Cramer’s V is widely used to measure the correlation between two qualitative variables. It is a standardized version of the Chi-Square value. The Chi-Square statistic sums, over the cells of a contingency table, the squared difference between the observed count and the count expected under independence of the two variables, divided by the expected count. If the two variables are actually independent, the differences between the observed and expected counts will be close to zero. Cramer’s V rescales the Chi-Square value to lie between 0 and 1. A value of 0 indicates no association between the two qualitative variables, while 1 indicates a perfect association. Cramer’s V is defined as the square root of (Chi-square/n) / min(columns - 1, rows - 1), where n is the total number of observations.

In the intricate world of correlation coefficients, each type offers a unique lens through which to view the relationships between variables.
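For readers who want to see the Cramer’s V calculation spelled out, here is a minimal Python sketch. The function name `cramers_v` and the two toy tables are assumptions for illustration. The sketch also assumes every row and column total is positive, so that no expected count is zero.

```python
import math

def cramers_v(table):
    """table: list of rows, each a list of observed counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals) - 1, len(col_totals) - 1)
    return math.sqrt(chi2 / n / k)

# Toy tables (assumed): rows and columns are categories of two variables.
independent = [[10, 20], [20, 40]]  # observed counts equal expected counts
perfect = [[25, 0], [0, 25]]        # each row category maps to one column
v_independent = cramers_v(independent)  # 0.0
v_perfect = cramers_v(perfect)          # 1.0
```

The two tables mark the ends of the scale: when observed counts match the expected counts exactly, V is 0, and when one variable fully determines the other, V is 1.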
