5.3 At-Home Exercises
This week, we will wrap up our re-analysis of the Kestilä (2006) results. During this practical, you will conduct a CFA of the Trust in Politics items and compare the results to those obtained from your previous EFA- and PCA-based replications of Kestilä (2006).
5.3.1
Load the ESS data.
- The relevant data are contained in the ess_round1.rds file.
- This file is in R Data Set (RDS) format.
- The dataset is already stored as a data frame, with the processing and cleaning from the previous practicals already applied.
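A minimal sketch of the loading step, assuming the file sits in the current working directory. The path is an assumption; the object name ess matches the code used later in these exercises.

```r
## Assumed location of the data file; adjust the path to match your setup.
data_path <- "ess_round1.rds"

## readRDS() restores a single R object that was saved with saveRDS().
if (file.exists(data_path)) {
  ess <- readRDS(data_path)

  ## Quick sanity checks on the loaded object:
  print(class(ess))
  print(dim(ess))
} else {
  message("Could not find ", data_path, "; check your working directory.")
}
```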
Although you may have settled on any number of EFA solutions during the Week 4 In-Class Exercises, we are going to base the following CFA on a three-factor model of Trust in Politics similar to the original PCA results from Kestilä (2006).
Note: Unless otherwise specified, all following questions refer to the Trust in Politics items. We will not consider the Attitudes toward Immigration items in these exercises.
5.3.2
Define the lavaan model syntax for the CFA implied by the three-factor EFA solution you found in the Week 4 In-Class Exercises.
- Covary the three latent factors.
- Do not specify any mean structure.
- Save this model syntax as an object in your environment.
Click to show code
mod_3f <- '
institutions =~ trstlgl + trstplc + trstun + trstep + trstprl
satisfaction =~ stfhlth + stfedu + stfeco + stfgov + stfdem
politicians =~ pltinvt + pltcare + trstplt
'
Click for explanation
We don’t have to specify the latent covariances in the model syntax. Instead, we can tell lavaan to estimate all latent covariances when we fit the model.
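For comparison, the latent covariances can also be written out with the ~~ operator. The sketch below is one way to state the same model explicitly; the object name mod_3f_explicit is hypothetical and is not used elsewhere in these exercises.

```r
## Same measurement model as mod_3f, with the latent covariances spelled out.
mod_3f_explicit <- '
institutions =~ trstlgl + trstplc + trstun + trstep + trstprl
satisfaction =~ stfhlth + stfedu + stfeco + stfgov + stfdem
politicians  =~ pltinvt + pltcare + trstplt

## Covariances among the three latent factors:
institutions ~~ satisfaction + politicians
satisfaction ~~ politicians
'
```

Because cfa() estimates these covariances by default, fitting this syntax should give the same model as fitting mod_3f.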
5.3.3
Estimate the CFA model you defined above, and summarize the results.
- Use the lavaan::cfa() function to estimate the model.
- Use the default settings for the cfa() function.
- Request the model fit statistics with the summary by supplying the fit.measures = TRUE argument to summary().
- Request the standardized parameter estimates with the summary by supplying the standardized = TRUE argument to summary().
Check the results, and answer the following questions:
- Does the model fit the data well?
- How are the latent variances and covariances specified when using the default settings?
- How is the model identified when using the default settings?
Click to show code
## Load the lavaan package:
library(lavaan)
## Estimate the CFA model:
fit_3f <- cfa(mod_3f, data = ess)
## Summarize the fitted model:
summary(fit_3f, fit.measures = TRUE, standardized = TRUE)
## lavaan 0.6-19 ended normally after 46 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 29
##
## Used Total
## Number of observations 14778 19690
##
## Model Test User Model:
##
## Test statistic 10652.207
## Degrees of freedom 62
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 81699.096
## Degrees of freedom 78
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.870
## Tucker-Lewis Index (TLI) 0.837
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -371404.658
## Loglikelihood unrestricted model (H1) -366078.555
##
## Akaike (AIC) 742867.317
## Bayesian (BIC) 743087.743
## Sample-size adjusted Bayesian (SABIC) 742995.583
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.108
## 90 Percent confidence interval - lower 0.106
## 90 Percent confidence interval - upper 0.109
## P-value H_0: RMSEA <= 0.050 0.000
## P-value H_0: RMSEA >= 0.080 1.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.059
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## institutions =~
## trstlgl 1.000 1.613 0.677
## trstplc 0.770 0.012 61.866 0.000 1.241 0.567
## trstun 0.929 0.013 69.227 0.000 1.498 0.642
## trstep 0.908 0.013 70.929 0.000 1.464 0.660
## trstprl 1.139 0.014 84.084 0.000 1.837 0.809
## satisfaction =~
## stfhlth 1.000 1.173 0.521
## stfedu 1.106 0.022 50.840 0.000 1.297 0.577
## stfeco 1.415 0.025 57.214 0.000 1.659 0.713
## stfgov 1.480 0.025 58.764 0.000 1.736 0.756
## stfdem 1.384 0.024 57.904 0.000 1.623 0.731
## politicians =~
## pltinvt 1.000 0.646 0.613
## pltcare 1.021 0.016 62.862 0.000 0.660 0.628
## trstplt 3.012 0.039 76.838 0.000 1.946 0.891
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## institutions ~~
## satisfaction 1.391 0.032 43.206 0.000 0.736 0.736
## politicians 0.909 0.018 49.934 0.000 0.872 0.872
## satisfaction ~~
## politicians 0.539 0.013 41.053 0.000 0.711 0.711
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .trstlgl 3.068 0.041 75.262 0.000 3.068 0.541
## .trstplc 3.248 0.041 80.037 0.000 3.248 0.678
## .trstun 3.197 0.041 77.141 0.000 3.197 0.588
## .trstep 2.776 0.036 76.243 0.000 2.776 0.564
## .trstprl 1.776 0.029 61.361 0.000 1.776 0.345
## .stfhlth 3.695 0.046 79.989 0.000 3.695 0.729
## .stfedu 3.368 0.043 77.916 0.000 3.368 0.667
## .stfeco 2.656 0.038 69.070 0.000 2.656 0.491
## .stfgov 2.264 0.035 64.201 0.000 2.264 0.429
## .stfdem 2.289 0.034 67.172 0.000 2.289 0.465
## .pltinvt 0.694 0.009 78.255 0.000 0.694 0.624
## .pltcare 0.668 0.009 77.562 0.000 0.668 0.605
## .trstplt 0.978 0.028 34.461 0.000 0.978 0.205
## institutions 2.601 0.059 44.198 0.000 1.000 1.000
## satisfaction 1.375 0.044 31.407 0.000 1.000 1.000
## politicians 0.417 0.011 38.843 0.000 1.000 1.000
Click for explanation
No, the model does not seem to fit the data well.
- The SRMR looks good, but one good-looking fit statistic is not enough.
- The RMSEA, TLI, and CFI are all in the “unacceptable” range.
- The \(\chi^2\) is highly significant, but with 14778 observations the test will flag even trivial misfit, so we don’t put much weight on it.
The cfa() function is just a wrapper for the lavaan() function with several options set at the defaults you would want for a standard CFA.
- By default:
  - All latent variances and covariances are freely estimated (due to the arguments auto.var = TRUE and auto.cov.lv.x = TRUE).
  - The model is identified by fixing the first factor loading of each factor to 1 (due to the argument auto.fix.first = TRUE).
To see a full list of the (many) options you can specify to tweak the behavior of lavaan estimation functions, run ?lavOptions.
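The defaults described above also determine how many parameters are estimated. As a rough check, the following sketch counts the free parameters and degrees of freedom by hand; the variable names are hypothetical, and the totals should agree with the 29 parameters and 62 degrees of freedom reported in the summary.

```r
## Observed variables in the model (5 + 5 + 3 items):
p <- 5 + 5 + 3

## Unique variances and covariances among the observed variables:
n_moments <- p * (p + 1) / 2

## Free parameters under the cfa() defaults:
n_loadings  <- (5 - 1) + (5 - 1) + (3 - 1)  # first loading of each factor fixed to 1
n_residuals <- p                            # one residual variance per item
n_lv_var    <- 3                            # latent variances
n_lv_cov    <- choose(3, 2)                 # latent covariances

npar_3f <- n_loadings + n_residuals + n_lv_var + n_lv_cov
df_3f   <- n_moments - npar_3f

c(npar = npar_3f, df = df_3f)
```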
Now, we will consider a couple of alternative factor structures for the Trust in Politics CFA. First, we will go extremely simple by estimating a one-factor model wherein all Trust items are explained by a single latent variable.
5.3.4
Define the lavaan model syntax for a one-factor model of the Trust items.
- Save this syntax as an object in your environment.
Click to show code
mod_1f <- '
political_trust =~
trstlgl + trstplc + trstun + trstep + trstprl +
stfhlth + stfedu + stfeco + stfgov + stfdem +
pltinvt + pltcare + trstplt
'
5.3.5
Estimate the one-factor model, and summarize the results.
- Does this model appear to fit better or worse than the three-factor model?
Note: You can use the lavaan::fitMeasures() function to extract only the model fit information from a fitted lavaan object.
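As a small illustration of fitMeasures(), the sketch below requests only a handful of fit indices by name. To stay self-contained, it uses the HolzingerSwineford1939 data that ship with lavaan as a stand-in for the ESS data; the object names hs_model, hs_fit, and key_fit are hypothetical.

```r
library(lavaan)

## Stand-in example: a three-factor CFA of the built-in Holzinger-Swineford data.
hs_model <- '
visual  =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed   =~ x7 + x8 + x9
'
hs_fit <- cfa(hs_model, data = HolzingerSwineford1939)

## Request a subset of the fit measures by name:
key_fit <- fitMeasures(hs_fit, c("chisq", "df", "cfi", "tli", "rmsea", "srmr"))
round(key_fit, 3)
```

The same pattern should work for the models in these exercises, e.g., by passing fit_3f or fit_1f as the first argument.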
Click to show code
## Estimate the one factor model:
fit_1f <- cfa(mod_1f, data = ess)
## Summarize the results:
summary(fit_1f, fit.measures = TRUE)
## lavaan 0.6-19 ended normally after 33 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 26
##
## Used Total
## Number of observations 14778 19690
##
## Model Test User Model:
##
## Test statistic 17667.304
## Degrees of freedom 65
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 81699.096
## Degrees of freedom 78
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.784
## Tucker-Lewis Index (TLI) 0.741
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -374912.206
## Loglikelihood unrestricted model (H1) -366078.555
##
## Akaike (AIC) 749876.413
## Bayesian (BIC) 750074.036
## Sample-size adjusted Bayesian (SABIC) 749991.410
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.135
## 90 Percent confidence interval - lower 0.134
## 90 Percent confidence interval - upper 0.137
## P-value H_0: RMSEA <= 0.050 0.000
## P-value H_0: RMSEA >= 0.080 1.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.080
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|)
## political_trust =~
## trstlgl 1.000
## trstplc 0.774 0.013 57.949 0.000
## trstun 0.930 0.014 64.200 0.000
## trstep 0.909 0.014 65.679 0.000
## trstprl 1.182 0.015 79.401 0.000
## stfhlth 0.615 0.013 45.947 0.000
## stfedu 0.695 0.014 51.424 0.000
## stfeco 0.895 0.014 62.316 0.000
## stfgov 0.985 0.014 68.200 0.000
## stfdem 0.998 0.014 70.899 0.000
## pltinvt 0.382 0.006 59.215 0.000
## pltcare 0.396 0.006 61.195 0.000
## trstplt 1.183 0.014 81.716 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .trstlgl 3.370 0.042 79.787 0.000
## .trstplc 3.410 0.041 82.311 0.000
## .trstun 3.451 0.043 80.749 0.000
## .trstep 3.019 0.038 80.272 0.000
## .trstprl 1.938 0.027 70.878 0.000
## .stfhlth 4.201 0.050 84.093 0.000
## .stfedu 3.941 0.047 83.419 0.000
## .stfeco 3.565 0.044 81.289 0.000
## .stfgov 3.044 0.038 79.326 0.000
## .stfdem 2.631 0.034 78.072 0.000
## .pltinvt 0.775 0.009 82.043 0.000
## .pltcare 0.743 0.009 81.579 0.000
## .trstplt 1.548 0.023 67.052 0.000
## political_trst 2.299 0.055 41.569 0.000
## Compare the fit measures for the three-factor and one-factor models:
fitMeasures(fit_3f)
## npar fmin chisq
## 29.000 0.360 10652.207
## df pvalue baseline.chisq
## 62.000 0.000 81699.096
## baseline.df baseline.pvalue cfi
## 78.000 0.000 0.870
## tli nnfi rfi
## 0.837 0.837 0.836
## nfi pnfi ifi
## 0.870 0.691 0.870
## rni logl unrestricted.logl
## 0.870 -371404.658 -366078.555
## aic bic ntotal
## 742867.317 743087.743 14778.000
## bic2 rmsea rmsea.ci.lower
## 742995.583 0.108 0.106
## rmsea.ci.upper rmsea.ci.level rmsea.pvalue
## 0.109 0.900 0.000
## rmsea.close.h0 rmsea.notclose.pvalue rmsea.notclose.h0
## 0.050 1.000 0.080
## rmr rmr_nomean srmr
## 0.255 0.255 0.059
## srmr_bentler srmr_bentler_nomean crmr
## 0.059 0.059 0.064
## crmr_nomean srmr_mplus srmr_mplus_nomean
## 0.064 0.059 0.059
## cn_05 cn_01 gfi
## 113.901 126.971 0.897
## agfi pgfi mfi
## 0.849 0.611 0.699
## ecvi
## 0.725
fitMeasures(fit_1f)
## npar fmin chisq
## 26.000 0.598 17667.304
## df pvalue baseline.chisq
## 65.000 0.000 81699.096
## baseline.df baseline.pvalue cfi
## 78.000 0.000 0.784
## tli nnfi rfi
## 0.741 0.741 0.741
## nfi pnfi ifi
## 0.784 0.653 0.784
## rni logl unrestricted.logl
## 0.784 -374912.206 -366078.555
## aic bic ntotal
## 749876.413 750074.036 14778.000
## bic2 rmsea rmsea.ci.lower
## 749991.410 0.135 0.134
## rmsea.ci.upper rmsea.ci.level rmsea.pvalue
## 0.137 0.900 0.000
## rmsea.close.h0 rmsea.notclose.pvalue rmsea.notclose.h0
## 0.050 1.000 0.080
## rmr rmr_nomean srmr
## 0.364 0.364 0.080
## srmr_bentler srmr_bentler_nomean crmr
## 0.080 0.080 0.087
## crmr_nomean srmr_mplus srmr_mplus_nomean
## 0.087 0.080 0.080
## cn_05 cn_01 gfi
## 71.949 79.980 0.825
## agfi pgfi mfi
## 0.756 0.590 0.551
## ecvi
## 1.199
Click for explanation
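The one-factor model definitely seems to fit worse than the three-factor model: the CFI drops from 0.870 to 0.784, the RMSEA rises from 0.108 to 0.135, and the SRMR rises from 0.059 to 0.080.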
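To put numbers on the comparison, the sketch below tabulates a few statistics copied from the fitMeasures() output above and takes the difference between the two models. The object names fit_tab and delta are hypothetical.

```r
## Values copied from the fitMeasures() output for the two models:
fit_tab <- data.frame(
  model = c("three-factor", "one-factor"),
  chisq = c(10652.207, 17667.304),
  df    = c(62, 65),
  aic   = c(742867.317, 749876.413),
  bic   = c(743087.743, 750074.036)
)

## One-factor minus three-factor (dropping the model label column):
delta <- fit_tab[2, -1] - fit_tab[1, -1]
delta
```

The one-factor model frees up only three degrees of freedom while its \(\chi^2\), AIC, and BIC all increase by roughly 7000, which points in the same direction as the other fit indices. With the fitted objects available, anova(fit_3f, fit_1f) should report the corresponding \(\chi^2\) difference test.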
A second order CFA model is another way of representing the latent structure underlying a set of items. As you read in Byrne (2005), however, the second order CFA is only appropriate in certain circumstances.
5.3.6
Given the CFA results above, would a second order CFA be appropriate for the Trust data? Why or why not?
Click for explanation
Yes, a second order CFA model is a theoretically appropriate representation of the Trust items.
- The first order latent variables in the three-factor model are all significantly correlated.
- The first order latent variables in the three-factor model seem to tap different aspects of some single underlying construct.
5.3.7
Define the lavaan model syntax for a second-order CFA model of the Trust items.
- Use the three factors defined in 5.3.2 as the first order factors.
Click to show code
mod_2nd <- '
institutions =~ trstlgl + trstplc + trstun + trstep + trstprl
satisfaction =~ stfhlth + stfedu + stfeco + stfgov + stfdem
politicians =~ pltinvt + pltcare + trstplt
trust =~ politicians + satisfaction + institutions
'
Click for explanation
To define the second order factor, we use the same syntactic conventions that we employ to define a first order factor. The only difference is that the “indicators” of the second order factor (i.e., the variables listed on the RHS of the =~ operator) are previously defined first order latent variables.
5.3.8
Estimate the second order CFA model, and summarize the results.
- Does this model fit better or worse than the three-factor model?
- Is this model more or less complex than the three-factor model?
- What information can you use to quantify this difference in complexity?
Click to show code
## Estimate the second order CFA model:
fit_2nd <- cfa(mod_2nd, data = ess)
## Summarize the results:
summary(fit_2nd, fit.measures = TRUE, standardized = TRUE)
## lavaan 0.6-19 ended normally after 44 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 29
##
## Used Total
## Number of observations 14778 19690
##
## Model Test User Model:
##
## Test statistic 10652.207
## Degrees of freedom 62
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 81699.096
## Degrees of freedom 78
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.870
## Tucker-Lewis Index (TLI) 0.837
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -371404.658
## Loglikelihood unrestricted model (H1) -366078.555
##
## Akaike (AIC) 742867.317
## Bayesian (BIC) 743087.743
## Sample-size adjusted Bayesian (SABIC) 742995.583
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.108
## 90 Percent confidence interval - lower 0.106
## 90 Percent confidence interval - upper 0.109
## P-value H_0: RMSEA <= 0.050 0.000
## P-value H_0: RMSEA >= 0.080 1.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.059
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## institutions =~
## trstlgl 1.000 1.613 0.677
## trstplc 0.770 0.012 61.866 0.000 1.241 0.567
## trstun 0.929 0.013 69.227 0.000 1.498 0.642
## trstep 0.908 0.013 70.929 0.000 1.464 0.660
## trstprl 1.139 0.014 84.084 0.000 1.837 0.809
## satisfaction =~
## stfhlth 1.000 1.173 0.521
## stfedu 1.106 0.022 50.840 0.000 1.297 0.577
## stfeco 1.415 0.025 57.214 0.000 1.659 0.713
## stfgov 1.480 0.025 58.764 0.000 1.736 0.756
## stfdem 1.384 0.024 57.904 0.000 1.623 0.731
## politicians =~
## pltinvt 1.000 0.646 0.613
## pltcare 1.021 0.016 62.862 0.000 0.660 0.628
## trstplt 3.012 0.039 76.838 0.000 1.946 0.891
## trust =~
## politicians 1.000 0.918 0.918
## satisfaction 1.531 0.033 46.494 0.000 0.774 0.774
## institutions 2.583 0.045 56.796 0.000 0.950 0.950
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .trstlgl 3.068 0.041 75.262 0.000 3.068 0.541
## .trstplc 3.248 0.041 80.037 0.000 3.248 0.678
## .trstun 3.197 0.041 77.141 0.000 3.197 0.588
## .trstep 2.776 0.036 76.243 0.000 2.776 0.564
## .trstprl 1.776 0.029 61.361 0.000 1.776 0.345
## .stfhlth 3.695 0.046 79.989 0.000 3.695 0.729
## .stfedu 3.368 0.043 77.916 0.000 3.368 0.667
## .stfeco 2.656 0.038 69.070 0.000 2.656 0.491
## .stfgov 2.264 0.035 64.201 0.000 2.264 0.429
## .stfdem 2.289 0.034 67.172 0.000 2.289 0.465
## .pltinvt 0.694 0.009 78.255 0.000 0.694 0.624
## .pltcare 0.668 0.009 77.562 0.000 0.668 0.605
## .trstplt 0.978 0.028 34.461 0.000 0.978 0.205
## .institutions 0.255 0.022 11.691 0.000 0.098 0.098
## .satisfaction 0.551 0.020 27.846 0.000 0.400 0.400
## .politicians 0.065 0.004 17.091 0.000 0.157 0.157
## trust 0.352 0.010 35.005 0.000 1.000 1.000
## Fit measures for the three-factor model:
fitMeasures(fit_3f)
## npar fmin chisq
## 29.000 0.360 10652.207
## df pvalue baseline.chisq
## 62.000 0.000 81699.096
## baseline.df baseline.pvalue cfi
## 78.000 0.000 0.870
## tli nnfi rfi
## 0.837 0.837 0.836
## nfi pnfi ifi
## 0.870 0.691 0.870
## rni logl unrestricted.logl
## 0.870 -371404.658 -366078.555
## aic bic ntotal
## 742867.317 743087.743 14778.000
## bic2 rmsea rmsea.ci.lower
## 742995.583 0.108 0.106
## rmsea.ci.upper rmsea.ci.level rmsea.pvalue
## 0.109 0.900 0.000
## rmsea.close.h0 rmsea.notclose.pvalue rmsea.notclose.h0
## 0.050 1.000 0.080
## rmr rmr_nomean srmr
## 0.255 0.255 0.059
## srmr_bentler srmr_bentler_nomean crmr
## 0.059 0.059 0.064
## crmr_nomean srmr_mplus srmr_mplus_nomean
## 0.064 0.059 0.059
## cn_05 cn_01 gfi
## 113.901 126.971 0.897
## agfi pgfi mfi
## 0.849 0.611 0.699
## ecvi
## 0.725
## Fit measures for the fitted second order model:
fitMeasures(fit_2nd)
## npar fmin chisq
## 29.000 0.360 10652.207
## df pvalue baseline.chisq
## 62.000 0.000 81699.096
## baseline.df baseline.pvalue cfi
## 78.000 0.000 0.870
## tli nnfi rfi
## 0.837 0.837 0.836
## nfi pnfi ifi
## 0.870 0.691 0.870
## rni logl unrestricted.logl
## 0.870 -371404.658 -366078.555
## aic bic ntotal
## 742867.317 743087.743 14778.000
## bic2 rmsea rmsea.ci.lower
## 742995.583 0.108 0.106
## rmsea.ci.upper rmsea.ci.level rmsea.pvalue
## 0.109 0.900 0.000
## rmsea.close.h0 rmsea.notclose.pvalue rmsea.notclose.h0
## 0.050 1.000 0.080
## rmr rmr_nomean srmr
## 0.255 0.255 0.059
## srmr_bentler srmr_bentler_nomean crmr
## 0.059 0.059 0.064
## crmr_nomean srmr_mplus srmr_mplus_nomean
## 0.064 0.059 0.059
## cn_05 cn_01 gfi
## 113.901 126.971 0.897
## agfi pgfi mfi
## 0.849 0.611 0.699
## ecvi
## 0.725
Click for explanation
We don’t have to do anything special here. We can estimate and summarize the second order CFA exactly as we did the first order CFA.
You should quickly notice something strange about the model fit statistics compared above. If you don’t see it, consider the following:
## Difference between the fit measures of the three-factor model and the fitted second order model:
fitMeasures(fit_3f) - fitMeasures(fit_2nd)
## npar fmin chisq
## 0 0 0
## df pvalue baseline.chisq
## 0 0 0
## baseline.df baseline.pvalue cfi
## 0 0 0
## tli nnfi rfi
## 0 0 0
## nfi pnfi ifi
## 0 0 0
## rni logl unrestricted.logl
## 0 0 0
## aic bic ntotal
## 0 0 0
## bic2 rmsea rmsea.ci.lower
## 0 0 0
## rmsea.ci.upper rmsea.ci.level rmsea.pvalue
## 0 0 0
## rmsea.close.h0 rmsea.notclose.pvalue rmsea.notclose.h0
## 0 0 0
## rmr rmr_nomean srmr
## 0 0 0
## srmr_bentler srmr_bentler_nomean crmr
## 0 0 0
## crmr_nomean srmr_mplus srmr_mplus_nomean
## 0 0 0
## cn_05 cn_01 gfi
## 0 0 0
## agfi pgfi mfi
## 0 0 0
## ecvi
## 0
The two models produce identical fit statistics! We also see that the degrees of freedom are identical between the two models. Hence, the two models have equal complexity.
This result taps into a critical idea in statistical modeling, namely, model equivalency. It turns out the two models we’re comparing here are equivalent in the sense that they are statistically indistinguishable representations of the data.
Since this is a very important idea, I want to spend some time discussing it in person. So, spend some time between now and the Week 6 lecture session thinking about the implications of this model equivalence. Specifically, consider the following questions:
- What do we mean when we say that these two models are equivalent?
- How is it possible for these two models to be equivalent when one contains an additional latent variable?
- Why are the degrees of freedom equal for these two models?
- Why are the fit statistics equal for these two models?
We’ll take some time to discuss these ideas in the Week 6 lecture session.
End of At-Home Exercises