在R中用Lavaan包做验证性因子分析

今天教大家如何在R中做验证性因子分析，R中能做CFA的包不止一个，但lavaan是最常用的SEM包。lavaan是latent variable analysis的组合，从名称中我们就可以看出它是一个关于潜变量分析的工具。

lavaan的相关资料网站有：

lavaan: An R Package for Structural Equation Modeling

install.packages("lavaan")
library(lavaan) # 安装并载入lavaan包

Lavaan包中有个数据集叫HolzingerSwineford1939，本次验证性因子分析的数据就在这个数据集中，以下是关于这个数据集的描述：

The classic Holzinger and Swineford (1939) dataset consists of mental ability test scores of seventh- and eighth-grade children from two different schools (Pasteur and Grant-White). In the original dataset (available in the MBESS package), there are scores for 26 tests. However, a smaller subset with 9 variables is more widely used in the literature (for example in Joreskog’s 1969 paper, which also uses the 145 subjects from the Grant-White school only).

在安装lavaan包后，简要查看下数据集中的数据大致是个什么样子

head(HolzingerSwineford1939)
  id sex ageyr agemo  school grade       x1   x2    x3       x4   x5        x6       x7   x8       x9
1  1   1    13     1 Pasteur     7 3.333333 7.75 0.375 2.333333 5.75 1.2857143 3.391304 5.75 6.361111
2  2   2    13     7 Pasteur     7 5.333333 5.25 2.125 1.666667 3.00 1.2857143 3.782609 6.25 7.916667
3  3   2    13     1 Pasteur     7 4.500000 5.25 1.875 1.000000 1.75 0.4285714 3.260870 3.90 4.416667
4  4   1    13     2 Pasteur     7 5.333333 7.75 3.000 2.666667 4.50 2.4285714 3.000000 5.30 4.861111
5  5   2    12     2 Pasteur     7 4.833333 4.75 0.875 2.666667 4.00 2.5714286 3.695652 6.30 5.916667
6  6   2    14     1 Pasteur     7 5.333333 5.00 2.250 1.000000 3.00 0.8571429 4.347826 6.65 7.500000

查看数据集的结构，数据集中包含15个变量variables，共301个观察值obs，5个整数型变量int，1个因子型变量factor，9个数值型变量num。

str(HolzingerSwineford1939)
'data.frame':	301 obs. of  15 variables:
 $ id    : int  1 2 3 4 5 6 7 8 9 11 ...
 $ sex   : int  1 2 2 1 2 2 1 2 2 2 ...
 $ ageyr : int  13 13 13 13 12 14 12 12 13 12 ...
 $ agemo : int  1 7 1 2 2 1 1 2 0 5 ...
 $ school: Factor w/ 2 levels "Grant-White",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ grade : int  7 7 7 7 7 7 7 7 7 7 ...
 $ x1    : num  3.33 5.33 4.5 5.33 4.83 ...
 $ x2    : num  7.75 5.25 5.25 7.75 4.75 5 6 6.25 5.75 5.25 ...
 $ x3    : num  0.375 2.125 1.875 3 0.875 ...
 $ x4    : num  2.33 1.67 1 2.67 2.67 ...
 $ x5    : num  5.75 3 1.75 4.5 4 3 6 4.25 5.75 5 ...
 $ x6    : num  1.286 1.286 0.429 2.429 2.571 ...
 $ x7    : num  3.39 3.78 3.26 3 3.7 ...
 $ x8    : num  5.75 6.25 3.9 5.3 6.3 6.65 6.2 5.15 4.65 4.55 ...
 $ x9    : num  6.36 7.92 4.42 4.86 5.92 ...

一共3个潜变量，分别为visual, textual, speed

visual包含3个观察变量，x1,x2,x3

textual包含3个观察变量，x4,x5,x6

speed包含3个观察变量，x7,x8,x9

划重点了，=~符号左边是潜变量，右边是指标

latent variable =~ indicator1 + indicator2 + indicator3

我们按照上述格式，列出本例中3个潜变量的组成情况

visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9

然后组建模型，我们将这个模型命名为HS.model

HS.model <- ' visual  =~ x1 + x2 + x3 
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

下一步用cfa命令做验证性因子分析，cfa即为验证性因子分析的英文缩写，所用数据为HolzingerSwineford1939，并将cfa的结果赋值给fit

fit <- cfa(HS.model, data=HolzingerSwineford1939)

用summary命令查看fit结果，fit.measures = TRUE表示我们需要拟合度指标，默认不给出。

summary(fit, fit.measures=TRUE)
lavaan (0.6-1) converged normally after  35 iterations

  Number of observations                           301

  Estimator                                         ML
  Model Fit Test Statistic                      85.306
  Degrees of freedom                                24
  P-value (Chi-square)                           0.000

Model test baseline model:

  Minimum Function Test Statistic              918.852
  Degrees of freedom                                36
  P-value                                        0.000

User model versus baseline model:

  Comparative Fit Index (CFI)                    0.931
  Tucker-Lewis Index (TLI)                       0.896

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -3737.745
  Loglikelihood unrestricted model (H1)      -3695.092

  Number of free parameters                         21
  Akaike (AIC)                                7517.490
  Bayesian (BIC)                              7595.339
  Sample-size adjusted Bayesian (BIC)         7528.739

Root Mean Square Error of Approximation:

  RMSEA                                          0.092
  90 Percent Confidence Interval          0.071  0.114
  P-value RMSEA <= 0.05                          0.001

Standardized Root Mean Square Residual:

  SRMR                                           0.065

Parameter Estimates:

  Information                                 Expected
  Information saturated (h1) model          Structured
  Standard Errors                             Standard

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  visual =~                                           
    x1                1.000                           
    x2                0.554    0.100    5.554    0.000
    x3                0.729    0.109    6.685    0.000
  textual =~                                          
    x4                1.000                           
    x5                1.113    0.065   17.014    0.000
    x6                0.926    0.055   16.703    0.000
  speed =~                                            
    x7                1.000                           
    x8                1.180    0.165    7.152    0.000
    x9                1.082    0.151    7.155    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  visual ~~                                           
    textual           0.408    0.074    5.552    0.000
    speed             0.262    0.056    4.660    0.000
  textual ~~                                          
    speed             0.173    0.049    3.518    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .x1                0.549    0.114    4.833    0.000
   .x2                1.134    0.102   11.146    0.000
   .x3                0.844    0.091    9.317    0.000
   .x4                0.371    0.048    7.779    0.000
   .x5                0.446    0.058    7.642    0.000
   .x6                0.356    0.043    8.277    0.000
   .x7                0.799    0.081    9.823    0.000
   .x8                0.488    0.074    6.573    0.000
   .x9                0.566    0.071    8.003    0.000
    visual            0.809    0.145    5.564    0.000
    textual           0.979    0.112    8.737    0.000
    speed             0.384    0.086    4.451    0.000

结果中包含了拟合指标，指标载荷，协方差和方差等信息，但我们注意到题项载荷那一块有超过1的数值，说明这是未标准化的。鉴于文章大多要报告标准化的数据，那么如何才能得到标准化的题项载荷呢？

我们需要在cfa命令中加入http://std.lv = TRUE，然后再查看fit，此时便得到了标准化后的题项载荷。

fit <- cfa(HS.model, data=HolzingerSwineford1939, std.lv=T)
summary(fit, fit.measures=TRUE)

lavaan (0.6-1) converged normally after  22 iterations

  Number of observations                           301

  Estimator                                         ML
  Model Fit Test Statistic                      85.306
  Degrees of freedom                                24
  P-value (Chi-square)                           0.000

Model test baseline model:

  Minimum Function Test Statistic              918.852
  Degrees of freedom                                36
  P-value                                        0.000

User model versus baseline model:

  Comparative Fit Index (CFI)                    0.931
  Tucker-Lewis Index (TLI)                       0.896

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -3737.745
  Loglikelihood unrestricted model (H1)      -3695.092

  Number of free parameters                         21
  Akaike (AIC)                                7517.490
  Bayesian (BIC)                              7595.339
  Sample-size adjusted Bayesian (BIC)         7528.739

Root Mean Square Error of Approximation:

  RMSEA                                          0.092
  90 Percent Confidence Interval          0.071  0.114
  P-value RMSEA <= 0.05                          0.001

Standardized Root Mean Square Residual:

  SRMR                                           0.065

Parameter Estimates:

  Information                                 Expected
  Information saturated (h1) model          Structured
  Standard Errors                             Standard

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  visual =~                                           
    x1                0.900    0.081   11.127    0.000
    x2                0.498    0.077    6.429    0.000
    x3                0.656    0.074    8.817    0.000
  textual =~                                          
    x4                0.990    0.057   17.474    0.000
    x5                1.102    0.063   17.576    0.000
    x6                0.917    0.054   17.082    0.000
  speed =~                                            
    x7                0.619    0.070    8.903    0.000
    x8                0.731    0.066   11.090    0.000
    x9                0.670    0.065   10.305    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  visual ~~                                           
    textual           0.459    0.064    7.189    0.000
    speed             0.471    0.073    6.461    0.000
  textual ~~                                          
    speed             0.283    0.069    4.117    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .x1                0.549    0.114    4.833    0.000
   .x2                1.134    0.102   11.146    0.000
   .x3                0.844    0.091    9.317    0.000
   .x4                0.371    0.048    7.778    0.000
   .x5                0.446    0.058    7.642    0.000
   .x6                0.356    0.043    8.277    0.000
   .x7                0.799    0.081    9.823    0.000
   .x8                0.488    0.074    6.573    0.000
   .x9                0.566    0.071    8.003    0.000
    visual            1.000                           
    textual           1.000                           
    speed             1.000

参考文献：

Yves Rosseel (2012). lavaan: An R Package for Structural Equation Modeling. Journal of Statistical Software, 48(2), 1-36. URL http://www.jstatsoft.org/v48/i02/

本文来自zhihu，观点不代表一起大数据-技术文章心得立场，如若转载，请注明出处：https://zhuanlan.zhihu.com/p/38233209

更多内容请访问：IT源点

注意：本文归作者所有，未经作者允许，不得转载

在R中用Lavaan包做验证性因子分析

全部评论: 0 条

本文目录

热门标签

程序员导航

热门文章

阿里云新老用户最新优惠

最新发布

最新评论