How to draw an ICC using base R functions from Mplus output

The goal of this post is to illustrate how to prepare an item characteristic curve (ICC) using base R functions (curve, plot). This post limits consideration to binary test items. Ordered categorical items are addressed in a subsequent post.

Along the way, this post also covers moving data and results between R and Mplus with MplusAutomation, the relationship between the two-parameter logistic (2PL) and two-parameter normal/probit (2PN) item response functions, and how the WLSMV (theta and delta parameterizations), Bayes, and maximum likelihood (logit and probit) estimates relate to one another.

Libraries

# install.packages("scales")
#install.packages("naniar")
#install.packages("MplusAutomation")
library(scales) # to make curve lines transparent
library(naniar) # for missing data handling
library(MplusAutomation) # for Mplus pre and post processing
## Version:  0.8
## We work hard to write this free software. Please help us get credit by citing: 
## 
## Hallquist, M. N. & Wiley, J. F. (2018). MplusAutomation: An R Package for Facilitating Large-Scale Latent Variable Analyses in Mplus. Structural Equation Modeling, 25, 621-638. doi: 10.1080/10705511.2017.1402334.
## 
## -- see citation("MplusAutomation").

Data

I will use data from the example file ex0200.dat provided at lvmworkshop.org.

# read the dat file from the LVM Workshop web page
# and save as data frame object
df_ex0200 <- read.csv("https://s3.amazonaws.com/lvmworkshop/Meas/EX0200.dat")
# name the columns
colnames(df_ex0200) <- c("y1","y2","y3","y4","y5","y6")
# replace the coded-for Mplus missing values (-9999) with
# more R-useful NA
df_ex0200 <- naniar::replace_with_na_all(df_ex0200, condition = ~.x == -9999)
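
Before writing the data back out for Mplus, a quick look at what was read in can catch import problems early (a small check; nothing later depends on its output):

# quick sanity checks on the imported data
dim(df_ex0200)              # number of rows and columns
head(df_ex0200)             # first few records
colSums(is.na(df_ex0200))   # count of missing values per item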

Mplus CFA model using WLSMV estimator, theta parameterization

MplusAutomation::prepareMplusData(df_ex0200, "ex0200.dat", interactive = FALSE)
## The file(s)
##  'ex0200.dat' 
## currently exist(s) and will be overwritten
## TITLE: Your title goes here
## DATA: FILE = "ex0200.dat";
## VARIABLE: 
## NAMES = y1 y2 y3 y4 y5 y6; 
## MISSING=.;

In the code chunk below, I use the knitr Mplus engine to create and execute an Mplus model.

TITLE:     CFA model of 6 SPSMQ items in EPESE
DATA:      FILE = ex0200.dat ; 
VARIABLE:  NAMES = y1 y2 y3 y4 y5 y6; 
           CATEGORICAL = y1 y2 y3 y4 y5 y6 ;
           MISSING = . ; 
ANALYSIS:  ESTIMATOR = WLSMV ;
           PARAMETERIZATION = THETA ;
MODEL:     cog by y1-y6*;
           cog @ 1 ;
## Mplus VERSION 8.6 (Mac)
## MUTHEN & MUTHEN
## 07/19/2021   1:29 PM
## 
## INPUT INSTRUCTIONS
## 
##   TITLE:     CFA model of 6 SPSMQ items in EPESE
##   DATA:      FILE = ex0200.dat ;
##   VARIABLE:  NAMES = y1 y2 y3 y4 y5 y6;
##              CATEGORICAL = y1 y2 y3 y4 y5 y6 ;
##              MISSING = . ;
##   ANALYSIS:  ESTIMATOR = WLSMV ;
##              PARAMETERIZATION = THETA ;
##   MODEL:     cog by y1-y6*;
##              cog @ 1 ;
## 
## 
## 
## *** WARNING
##   Data set contains cases with missing on all variables.
##   These cases were not included in the analysis.
##   Number of cases with missing on all variables:  637
##    1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS
## 
## 
## 
## CFA model of 6 SPSMQ items in EPESE
## 
## SUMMARY OF ANALYSIS
## 
## Number of groups                                                 1
## Number of observations                                       13519
## 
## Number of dependent variables                                    6
## Number of independent variables                                  0
## Number of continuous latent variables                            1
## 
## Observed dependent variables
## 
##   Binary and ordered categorical (ordinal)
##    Y1          Y2          Y3          Y4          Y5          Y6
## 
## Continuous latent variables
##    COG
## 
## 
## Estimator                                                    WLSMV
## Maximum number of iterations                                  1000
## Convergence criterion                                    0.500D-04
## Maximum number of steepest descent iterations                   20
## Maximum number of iterations for H1                           2000
## Convergence criterion for H1                             0.100D-03
## Parameterization                                             THETA
## Link                                                        PROBIT
## 
## Input data file(s)
##   ex0200.dat
## 
## Input data format  FREE
## 
## 
## SUMMARY OF DATA
## 
##      Number of missing data patterns            26
## 
## 
## COVARIANCE COVERAGE OF DATA
## 
## Minimum covariance coverage value   0.100
## 
## 
##      PROPORTION OF DATA PRESENT
## 
## 
##            Covariance Coverage
##               Y1            Y2            Y3            Y4            Y5
##               ________      ________      ________      ________      ________
##  Y1             0.998
##  Y2             0.997         0.998
##  Y3             0.996         0.995         0.997
##  Y4             0.994         0.994         0.994         0.995
##  Y5             0.992         0.992         0.991         0.990         0.994
##  Y6             0.945         0.945         0.945         0.945         0.942
## 
## 
##            Covariance Coverage
##               Y6
##               ________
##  Y6             0.946
## 
## 
## UNIVARIATE PROPORTIONS AND COUNTS FOR CATEGORICAL VARIABLES
## 
##     Y1
##       Category 1    0.719         9702.000
##       Category 2    0.281         3793.000
##     Y2
##       Category 1    0.946        12763.000
##       Category 2    0.054          725.000
##     Y3
##       Category 1    0.913        12308.000
##       Category 2    0.087         1168.000
##     Y4
##       Category 1    0.689         9271.000
##       Category 2    0.311         4187.000
##     Y5
##       Category 1    0.963        12937.000
##       Category 2    0.037          501.000
##     Y6
##       Category 1    0.604         7730.000
##       Category 2    0.396         5064.000
## 
## 
## 
## THE MODEL ESTIMATION TERMINATED NORMALLY
## 
## 
## 
## MODEL FIT INFORMATION
## 
## Number of Free Parameters                       12
## 
## Chi-Square Test of Model Fit
## 
##           Value                            337.500*
##           Degrees of Freedom                     9
##           P-Value                           0.0000
## 
## *   The chi-square value for MLM, MLMV, MLR, ULSMV, WLSM and WLSMV cannot be used
##     for chi-square difference testing in the regular way.  MLM, MLR and WLSM
##     chi-square difference testing is described on the Mplus website.  MLMV, WLSMV,
##     and ULSMV difference testing is done using the DIFFTEST option.
## 
## RMSEA (Root Mean Square Error Of Approximation)
## 
##           Estimate                           0.052
##           90 Percent C.I.                    0.047  0.057
##           Probability RMSEA <= .05           0.240
## 
## CFI/TLI
## 
##           CFI                                0.955
##           TLI                                0.925
## 
## Chi-Square Test of Model Fit for the Baseline Model
## 
##           Value                           7276.590
##           Degrees of Freedom                    15
##           P-Value                           0.0000
## 
## SRMR (Standardized Root Mean Square Residual)
## 
##           Value                              0.052
## 
## Optimum Function Value for Weighted Least-Squares Estimator
## 
##           Value                     0.84220603D-02
## 
## 
## 
## MODEL RESULTS
## 
##                                                     Two-Tailed
##                     Estimate       S.E.  Est./S.E.    P-Value
## 
##  COG      BY
##     Y1                 0.731      0.023     31.197      0.000
##     Y2                 0.898      0.047     19.076      0.000
##     Y3                 1.901      0.146     12.994      0.000
##     Y4                 0.817      0.025     32.796      0.000
##     Y5                 0.566      0.038     14.850      0.000
##     Y6                 0.702      0.023     30.142      0.000
## 
##  Thresholds
##     Y1$1               0.718      0.016     44.113      0.000
##     Y2$1               2.163      0.054     40.163      0.000
##     Y3$1               2.925      0.178     16.386      0.000
##     Y4$1               0.636      0.017     38.311      0.000
##     Y5$1               2.049      0.039     52.734      0.000
##     Y6$1               0.323      0.014     22.739      0.000
## 
##  Variances
##     COG                1.000      0.000    999.000    999.000
## 
## 
## QUALITY OF NUMERICAL RESULTS
## 
##      Condition Number for the Information Matrix              0.445E-02
##        (ratio of smallest to largest eigenvalue)
## 
## 
## IRT PARAMETERIZATION
## 
##                                                     Two-Tailed
##                     Estimate       S.E.  Est./S.E.    P-Value
## 
##  Item Discriminations
## 
##  COG      BY
##     Y1                 0.731      0.023     31.197      0.000
##     Y2                 0.898      0.047     19.076      0.000
##     Y3                 1.901      0.146     12.994      0.000
##     Y4                 0.817      0.025     32.796      0.000
##     Y5                 0.566      0.038     14.850      0.000
##     Y6                 0.702      0.023     30.142      0.000
## 
##  Item Difficulties
##     Y1$1               0.982      0.028     34.651      0.000
##     Y2$1               2.409      0.077     31.223      0.000
##     Y3$1               1.539      0.031     49.075      0.000
##     Y4$1               0.778      0.023     34.301      0.000
##     Y5$1               3.619      0.193     18.782      0.000
##     Y6$1               0.460      0.022     20.958      0.000
## 
##  Variances
##     COG                1.000      0.000      0.000      1.000
## 
## 
##      Beginning Time:  13:29:28
##         Ending Time:  13:29:29
##        Elapsed Time:  00:00:01
## 
## 
## 
## MUTHEN & MUTHEN
## 3463 Stoner Ave.
## Los Angeles, CA  90066
## 
## Tel: (310) 391-9971
## Fax: (310) 391-8971
## Web: www.StatModel.com
## Support: Support@StatModel.com
## 
## Copyright (c) 1998-2021 Muthen & Muthen

Now I use MplusAutomation::readModels to pull useful information from the output back into R.

Results_WLSMV_theta <- MplusAutomation::readModels(target="formplus.out")
## Reading model:  formplus.out
summary(Results_WLSMV_theta)
## CFA model of 6 SPSMQ items in EPESEEstimated using WLSMV 
## Number of obs: 13519, number of (free) parameters: 12 
## 
## Model: Chi2(df = 9) = 337.5, p = 0 
## Baseline model: Chi2(df = 15) = 7276.59, p = 0 
## 
## Fit Indices: 
## 
## CFI = 0.955, TLI = 0.925, SRMR = 0.052 
## RMSEA = 0.052, 90% CI [0.047, 0.057], p < .05 = 0.24 
## AIC = NA, BIC = NA

Extract separate objects for the model results and the IRT parameterization. I use PT for probit/theta and 2PL for the two-parameter logistic:

est_PT <- print(Results_WLSMV_theta$parameters$unstandardized)
##    paramHeader param   est    se  est_se pval
## 1       COG.BY    Y1 0.731 0.023  31.197    0
## 2       COG.BY    Y2 0.898 0.047  19.076    0
## 3       COG.BY    Y3 1.901 0.146  12.994    0
## 4       COG.BY    Y4 0.817 0.025  32.796    0
## 5       COG.BY    Y5 0.566 0.038  14.850    0
## 6       COG.BY    Y6 0.702 0.023  30.142    0
## 7   Thresholds  Y1$1 0.718 0.016  44.113    0
## 8   Thresholds  Y2$1 2.163 0.054  40.163    0
## 9   Thresholds  Y3$1 2.925 0.178  16.386    0
## 10  Thresholds  Y4$1 0.636 0.017  38.311    0
## 11  Thresholds  Y5$1 2.049 0.039  52.734    0
## 12  Thresholds  Y6$1 0.323 0.014  22.739    0
## 13   Variances   COG 1.000 0.000 999.000  999
est_2PL <- print(Results_WLSMV_theta$parameters$irt.parameterization)
##          paramHeader param   est    se est_se pval
## 1             COG.BY    Y1 0.731 0.023 31.197    0
## 2             COG.BY    Y2 0.898 0.047 19.076    0
## 3             COG.BY    Y3 1.901 0.146 12.994    0
## 4             COG.BY    Y4 0.817 0.025 32.796    0
## 5             COG.BY    Y5 0.566 0.038 14.850    0
## 6             COG.BY    Y6 0.702 0.023 30.142    0
## 7  Item.Difficulties  Y1$1 0.982 0.028 34.651    0
## 8  Item.Difficulties  Y2$1 2.409 0.077 31.223    0
## 9  Item.Difficulties  Y3$1 1.539 0.031 49.075    0
## 10 Item.Difficulties  Y4$1 0.778 0.023 34.301    0
## 11 Item.Difficulties  Y5$1 3.619 0.193 18.782    0
## 12 Item.Difficulties  Y6$1 0.460 0.022 20.958    0
## 13         Variances   COG 1.000 0.000  0.000    1

Item response function, 2PL

The two-parameter logistic (2PL) form of the item response function is:

\[P(y_{ij}=1|\eta_i) = F(Da_j(\eta_i-b_j))\]

where \(y_{ij}\) is the response of person \(i\) to binary test item \(j\) (\(y\) is observed as 0 or 1; I use the phrase indicated response to refer to 1’s on \(y\), which is more generic than correct/incorrect, symptom present/absent, etc.), \(\eta_i\) is person \(i\)’s level of a latent trait or latent ability, \(a_j\) is the discrimination parameter for item \(j\), \(b_j\) is the difficulty parameter for item \(j\), \(D\) is a scaling constant, usually taken to be 1.7, and \(F\) is a link function. In the 2PL framework, \(F\) is the logistic function:

\[F(Da_j(\eta_i-b_j)) = \frac{e^{Da_j(\eta_i-b_j)}}{1+e^{Da_j(\eta_i-b_j)}} = \frac{1}{1+e^{-Da_j(\eta_i-b_j)}}\]

What is neat about the 2PL formulation \(a(\eta-b)\) is that it makes clear that the probability of an indicated response depends on the relationship between a person’s ability (\(\eta\)) and an item’s difficulty (\(b\)): if the person’s ability is greater than the item’s difficulty, \(\eta-b\) will be positive (assuming \(a\) is also positive). When \(\eta-b\) is positive (and \(a\) is positive), the probability of an indicated response will be greater than 50%, because when \(F\) is the logistic function, \(F(0)\) is 0.50.

# Define some functions: logit(p) and invlogit(z)
logit <- function(p) {
  log(p/(1-p))
}
invlogit <- function(z) {
  exp(z)/(1 + exp(z))  
}
invlogit(0)
## [1] 0.5

In this way, person abilities and item difficulties are expressed on the same scale. This is useful for identifying good test items to administer: items with difficulty values in the region of ability where the person is located.
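
A tiny worked example (values chosen only for illustration, not taken from the model): when ability equals difficulty the 2PL gives 0.5, and when ability exceeds difficulty the probability is above 0.5.

# hypothetical item with a = 1.2 and b = 0.5, using D = 1.7
invlogit(1.7 * 1.2 * (0.5 - 0.5))  # eta equal to b        -> 0.5
invlogit(1.7 * 1.2 * (1.5 - 0.5))  # eta one unit above b  -> about 0.88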

Now, we did not ask Mplus to give us a model in the 2PL framework. We asked for what could be called 2PN, for two-parameter normal probability, or 2PP, for two-parameter probit regression. But Mplus provided the 2PL scaling of results in the IRT Parameterization part of the output. This is easy to do because simple transformations of the 2PN results place them on the 2PL scale. Mplus only provides this reparameterization of the output when all items are binary, there is a single common latent variable (cog in this example), and there are no covariates included in the model.
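
As a quick check of those transformations using the objects extracted above (this relies on the row ordering shown in est_PT and est_2PL): under the theta parameterization, with the factor variance and residual variances fixed at 1, the IRT discriminations equal the loadings and the IRT difficulties are the thresholds divided by the loadings.

# a = lambda (rows 1-6 are the loadings / discriminations)
est_PT$est[1:6]
est_2PL$est[1:6]
# b = tau / lambda (rows 7-12 are the thresholds / difficulties)
round(est_PT$est[7:12] / est_PT$est[1:6], 3)
est_2PL$est[7:12]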

Item Characteristic Curve, 2PL

To draw an item characteristic curve (ICC) using item parameters expressed on the 2PL metric, we can use the following:

curve(invlogit(D*a*(x-b)), -4, 4, n=1001,  
      xlab = "cog", ylab = "P(y=1)", col="black")

where for D we will use 1.7, for a we will use the COG.BY value from the IRT Parameterization part of the Mplus output, and for b we will use the Item.Difficulties value from that same part of the output. x remains as x because the curve function uses it for the x-axis values in the plot. We will use the values for item 6.

a <- est_2PL$est[6]
b <- est_2PL$est[12]
D <- 1.7
curve(invlogit(D*a*(x-b)), -4, 4, n=1001,  
      xlab = "cog", ylab = "P(y=1)", col="black", lwd = 2)

Item response function, 2 parameter normal, theta parameterization

The two-parameter normal (2PN) form of the item response function is:

\[P(y_{ij}=1|\eta_i) = \Phi(-\tau_{1j} + \lambda_j\eta_i)\]

where \(y_{ij}\) is the response of person \(i\) to binary test item \(j\) (\(y\) is observed as 0 or 1), \(\eta_i\) is person \(i\)’s level of a latent trait or latent ability, \(\lambda_j\) is the measurement slope for item \(j\), \(\tau_{1j}\) is threshold 1 for item \(j\), and \(\Phi\) is the standard normal cumulative distribution function. In the 2PN framework, the link is the normal probability or probit link, \(\Phi^{-1}\).

curve(invlogit(D*a*(x-b)), -4, 4, n=1001,  
      xlab = "cog", ylab = "P(y=1)", col=alpha("black",.5), lwd = 2)
curve(pnorm(-1*est_PT$est[12] + est_PT$est[6]*x)  , -4, 4, n=1001, 
      col="black", add = TRUE)

These curves overlap very nicely. They are not exactly the same, because the normal cumulative distribution function (called with pnorm) and the scaled logistic function do not have exactly the same shape.
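
How close are they? A small numeric check over the plotted range shows that the maximum gap between the normal CDF and the logistic function scaled by D = 1.7 is roughly 0.01:

# maximum absolute difference between pnorm and the D-scaled inverse logit
z <- seq(-4, 4, by = 0.001)
max(abs(pnorm(z) - invlogit(1.7 * z)))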

Mplus CFA model using WLSMV estimator, delta parameterization

Under the delta parameterization, the item response function is:

\[P(y_{ij}=1|\eta_i) = \Phi\left(\frac{-\tau_{1j} + \lambda_j\eta_i}{\sqrt{1-\lambda_j^2}}\right)\]

where all symbols are as described above for the 2PN theta parameterization case.

The reason we divide \((-\tau + \lambda\eta)\) by \(\sqrt{1-\lambda^2}\) when the model is a multivariate probit under the delta parameterization is explained in Mplus Web Note 4. The value \(1-\lambda^2\psi\) (where \(\psi\) is the variance, or residual variance, of the common latent variable \(\eta\); I have left it out of the equation above because in our example \(\psi=1\)) is the residual variance of the latent response variable, the continuous \(y^*\) variable underlying the observed categorical \(y\) variable. Dividing the WLSMV delta parameterization estimates of \(\tau\) and \(\lambda\) by the residual standard deviation of \(y^*\) (i.e., the square root of that residual variance) is necessary to use the standard form of the normal probability CDF (pnorm). In the delta parameterization, the scale factors (the deltas) are parameters, fixed at 1 by default in the single-group case, and the residual variances are obtained as \(1-\lambda^2\psi\). In the theta parameterization, the residual variances (the thetas) are parameters, fixed at 1 by default in the single-group case, and the scale factors are not parameters. The delta parameters are the inverse standard deviations of the latent response variables (\(\Delta = 1/\sqrt{\sigma^*}\), where \(\sigma^*\) is the variance of \(y^*\)).

To plot this, we’ll have to estimate a new Mplus model, where we specifically ask for the (default) delta parameterization:

TITLE:     CFA model of 6 SPSMQ items in EPESE
DATA:      FILE = ex0200.dat ; 
VARIABLE:  NAMES = y1 y2 y3 y4 y5 y6; 
           CATEGORICAL = y1 y2 y3 y4 y5 y6 ;
           MISSING = . ; 
ANALYSIS:  ESTIMATOR = WLSMV ;
           PARAMETERIZATION = DELTA ;
MODEL:     cog by y1-y6*;
           cog @ 1 ;
## Mplus VERSION 8.6 (Mac)
## MUTHEN & MUTHEN
## 07/19/2021   1:29 PM
## 
## INPUT INSTRUCTIONS
## 
##   TITLE:     CFA model of 6 SPSMQ items in EPESE
##   DATA:      FILE = ex0200.dat ;
##   VARIABLE:  NAMES = y1 y2 y3 y4 y5 y6;
##              CATEGORICAL = y1 y2 y3 y4 y5 y6 ;
##              MISSING = . ;
##   ANALYSIS:  ESTIMATOR = WLSMV ;
##              PARAMETERIZATION = DELTA ;
##   MODEL:     cog by y1-y6*;
##              cog @ 1 ;
## 
## 
## 
## *** WARNING
##   Data set contains cases with missing on all variables.
##   These cases were not included in the analysis.
##   Number of cases with missing on all variables:  637
##    1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS
## 
## 
## 
## CFA model of 6 SPSMQ items in EPESE
## 
## SUMMARY OF ANALYSIS
## 
## Number of groups                                                 1
## Number of observations                                       13519
## 
## Number of dependent variables                                    6
## Number of independent variables                                  0
## Number of continuous latent variables                            1
## 
## Observed dependent variables
## 
##   Binary and ordered categorical (ordinal)
##    Y1          Y2          Y3          Y4          Y5          Y6
## 
## Continuous latent variables
##    COG
## 
## 
## Estimator                                                    WLSMV
## Maximum number of iterations                                  1000
## Convergence criterion                                    0.500D-04
## Maximum number of steepest descent iterations                   20
## Maximum number of iterations for H1                           2000
## Convergence criterion for H1                             0.100D-03
## Parameterization                                             DELTA
## Link                                                        PROBIT
## 
## Input data file(s)
##   ex0200.dat
## 
## Input data format  FREE
## 
## 
## SUMMARY OF DATA
## 
##      Number of missing data patterns            26
## 
## 
## COVARIANCE COVERAGE OF DATA
## 
## Minimum covariance coverage value   0.100
## 
## 
##      PROPORTION OF DATA PRESENT
## 
## 
##            Covariance Coverage
##               Y1            Y2            Y3            Y4            Y5
##               ________      ________      ________      ________      ________
##  Y1             0.998
##  Y2             0.997         0.998
##  Y3             0.996         0.995         0.997
##  Y4             0.994         0.994         0.994         0.995
##  Y5             0.992         0.992         0.991         0.990         0.994
##  Y6             0.945         0.945         0.945         0.945         0.942
## 
## 
##            Covariance Coverage
##               Y6
##               ________
##  Y6             0.946
## 
## 
## UNIVARIATE PROPORTIONS AND COUNTS FOR CATEGORICAL VARIABLES
## 
##     Y1
##       Category 1    0.719         9702.000
##       Category 2    0.281         3793.000
##     Y2
##       Category 1    0.946        12763.000
##       Category 2    0.054          725.000
##     Y3
##       Category 1    0.913        12308.000
##       Category 2    0.087         1168.000
##     Y4
##       Category 1    0.689         9271.000
##       Category 2    0.311         4187.000
##     Y5
##       Category 1    0.963        12937.000
##       Category 2    0.037          501.000
##     Y6
##       Category 1    0.604         7730.000
##       Category 2    0.396         5064.000
## 
## 
## 
## THE MODEL ESTIMATION TERMINATED NORMALLY
## 
## 
## 
## MODEL FIT INFORMATION
## 
## Number of Free Parameters                       12
## 
## Chi-Square Test of Model Fit
## 
##           Value                            337.498*
##           Degrees of Freedom                     9
##           P-Value                           0.0000
## 
## *   The chi-square value for MLM, MLMV, MLR, ULSMV, WLSM and WLSMV cannot be used
##     for chi-square difference testing in the regular way.  MLM, MLR and WLSM
##     chi-square difference testing is described on the Mplus website.  MLMV, WLSMV,
##     and ULSMV difference testing is done using the DIFFTEST option.
## 
## RMSEA (Root Mean Square Error Of Approximation)
## 
##           Estimate                           0.052
##           90 Percent C.I.                    0.047  0.057
##           Probability RMSEA <= .05           0.240
## 
## CFI/TLI
## 
##           CFI                                0.955
##           TLI                                0.925
## 
## Chi-Square Test of Model Fit for the Baseline Model
## 
##           Value                           7276.590
##           Degrees of Freedom                    15
##           P-Value                           0.0000
## 
## SRMR (Standardized Root Mean Square Residual)
## 
##           Value                              0.052
## 
## Optimum Function Value for Weighted Least-Squares Estimator
## 
##           Value                     0.84220376D-02
## 
## 
## 
## MODEL RESULTS
## 
##                                                     Two-Tailed
##                     Estimate       S.E.  Est./S.E.    P-Value
## 
##  COG      BY
##     Y1                 0.590      0.012     47.858      0.000
##     Y2                 0.668      0.019     34.459      0.000
##     Y3                 0.885      0.015     59.947      0.000
##     Y4                 0.633      0.012     54.722      0.000
##     Y5                 0.493      0.025     19.611      0.000
##     Y6                 0.574      0.013     44.972      0.000
## 
##  Thresholds
##     Y1$1               0.580      0.011     50.520      0.000
##     Y2$1               1.610      0.018     90.542      0.000
##     Y3$1               1.362      0.015     88.700      0.000
##     Y4$1               0.493      0.011     43.624      0.000
##     Y5$1               1.783      0.020     88.782      0.000
##     Y6$1               0.264      0.011     23.543      0.000
## 
##  Variances
##     COG                1.000      0.000    999.000    999.000
## 
## 
## QUALITY OF NUMERICAL RESULTS
## 
##      Condition Number for the Information Matrix              0.165E+00
##        (ratio of smallest to largest eigenvalue)
## 
## 
## IRT PARAMETERIZATION
## 
##                                                     Two-Tailed
##                     Estimate       S.E.  Est./S.E.    P-Value
## 
##  Item Discriminations
## 
##  COG      BY
##     Y1                 0.731      0.023     31.198      0.000
##     Y2                 0.898      0.047     19.075      0.000
##     Y3                 1.901      0.146     12.992      0.000
##     Y4                 0.818      0.025     32.792      0.000
##     Y5                 0.566      0.038     14.850      0.000
##     Y6                 0.701      0.023     30.143      0.000
## 
##  Item Difficulties
##     Y1$1               0.982      0.028     34.648      0.000
##     Y2$1               2.409      0.077     31.223      0.000
##     Y3$1               1.538      0.031     49.074      0.000
##     Y4$1               0.778      0.023     34.307      0.000
##     Y5$1               3.619      0.193     18.783      0.000
##     Y6$1               0.460      0.022     20.956      0.000
## 
##  Variances
##     COG                1.000      0.000      0.000      1.000
## 
## 
## R-SQUARE
## 
##     Observed                   Residual
##     Variable        Estimate   Variance
## 
##     Y1                 0.348      0.652
##     Y2                 0.446      0.554
##     Y3                 0.783      0.217
##     Y4                 0.401      0.599
##     Y5                 0.243      0.757
##     Y6                 0.330      0.670
## 
## 
##      Beginning Time:  13:29:29
##         Ending Time:  13:29:29
##        Elapsed Time:  00:00:00
## 
## 
## 
## MUTHEN & MUTHEN
## 3463 Stoner Ave.
## Los Angeles, CA  90066
## 
## Tel: (310) 391-9971
## Fax: (310) 391-8971
## Web: www.StatModel.com
## Support: Support@StatModel.com
## 
## Copyright (c) 1998-2021 Muthen & Muthen
Results_WLSMV_delta <- MplusAutomation::readModels(target="formplus.out")
## Reading model:  formplus.out
summary(Results_WLSMV_delta)
## CFA model of 6 SPSMQ items in EPESEEstimated using WLSMV 
## Number of obs: 13519, number of (free) parameters: 12 
## 
## Model: Chi2(df = 9) = 337.498, p = 0 
## Baseline model: Chi2(df = 15) = 7276.59, p = 0 
## 
## Fit Indices: 
## 
## CFI = 0.955, TLI = 0.925, SRMR = 0.052 
## RMSEA = 0.052, 90% CI [0.047, 0.057], p < .05 = 0.24 
## AIC = NA, BIC = NA
est_PD <- print(Results_WLSMV_delta$parameters$unstandardized)
##    paramHeader param   est    se  est_se pval
## 1       COG.BY    Y1 0.590 0.012  47.858    0
## 2       COG.BY    Y2 0.668 0.019  34.459    0
## 3       COG.BY    Y3 0.885 0.015  59.947    0
## 4       COG.BY    Y4 0.633 0.012  54.722    0
## 5       COG.BY    Y5 0.493 0.025  19.611    0
## 6       COG.BY    Y6 0.574 0.013  44.972    0
## 7   Thresholds  Y1$1 0.580 0.011  50.520    0
## 8   Thresholds  Y2$1 1.610 0.018  90.542    0
## 9   Thresholds  Y3$1 1.362 0.015  88.700    0
## 10  Thresholds  Y4$1 0.493 0.011  43.624    0
## 11  Thresholds  Y5$1 1.783 0.020  88.782    0
## 12  Thresholds  Y6$1 0.264 0.011  23.543    0
## 13   Variances   COG 1.000 0.000 999.000  999
j <- 6
j.t <- j+6
curve(invlogit(D*est_2PL$est[j]*(x-est_2PL$est[j.t])), -4, 4, n=1001,  
      xlab = "cog", ylab = "P(y=1)", col=alpha("black",.75), lwd = 3)
curve(pnorm(-1*est_PT$est[j.t] + est_PT$est[j]*x)  , -4, 4, n=1001, 
      col=alpha("violet",.5), lwd = 4, add = TRUE)
curve(pnorm(
     (-1*est_PD$est[j.t] + est_PD$est[j]*x)/((1-est_PD$est[j]^2)^.5)),
     -4, 4, n=1001, 
      col="black", add = TRUE)

All three curves are almost exactly overlapping.
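
To connect this plot back to the division by \(\sqrt{1-\lambda^2}\) described above, the delta parameterization estimates can be rescaled to the theta metric directly (a small check using the objects extracted above, with \(\psi=1\)):

# rescale delta parameterization loadings and thresholds by the residual SD of y*
lambda_delta <- est_PD$est[1:6]
tau_delta    <- est_PD$est[7:12]
res_sd       <- sqrt(1 - lambda_delta^2)
round(lambda_delta / res_sd, 3)  # compare with est_PT$est[1:6]
round(tau_delta / res_sd, 3)     # compare with est_PT$est[7:12]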

Five parameterizations

To round out the parameterizations, we have to add the full-information maximum likelihood logit and probit regression models and the Bayes model.

Mplus CFA model using Bayes estimator (probit model)

TITLE:     CFA model of 6 SPSMQ items in EPESE
DATA:      FILE = ex0200.dat ; 
VARIABLE:  NAMES = y1 y2 y3 y4 y5 y6; 
           CATEGORICAL = y1 y2 y3 y4 y5 y6 ;
           MISSING = . ; 
ANALYSIS:  ESTIMATOR = BAYES ;
MODEL:     cog by y1-y6*;
           cog @ 1 ;
## Mplus VERSION 8.6 (Mac)
## MUTHEN & MUTHEN
## 07/19/2021   1:29 PM
## 
## INPUT INSTRUCTIONS
## 
##   TITLE:     CFA model of 6 SPSMQ items in EPESE
##   DATA:      FILE = ex0200.dat ;
##   VARIABLE:  NAMES = y1 y2 y3 y4 y5 y6;
##              CATEGORICAL = y1 y2 y3 y4 y5 y6 ;
##              MISSING = . ;
##   ANALYSIS:  ESTIMATOR = BAYES ;
##   MODEL:     cog by y1-y6*;
##              cog @ 1 ;
## 
## 
## 
## *** WARNING
##   Data set contains cases with missing on all variables.
##   These cases were not included in the analysis.
##   Number of cases with missing on all variables:  637
##    1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS
## 
## 
## 
## CFA model of 6 SPSMQ items in EPESE
## 
## SUMMARY OF ANALYSIS
## 
## Number of groups                                                 1
## Number of observations                                       13519
## 
## Number of dependent variables                                    6
## Number of independent variables                                  0
## Number of continuous latent variables                            1
## 
## Observed dependent variables
## 
##   Binary and ordered categorical (ordinal)
##    Y1          Y2          Y3          Y4          Y5          Y6
## 
## Continuous latent variables
##    COG
## 
## 
## Estimator                                                    BAYES
## Specifications for Bayesian Estimation
##   Point estimate                                            MEDIAN
##   Number of Markov chain Monte Carlo (MCMC) chains               2
##   Random seed for the first chain                                0
##   Starting value information                           UNPERTURBED
##   Algorithm used for Markov chain Monte Carlo           GIBBS(PX1)
##   Convergence criterion                                  0.500D-01
##   Maximum number of iterations                               50000
##   K-th iteration used for thinning                               1
## Link                                                        PROBIT
## 
## Input data file(s)
##   ex0200.dat
## Input data format  FREE
## 
## 
## SUMMARY OF DATA
## 
## 
## 
## COVARIANCE COVERAGE OF DATA
## 
## Minimum covariance coverage value   0.100
## 
##      Number of missing data patterns            26
## 
## 
##      PROPORTION OF DATA PRESENT
## 
## 
##            Covariance Coverage
##               Y1            Y2            Y3            Y4            Y5
##               ________      ________      ________      ________      ________
##  Y1             0.998
##  Y2             0.997         0.998
##  Y3             0.996         0.995         0.997
##  Y4             0.994         0.994         0.994         0.995
##  Y5             0.992         0.992         0.991         0.990         0.994
##  Y6             0.945         0.945         0.945         0.945         0.942
## 
## 
##            Covariance Coverage
##               Y6
##               ________
##  Y6             0.946
## 
## 
## UNIVARIATE PROPORTIONS AND COUNTS FOR CATEGORICAL VARIABLES
## 
##     Y1
##       Category 1    0.719         9702.000
##       Category 2    0.281         3793.000
##     Y2
##       Category 1    0.946        12763.000
##       Category 2    0.054          725.000
##     Y3
##       Category 1    0.913        12308.000
##       Category 2    0.087         1168.000
##     Y4
##       Category 1    0.689         9271.000
##       Category 2    0.311         4187.000
##     Y5
##       Category 1    0.963        12937.000
##       Category 2    0.037          501.000
##     Y6
##       Category 1    0.604         7730.000
##       Category 2    0.396         5064.000
## 
## 
## 
## THE MODEL ESTIMATION TERMINATED NORMALLY
## 
##      USE THE FBITERATIONS OPTION TO INCREASE THE NUMBER OF ITERATIONS BY A FACTOR
##      OF AT LEAST TWO TO CHECK CONVERGENCE AND THAT THE PSR VALUE DOES NOT INCREASE.
## 
## 
## 
## MODEL FIT INFORMATION
## 
## Number of Free Parameters                              12
## 
## Bayesian Posterior Predictive Checking using Chi-Square
## 
##           95% Confidence Interval for the Difference Between
##           the Observed and the Replicated Chi-Square Values
## 
##                                  -1.864            56.020
## 
##           Posterior Predictive P-Value              0.038
## 
## 
## 
## MODEL RESULTS
## 
##                                 Posterior  One-Tailed         95% C.I.
##                     Estimate       S.D.      P-Value   Lower 2.5%  Upper 2.5%  Significance
## 
##  COG      BY
##     Y1                 0.688       0.022      0.000       0.648       0.735      *
##     Y2                 0.802       0.034      0.000       0.751       0.899      *
##     Y3                 1.545       0.046      0.000       1.435       1.611      *
##     Y4                 0.815       0.035      0.000       0.755       0.895      *
##     Y5                 0.538       0.033      0.000       0.471       0.598      *
##     Y6                 0.721       0.021      0.000       0.682       0.764      *
## 
##  Thresholds
##     Y1$1               0.701       0.016      0.000       0.667       0.731      *
##     Y2$1               2.065       0.038      0.000       2.007       2.168      *
##     Y3$1               2.498       0.053      0.000       2.370       2.588      *
##     Y4$1               0.633       0.019      0.000       0.599       0.672      *
##     Y5$1               2.021       0.037      0.000       1.948       2.097      *
##     Y6$1               0.311       0.013      0.000       0.285       0.336      *
## 
##  Variances
##     COG                1.000       0.000      0.000       1.000       1.000
## 
## 
##      Beginning Time:  13:29:30
##         Ending Time:  13:29:39
##        Elapsed Time:  00:00:09
## 
## 
## 
## MUTHEN & MUTHEN
## 3463 Stoner Ave.
## Los Angeles, CA  90066
## 
## Tel: (310) 391-9971
## Fax: (310) 391-8971
## Web: www.StatModel.com
## Support: Support@StatModel.com
## 
## Copyright (c) 1998-2021 Muthen & Muthen
Results_Bayes <- MplusAutomation::readModels(target="formplus.out")
## Reading model:  formplus.out
summary(Results_Bayes)
## CFA model of 6 SPSMQ items in EPESEEstimated using BAYES 
## Number of obs: 13519, number of (free) parameters: 12 
## 
## Fit Indices: 
## 
## CFI = NA, TLI = NA, SRMR = NA 
## RMSEA = NA, 90% CI [NA, NA], p < .05 = NA 
## AIC = NA, BIC = NA
est_Bayes <- print(Results_Bayes$parameters$unstandardized)
##    paramHeader param   est posterior_sd pval lower_2.5ci upper_2.5ci   sig
## 1       COG.BY    Y1 0.688        0.022    0       0.648       0.735  TRUE
## 2       COG.BY    Y2 0.802        0.034    0       0.751       0.899  TRUE
## 3       COG.BY    Y3 1.545        0.046    0       1.435       1.611  TRUE
## 4       COG.BY    Y4 0.815        0.035    0       0.755       0.895  TRUE
## 5       COG.BY    Y5 0.538        0.033    0       0.471       0.598  TRUE
## 6       COG.BY    Y6 0.721        0.021    0       0.682       0.764  TRUE
## 7   Thresholds  Y1$1 0.701        0.016    0       0.667       0.731  TRUE
## 8   Thresholds  Y2$1 2.065        0.038    0       2.007       2.168  TRUE
## 9   Thresholds  Y3$1 2.498        0.053    0       2.370       2.588  TRUE
## 10  Thresholds  Y4$1 0.633        0.019    0       0.599       0.672  TRUE
## 11  Thresholds  Y5$1 2.021        0.037    0       1.948       2.097  TRUE
## 12  Thresholds  Y6$1 0.311        0.013    0       0.285       0.336  TRUE
## 13   Variances   COG 1.000        0.000    0       1.000       1.000 FALSE
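
The comparison table below and the final overlay plot also use objects est_ML (MLR, logit link) and est_MP (MLR, probit link). Those model runs are not shown here; a minimal sketch of how they might be read back into R, assuming the MLR logit and probit models were fit in the same way and using hypothetical output file names, is:

# sketch only: the .out file names are placeholders for the MLR logit and probit runs
Results_MLR_logit  <- MplusAutomation::readModels(target = "mlr_logit.out")
Results_MLR_probit <- MplusAutomation::readModels(target = "mlr_probit.out")
est_ML <- Results_MLR_logit$parameters$unstandardized
est_MP <- Results_MLR_probit$parameters$unstandardized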

Comparing all estimator sets

| Parameter | IRT | WLSMV:theta | MLR:probit | Bayes | MLR:logit | WLSMV:delta |
|---|---|---|---|---|---|---|
| COG.BY:Y1 | 0.731 | 0.731 | 0.693 | 0.688 | 1.176 | 0.59 |
| COG.BY:Y2 | 0.898 | 0.898 | 0.794 | 0.802 | 1.567 | 0.668 |
| COG.BY:Y3 | 1.901 | 1.901 | 1.662 | 1.545 | 2.97 | 0.885 |
| COG.BY:Y4 | 0.817 | 0.817 | 0.81 | 0.815 | 1.372 | 0.633 |
| COG.BY:Y5 | 0.566 | 0.566 | 0.531 | 0.538 | 1.139 | 0.493 |
| COG.BY:Y6 | 0.702 | 0.702 | 0.719 | 0.721 | 1.22 | 0.574 |
| difficulty or threshold:Y1$1 | 0.982 | 0.718 | 0.702 | 0.701 | 1.185 | 0.58 |
| difficulty or threshold:Y2$1 | 2.409 | 2.163 | 2.056 | 2.065 | 3.833 | 1.61 |
| difficulty or threshold:Y3$1 | 1.539 | 2.925 | 2.637 | 2.498 | 4.714 | 1.362 |
| difficulty or threshold:Y4$1 | 0.778 | 0.636 | 0.629 | 0.633 | 1.068 | 0.493 |
| difficulty or threshold:Y5$1 | 3.619 | 2.049 | 2.017 | 2.021 | 3.816 | 1.783 |
| difficulty or threshold:Y6$1 | 0.46 | 0.323 | 0.31 | 0.311 | 0.522 | 0.264 |

Note that the IRT parameter estimates are those reported by Mplus from the WLSMV estimator, theta parameterization model.

Here is how they are related:

| Parameter | IRT | WLSMV:theta | MLR:probit | Bayes | MLR:logit | WLSMV:delta |
|---|---|---|---|---|---|---|
| discrimination | \(a\) | \(\lambda\) | \(\lambda\) | \(\lambda\) | \(\lambda/D\) | \(\lambda/\sqrt{1-\lambda^2}\) |
| difficulty | \(b\) | \(\tau/\lambda\) | \(\tau/\lambda\) | \(\tau/\lambda\) | \(\tau/\lambda\) | \(\tau/\lambda\) |
| factor loading (\(COR(y^*,\eta)\)) | \(a/\sqrt{1+a^2}\) | \(\lambda/\sqrt{1+\lambda^2}\) | \(\lambda/\sqrt{1+\lambda^2}\) | \(\lambda/\sqrt{1+\lambda^2}\) | \(\lambda D^{-1}/\sqrt{1+(\lambda D^{-1})^2}\) | \(\lambda\) |

Note: \(D\) is often taken to be 1.7. Sometimes it is taken to be 1.814, which is \(\sqrt{\pi^2/3}\), the standard deviation of the standard logistic distribution. Also note that if \(\lambda\) and \(D\) are positive, the MLR:logit parameters can be used to find the correlation of the latent trait and the latent response variable with \(\lambda/\sqrt{\lambda^2 + D^2}\).
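
As a rough numeric check of that last relationship, using the item 6 values from the comparison table above (MLR:logit loading 1.22, WLSMV:delta loading 0.574; the small gap reflects differences between the estimators):

# correlation of y* and cog implied by the MLR logit loading for item 6
1.22 / sqrt(1.22^2 + 1.7^2)  # about 0.58, close to the WLSMV:delta loading of 0.574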

Here is a plot with all five estimator, parameterization, and link function options plotted on top of one another:

j <- 6
j.t <- j+6
# 2PL
curve(invlogit(D*est_2PL$est[j]*(x-est_2PL$est[j.t])), -4, 4, n=1001,  
      xlab = "cog", ylab = "P(y=1)", col=alpha("black",.75), lwd = 1)
# WLSMV (probit) theta parameterization
curve(pnorm(-1*est_PT$est[j.t] + est_PT$est[j]*x)  , 
      -4, 4, n=1001, col=alpha("violet",.5), lwd = 1, add = TRUE)
# WLSMV (probit) delta parameterization
curve(pnorm((-1*est_PD$est[j.t] + est_PD$est[j]*x)/((1-est_PD$est[j]^2)^.5)),
     -4, 4, n=1001, col="black", add = TRUE)
# BAYES estimator (same as WLSMV(theta))
curve(pnorm(-1*est_Bayes$est[j.t] + est_Bayes$est[j]*x)  , 
      -4, 4, n=1001, col=alpha("green",.5), lwd = 1, add = TRUE)
# MLR logit
curve(invlogit(-1*est_ML$est[j.t] + est_ML$est[j]*x)  , 
      -4, 4, n=1001, col=alpha("red",.5), lwd = 1, add = TRUE)
# MLR probit
curve(pnorm(-1*est_MP$est[j.t] + est_MP$est[j]*x)  , 
      -4, 4, n=1001, col=alpha("yellow",.5), lwd = 1, add = TRUE)