Estimating factor scores in Mplus (and MIRT): Context matters

Contributors

April 15 2021

Contributors include Rich Jones rich_jones@brown.edu, Doug Tommet, Phoebe Scollard, Laura Gibbons, Joey Mukherjee

Objective

The objective of this note is to show how different ways of handling the mean and variance of a latent variable influence the estimated factor scores under maximum likelihood categorical item factor analysis models using Mplus. Appendix 2 provides R code that shows the same principles apply when using R/MIRT to estimate factor scores.

If the factor loadings and item thresholds are fixed, one might think that factor scores will be constant and sample independent. By sample independent, I mean that the same factor scores would be estimated for a given response pattern regardless of the sample in which that person’s response pattern was placed.

I will show that in fact Mplus factor scores are sample dependent unless the means and variances of the latent trait are also fixed to the presumed population values. Additionally, the user will want to ensure that when generating factor scores with Mplus that the ANALYSIS: options do not include integration=montecarlo;, and should include integration=40; and adaptive=off; (Thanks to Joey Mukherjee: see note in the Postscript).

The reason why is both Mplus and MIRT use a Bayesian approach to factor score estimation, called expected a posteriori and abbreviated EAP. This approach uses a prior on the mean and variance of the underlying latent trait, which is the model-estimated (or model constrained) mean and variance of the latent trait.

Approach

I use the Stata automobile data set (dta) for this demonstration. This is a data set with measurements on 74 cars. I use this to generate a simple three indicator confirmatory factor analysis model for categorical dependent variables using robust maximum likelihood parameter estimation, equivalent to a two parameter logistic (2PL) item response theory (IRT) model.

Then – holding the thresholds and measurement slopes (factor loadings) fixed to their estimated values – I generate factor score estimates in three conditions:

  1. Latent factor mean and variance are fixed to 0 and 1, respectively.
  2. Latent factor mean is freely estimated but variance is fixed at 1.
  3. Latent factor mean and variance are both freely estimated.

I refer to these three conditions as mm for Mplus [scoring] machine.

I generate jackknife factor scores, meaning I generate factor scores N − 1 times for each mm condition, each of those times removing 1 of the observations from the data set. Except for the 1978 Plymouth Volare (line 46 in the data file). I leave the Volare in each time, and then we’ll look and see how variable the factor score estimate is for this car across the N − 1 replications.

Plymouth Volare (from wikipedia)

Analyses are performed within Stata and make use of the runmplus module.

Example data

I’m going to use the auto data set from Stata. If you’re not a Stata user it can be obtained at http://www.stata-press.com/data/r13/auto.dta.

This data set has information on 74 cars from the 1978 model year. I am going to generate a factor model for car size using three dichotomized indicators based on measured headroom, trunk size, and weight.

The Mplus data file I create in the steps below, and use in the models can be downloaded from here.

Process data

. sysuse auto , clear
(1978 Automobile Data)

. set seed 1002

. gen carid=_n

. rename headroom hdroom

. foreach x in hdroom trunk weight  {
  2. * dichotomize at a random location
. local pis : di %2.0f 100*(1/(1+exp(-1*(invnorm(uniform()))/((1)/2.065*c(pi)))))
  3. _pctile `x' , p(`pis')
  4. gen u_`x'=`x'>`r(r1)'
  5. }

. su u*

    Variable │        Obs        Mean    Std. Dev.       Min        Max
─────────────┼─────────────────────────────────────────────────────────
    u_hdroom │         74    .2027027     .404757          0          1
     u_trunk │         74    .4864865    .5032291          0          1
    u_weight │         74    .3378378    .4762015          0          1

. tetrac u*
(obs=74)

Approximate Tetrachoric Correlations

          u_hdroom   u_trunk  u_weight
u_hdroom    1.0000
 u_trunk    0.8456    1.0000
u_weight    0.6011    0.6050    1.0000

. 
. /*
> * to support claims in the post-script
> * I added this code
> replace u_hdroom=0 in 46
> replace u_trunk=0 in 46
> replace u_weight=0 in 46
> */
. 
. tempfile f1

. save `f1'
file /var/folders/lq/w3m6z0dj41ngkbbc0204xb7m0000gp/T//S_02940.000001 saved

Estimate population measurement model

The Mplus input statement is presented below (and can be downloaded here). Note that one of the options I call is the Mplus OUTPUT: SVALUES option. This produces a nice list of all the estimated parameters and their final estimates. The name of this output option implies that it is intended that these would be used as starting values for a subsequent model, but I will modify them to use as fixed values in subsequent models. I also save the factor score estimates (SAVEDATA: SAVE = FSCORES) and we’ll look at the factor score estimate for the Volare at the end of this note.

TITLE: 
  Variable List - 

  u_hdroom : 
  u_trunk : 
  u_weight : 
  carid : 
 
DATA: 
  FILE = carsize.dat ;
VARIABLE:
  NAMES = 
    u_hdroom u_trunk u_weight carid ; 
  MISSING ARE ALL (-9999) ; 
  CATEGORICAL = 
    all
    ;
  IDVARIABLE = carid ;
ANALYSIS: 
   ESTIMATOR = mlr ;
OUTPUT: 
svalues  ; 
 
SAVEDATA: 
save=fscores ; 
file=popfscores.dat ; 
 
MODEL: 
carsize by u_hdroom*.7 ; 
carsize by u_trunk*.7  ; 
carsize by u_weight*.7  ; 
carsize@1 ; 

Here is a look at the final parameter estimates as they are reported in the SVALUES part of the output:

    carsize BY u_hdroom*2.91769;
    carsize BY u_trunk*4.38769;
    carsize BY u_weight*1.44837;
    [u_hdroom$1*2.83303];
    [u_trunk$1*0.14185];
    [u_weight$1*0.92902];
    carsize@1;

Next I will change that model statement to make all parameters fixed (replacing * with @) and then make some further changes to code the three different Mplus scoring machines.

Let’s look at each of the models:

mm1

    carsize BY u_hdroom@2.91769;
    carsize BY u_trunk@4.38769;
    carsize BY u_weight@1.44837;
    [u_hdroom$1@2.83303];
    [u_trunk$1@0.14185];
    [u_weight$1@0.92902];
    carsize@1;
    [carsize@0];

mm2

    carsize BY u_hdroom@2.91769;
    carsize BY u_trunk@4.38769;
    carsize BY u_weight@1.44837;
    [u_hdroom$1@2.83303];
    [u_trunk$1@0.14185];
    [u_weight$1@0.92902];
    carsize@1;
    [carsize*0];

mm3

    carsize BY u_hdroom@2.91769;
    carsize BY u_trunk@4.38769;
    carsize BY u_weight@1.44837;
    [u_hdroom$1@2.83303];
    [u_trunk$1@0.14185];
    [u_weight$1@0.92902];
    carsize*1;
    [carsize*0];

Now I’m going to run the jackknife procedure for the Volare’s estimate of car size. I will store the results in a matrix.

. * First I'll see if the jackknife models have been run
. * already. If they have, I won't repeat them.
. * If I wanted to repeat them, I would have to manually delete
. * the `results.dta` file.
. cap confirm file results.dta

. if _rc~=0 {
.   * store results in a matrix
.   matrix results=J(219,3,.)
.   matrix colnames results = replicate mm carsize
.   * call up the example data
.   use `f1' , clear
.   local Ncars=`c(N)'
.   local rowcounter=0
.   * loop over the three machines
.   forvalues mm=1/3 {
  2. * loop over each of the cars in the data set
. forvalues i=1/`Ncars' {
  3.   * except for the Volare
.   if `i'~=46 {
  4.     quietly {
  5.       * call up the analysis file
.       use `f1' , clear
  6.       * drop the i-th car
.       drop if carid==`i'
  7.       * generate factor score estimates
.       runmplus u* , idvariable(carid) cat(all) est(mlr) ///
>         model(`mm`mm'') ///
>         savedata(save=fscores; file=fscores.dat;) ///
>         savelogfile(fscores) log(off)
  8.       clear
  9.       * retreive factor score estimates
.       runmplus_load_savedata , out(fscores.out)
 10.       * keep the estimate only for the Volare
.       keep if carid==46
 11.       keep carsize carid
 12.       * save the results and condition information to matrix
.       local carsize_is=carsize
 13.       mat results[`++rowcounter',1]=`i'
 14.       mat results[`rowcounter',2]=`mm'
 15.       mat results[`rowcounter',3]=`carsize_is'
 16.       noisily di "." _c
 17.     }
 18.   }
 19. }
 20.   }
.   clear
.   matsave results , replace
. }

. use results.dta

. la def mm ///
>    1 "1: [f@0]; f@1;" ///
>    2 "2: [f*0]; f@1;" ///
>    3 "3: [f*0]; f*1;" , modify

. label values mm mm

Jackknife distribution of car size estimates for the Volare

Here are the jackknife distributions of car size estimates for the Volare under different Mplus scoring machines:

. table mm , c(mean carsize sd carsize n carsize) format(%6.4f)

───────────────┬────────────────────────────────────────────
            mm │ mean(carsize)    sd(carsize)     N(carsize)
───────────────┼────────────────────────────────────────────
1: [f@0]; f@1; │        0.9560         0.0000             73
2: [f*0]; f@1; │        0.9572         0.0022             73
3: [f*0]; f*1; │        0.9567         0.0129             73
───────────────┴────────────────────────────────────────────

As can be seen the variances of the carsize estimate (the factor score) are non-zero for the conditions where either the mean or both the mean and the variance of the latent trait are freely estimated.

How does this compare to the estimate of the Volare’s car size in the original model?

. clear

. runmplus_load_savedata , out(carsize.out)

. keep if carid==46
(73 observations deleted)

. list carsize

     ┌─────────┐
     │ carsize │
     ├─────────┤
  1. │    .957 │
     └─────────┘

Conclusions

The only way to get a factor score estimate using Mplus that is sample independent is to fix every single parameter.

What is often unexpected from Mplus users is that the mean and variance of the latent trait must be specified and fixed when using Mplus as a factor score generating machine. And, the values to which they should be fixed to are those that reflect the population estimates corresponding to the model that generated the item parameters.

For example, in our initial model, we estimated factor loadings for every item, identified the model by fixing the variance of the latent trait at 1, and the Mplus default is to assume latent variable means are 0 in single group models. Thus the item parameters we obtain are calibrated assuming population values for the mean and variance of the latent trait are 0 and 1, respectively. Not fixing to these values will produce different factor score estimates when using Mplus as a factor score generating machine.

Extensions

(I think that) theoretically, or ideally, one would have a scoring machine that would generate factor scores with data provided for a single subject.

One can’t do this with Mplus, however, because if you gave Mplus a data set with categorial dependent variables and only one line of data, all of the categorical dependent variables would have only one observed value. This will trigger an error in Mplus, which will complain that categorical dependent variables have no variance. No analysis will be done, and no factor scores will be generated.

But, we can give Mplus data with one subject (observation) and a second line of data, dummy data, that fills out the missing categories:

. * the data we have for the Volare
. list u*

     ┌───────────────────────────────┐
     │ u_hdroom   u_trunk   u_weight │
     ├───────────────────────────────┤
  1. │        1         1          0 │
     └───────────────────────────────┘

. * add a complement row of data
. set obs 2
number of observations (_N) was 1, now 2

. foreach var in u_hdroom u_trunk u_weight {
  2. replace `var'=1-`var'[_n-1] in 2
  3. }
(1 real change made)
(1 real change made)
(1 real change made)

. replace carid=999 in 2
variable carid was byte now int
(1 real change made)

. * now look at the data
. list carid u*

     ┌───────────────────────────────────────┐
     │ carid   u_hdroom   u_trunk   u_weight │
     ├───────────────────────────────────────┤
  1. │    46          1         1          0 │
  2. │   999          0         0          1 │
     └───────────────────────────────────────┘

. keep u* carid

Now generate a factor score with Mplus using mm1. This is the machine with fixed mean and fixed variance. We are giving Mplus a data file with only two lines of data. One of these is the Volare, the other is the complement data line that makes sure all categories are observed.

. quietly runmplus u* , idvariable(carid) cat(all) est(mlr) ///
> model(`mm1') ///
> savedata(save=fscores; file=fscores.dat;) ///
> savelogfile(fscores) log(off)

. clear

. runmplus_load_savedata , out(fscores.out)

. list carid carsize u*

     ┌─────────────────────────────────────────────────┐
     │ carid   carsize   u_hdroom   u_trunk   u_weight │
     ├─────────────────────────────────────────────────┤
  1. │    46      .956          1         1          0 │
  2. │   999     -.332          0         0          1 │
     └─────────────────────────────────────────────────┘

So this is essentially providing the data one subject at a time, with a little trick to get Mplus to actually run.

Postscript: Integration

Joey Mukherjee points out that we also have to take a careful look at the analysis options when scoring using Mplus

From: Shubhabrata Mukherjee

Date: Wed, Apr 14, 2021 at 6:52 PM

Subject: Mplus Factor scores and protocol (Please do not open this mail if you are on a holiday)

To: phosco , Laura Gibbons , Seo-Eun Choi , mkeomec , Paul K. Crane

Cc: Rich Jones (richard_jones@brown.edu)

 Hi All, 

I promised to look into extracting the factor score issue when I am done with the EPAD re-calibration effort. So, here it is. 

Disclaimer: All my results are based on n=1

I used data from NACC  UDS 1/2 Language which are basically 3 items (Category Fluency animals and vegetables, and Boston Naming Test (1+3) total). Reason being they are pretty good items (covers a wide range of ability spectrum and loads pretty high) to start with.  Anyway, I see problems which Phoebe had noted before. 

After talking with several people and reading up a bunch of papers and manuals, I think I have an idea about what’s going on. At least that worked in my data set. 

Here’s what causing/exacerbating this problem in the scoring step: analysis(integration=montecarlo;). We use this option to make our parameter estimating step fast and take less computing resources and most of the Legacy models (or at least the Memory) model wouldn’t have run/converged without that option. 

Why? With integration=montecarlo, random integration points are assigned to each row, so when the data changes different random integration points are assigned to different individual which in turn leads to the problem even with specifying Rich’s approach (mean/variance to 0/1). It’s probably still OK to use at the parameter estimation step (co-calibration) but not at the scoring step. 

How to fix this? Run away from the above option in the scoring step. Instead of the above, useanalysis(integration=30;) option(adaptive=off;)

 The first probably would have worked well alone but I am also usingadaptive=off option and not take any chances. And I just chose a bigger # on integration points from the default. I need to know more about fixing # integration points. 

Here’s a page if you want to know more about those options: https://www.statmodel.com/HTML_UG/chapter16V8.htm  

What happened to my cherry-picked data set when I used the fix? The scores in the main sample and any random subset (I looked at a few subsets where I take 25% of the data each time and used a seed to make those reproducible) were exactly the same when we specify mean/var to 0/1 (basically what Rich proposed). It also made the protocol way (freely estimating mean/var based on data a lot closer to the 0/1 approach). 

The trend I see in multiple other data sets is that a) most of the discrepancies lie on either end of the tail and fixing mean/var to 0/1 pushes those scores a bit more towards the tail compared to freely estimating them in the sample and also b) that those extreme scores have a higher Standard error of Measurement (SEM) for the model where we force the mean/var to 0/1. 

@Phoebe: Framingham has one of the weirdest data structure and a sparse battery. It would be great to see what happens to the Language domain when you make these changes. 

@Seo-Eun and Michael: Feel free to experiment this fix with the NACC UDS battery when you have time. 

I know all of you have a lot on your plate. No rush on this. 

Sorry for the long email, 

-Joey

Postscript: perfect response patterns

I re-ran this toy example, but I set all of u_hdroom, u_trunk, and u_weight to 0 for the Volare. This is a so-called perfect response pattern. These can sometimes cause problems for factor score estimation. But for this data and these examples, all findings hold. Even for the results summarized in the appendix, below.

Postscript: that function

The random cut-point was determined with a function

100*(1/(1+exp(-1*(invnorm(uniform()))/((1)/2.065*c(pi)))))

produces a value from a distribution that is approximately normal (it is slightly platykurtic) and has a mean of about 50 and SD of about 15 and falls within the range 0,100 (I observe between 3 and 97 in 10M simulated values).

Appendix 1

Here’s all the Stata code used in this note:

capture log close
log using "Mplus-EAP", smcl replace
//_1
sysuse auto , clear
set seed 1002
gen carid=_n
rename headroom hdroom
foreach x in hdroom trunk weight  {
* dichotomize at a random location
local pis : di %2.0f 100*(1/(1+exp(-1*(invnorm(uniform()))/((1)/2.065*c(pi)))))
_pctile `x' , p(`pis')
gen u_`x'=`x'>`r(r1)'
}
su u*
tetrac u*

/*
* to support claims in the post-script
* I added this code
replace u_hdroom=0 in 46
replace u_trunk=0 in 46
replace u_weight=0 in 46
*/

tempfile f1
save `f1'
//_2q
qui runmplus u* , cat(all) est(mlr) ///
   model( ///
 carsize by u_hdroom*.7; ///
 carsize by u_trunk*.7 ; ///
 carsize by u_weight*.7 ; ///
 carsize@1;) ///
   idvariable(carid) ///
   output(svalues) ///
   savelogfile(carsize) ///
 saveinputfile(carsize) ///
 saveinputdatafile(carsize) ///
   savedata(save=fscores; file=popfscores.dat;) ///
   log(off)
//_3q
type carsize.inp
//_4q
qui {
cap program drop typeit
program define typeit
   di ""
   local foo = subinstr(stritrim("`1'"),char(0020),"",.)
 local foo = subinstr("`foo'"," ","",.)
 local foo = subinstr("`foo'",";","; ",.)
 local goo = wordcount("`foo'")
 forvalues i=1/`goo' {
     local hoo = subinstr(word("`foo'",`i'),"BY"," BY ",.)
     di _col(5) "`hoo'"
 }
end
}
//_5q
qui runmplus_read_svalues , out(carsize.out)
typeit "`r(svalues)'"
//_6q
qui {
* Three Mplus scoring models: mm1, mm2, mm3. In each
* the factor loadings and thresholds are fixed to their
* population estimates (from above model)
* ──────────────────────────────────────────────────────
* mm1 <- mean(carsize) is fixed, var(carsize) fixed to 1
runmplus_read_svalues , out(carsize.out)
local mm1 "`r(svalues)' [carsize@0];"
local mm1 = subinstr("`mm1'","*","@",.)
* ──────────────────────────────────────────────────────
* mm2 <- mean(carsize) is free, var(carsize) fixed to 1
local mm2 = subinstr("`mm1'","[carsize@0]","[carsize*0]",.)
* ──────────────────────────────────────────────────────
* mm3 <- mean(carsize) and var(carsize) are free
local mm3 = subinstr("`mm2'","size@1","size*1",.)
* ──────────────────────────────────────────────────────
}
//_7q
typeit "`mm1'"
//_8q
typeit "`mm2'"
//_9q
typeit "`mm3'"
//_10
* First I'll see if the jackknife models have been run
* already. If they have, I won't repeat them.
* If I wanted to repeat them, I would have to manually delete
* the `results.dta` file.
cap confirm file results.dta
if _rc~=0 {
  * store results in a matrix
  matrix results=J(219,3,.)
  matrix colnames results = replicate mm carsize
  * call up the example data
  use `f1' , clear
  local Ncars=`c(N)'
  local rowcounter=0
  * loop over the three machines
  forvalues mm=1/3 {
* loop over each of the cars in the data set
forvalues i=1/`Ncars' {
  * except for the Volare
  if `i'~=46 {
    quietly {
      * call up the analysis file
      use `f1' , clear
      * drop the i-th car
      drop if carid==`i'
      * generate factor score estimates
      runmplus u* , idvariable(carid) cat(all) est(mlr) ///
        model(`mm`mm'') ///
        savedata(save=fscores; file=fscores.dat;) ///
        savelogfile(fscores) log(off)
      clear
      * retreive factor score estimates
      runmplus_load_savedata , out(fscores.out)
      * keep the estimate only for the Volare
      keep if carid==46
      keep carsize carid
      * save the results and condition information to matrix
      local carsize_is=carsize
      mat results[`++rowcounter',1]=`i'
      mat results[`rowcounter',2]=`mm'
      mat results[`rowcounter',3]=`carsize_is'
      noisily di "." _c
    }
  }
}
  }
  clear
  matsave results , replace
}
use results.dta
la def mm ///
   1 "1: [f@0]; f@1;" ///
   2 "2: [f*0]; f@1;" ///
   3 "3: [f*0]; f*1;" , modify
label values mm mm
//_11
table mm , c(mean carsize sd carsize n carsize) format(%6.4f)
//_12
clear
runmplus_load_savedata , out(carsize.out)
keep if carid==46
list carsize
//_13
* the data we have for the Volare
list u*
* add a complement row of data
set obs 2
foreach var in u_hdroom u_trunk u_weight {
replace `var'=1-`var'[_n-1] in 2
}
replace carid=999 in 2
* now look at the data
list carid u*
keep u* carid
//_14
quietly runmplus u* , idvariable(carid) cat(all) est(mlr) ///
model(`mm1') ///
savedata(save=fscores; file=fscores.dat;) ///
savelogfile(fscores) log(off)
clear
runmplus_load_savedata , out(fscores.out)
list carid carsize u*
//_15q
qui {
   noisily type Mplus-EAP.do
}
//_^
log close

Appendix 2 - Replication in R, and with MIRT

Doug Tommet prepared this code to replicate the Mplus example using Mplus automation, and also the sample dependency of factor score estimates using R/MIRT

library(tidyverse)
library(MplusAutomation)
# setwd(fs::path_home("Dropbox", "Work", "Syntax"))

# Download the data
url <- "http://www.stata-press.com/data/r13/auto.dta"
auto <- haven::read_dta(url)

# Transform the variables
auto_df <- auto %>%
  mutate(u_hdroom = headroom > quantile(auto$headroom, c(.64)),
         u_trunk = trunk > quantile(auto$trunk, c(.48)),
         u_weight = weight > quantile(auto$weight, c(.65)),
         id = row_number()) %>%
  select(id, starts_with("u"))

# Summary tables
auto_df %>%
  select(starts_with("u")) %>%
  gtsummary::tbl_summary()

auto_df %>%
  select(starts_with("u")) %>%
  psych::tetrachoric()

# Mplus object for the population model
pop_model_mplus <- mplusObject(
  MODEL ="
    carsize by u_hdroom* 0.7;
    carsize by u_trunk* 0.7;
    carsize by u_weight* 0.7;
    [carsize@0];
    carsize @1;
",
  usevariables = colnames(auto_df),
  rdata = auto_df,
  VARIABLE = "idvariable = id; CATEGORICAL = ALL",
  ANALYSIS = "ESTIMATOR = MLR;",
  SAVEDATA = "FILE IS popfscores.dat; SAVE is fscores; FORMAT IS free;",
  OUTPUT = "svalues;"
)
# Run the population model
mplusModeler(pop_model_mplus,
             modelout = "pop_model.inp",
             writeData = "always",
             run = TRUE)

pop_model <- readModels("pop_model.out")

# Create a mplus model statement with the parameters fixed to the population estimates
mod_statement <- pop_model[["parameters"]][["unstandardized"]] %>%
  as_tibble() %>%
  filter(!paramHeader %in% c("Means", "Variances")) %>%
  separate(paramHeader, into = c("p1", "p2"), sep = "\\.") %>%
  mutate(p1 = stringr::str_to_lower(p1),
         param = stringr::str_to_lower(param)) %>%
  mutate(mplus_model = case_when(p2=="BY" ~ glue::glue("{p1} by {param} @ {est};"),
                                 p1=="thresholds" ~ glue::glue("[{param} @ {est}];"))) %>%
  pull(mplus_model)

mod_statement_collapsed <- stringr::str_c(mod_statement, collapse = " \n ")

# Create a function that will
# 1. Fit a mplus model with the parameters fixed, allowing mean and variances to be freely estimated
# 2. Read the factors scores
# 3. Return the factor score for the Volare
mplus_fx <- function(df, mod_extra) {
  # create the mplus object
  fixed_model_mplus <- mplusObject(
    MODEL = stringr::str_c(c(mod_statement_collapsed, mod_extra), collapse = " \n "),
    usevariables = colnames(df),
    rdata = df,
    VARIABLE = "idvariable = id; CATEGORICAL = ALL",
    ANALYSIS = "ESTIMATOR = MLR;",
    SAVEDATA = "FILE IS fixed_fscores.dat; SAVE is fscores; FORMAT IS free;"
  )
  # run mplus
  mplusModeler(fixed_model_mplus,
               modelout = "fixed_model.inp",
               writeData = "always",
               run = TRUE)
  fixed_model <- readModels("fixed_model.out")
  # get the factor scores
  fscores <- fixed_model[["savedata"]] %>%
    as_tibble()
  names(fscores) <- tolower(names(fscores))
  # return the factor score for id=46 (Valare)
  fscores %>%
    filter(id==46) %>%
    pull(carsize)

}

# Create a function to return a data frame with one row excluded
jackknife_fx <- function(i) {
  auto_df %>%
    slice(-i)
}

# Create a data frame containing the 73 jackknifed data frames
# Fit the three mplus models to each of the 73 data frames
jackknife_df <- auto_df %>%
  filter(id != 46) %>%
  mutate(df = purrr::map(id, jackknife_fx)) %>%
  mutate(mm1 = purrr::map_dbl(df, mplus_fx, mod_extra = "[carsize@0]; \n carsize@1;"),
         mm2 = purrr::map_dbl(df, mplus_fx, mod_extra = "[carsize*]; \n carsize@1;"),
         mm3 = purrr::map_dbl(df, mplus_fx, mod_extra = "[carsize*]; \n carsize*;"))

# Summarize the variability in the estimate for the Volare
jackknife_df %>%
  select(starts_with("mm")) %>%
  gtsummary::tbl_summary(type = c(mm1, mm2, mm3) ~ "continuous",
                         statistic = gtsummary::all_continuous() ~ "{mean} ({sd})")

# Fit the mplus model to a single row of data
volare_df <- auto_df %>%
  filter(id==46)
# Can't actually fit to a single row of data because Mplus expects categorical variables to have at least two levels,
# so, getting all the levels of the variables and then appending the single data row to it
volare_df_expanded <- auto_df %>%
  group_by(u_hdroom, u_trunk, u_weight) %>%
  count() %>%
  select(-n) %>%
  bind_rows(volare_df)

mplus_fx(volare_df_expanded, "[carsize@0]; \n carsize@1;")
mplus_fx(auto_df, "[carsize@0]; \n carsize@1;")

# Trying to replicate the analysis in MIRT
library(mirt)
# Prepare the data for mirt
auto_mirt <- auto_df %>%
  select(starts_with("u")) %>%
  mutate(u_hdroom = as.numeric(u_hdroom),
         u_trunk = as.numeric(u_trunk),
         u_weight = as.numeric(u_weight))
# Fit the population model
pop_mirt_model <- mirt.model('carsize = 1-3')
pop_mirt <- mirt(auto_mirt, pop_mirt_model)
coef(pop_mirt, simplify=TRUE)


mirt_model_1 <- mirt.model('carsize = 1-3
                            START = (1, a1, 2.824)
                            START = (2, a1, 5.018)
                            START = (3, a1, 1.430)
                            START = (1, d, -2.763)
                            START = (2, d, -0.156)
                            START = (3, d, -0.923)
                            FIXED = (1, 2, 3, a1)
                            FIXED = (1, 2, 3, d)')
mirt_model_2 <- mirt.model('carsize = 1-3
                            START = (1, a1, 2.824)
                            START = (2, a1, 5.018)
                            START = (3, a1, 1.430)
                            START = (1, d, -2.763)
                            START = (2, d, -0.156)
                            START = (3, d, -0.923)
                            FIXED = (1, 2, 3, a1)
                            FIXED = (1, 2, 3, d)
                            MEAN = carsize')
mirt_model_3 <- mirt.model('carsize = 1-3
                            START = (1, a1, 2.824)
                            START = (2, a1, 5.018)
                            START = (3, a1, 1.430)
                            START = (1, d, -2.763)
                            START = (2, d, -0.156)
                            START = (3, d, -0.923)
                            FIXED = (1, 2, 3, a1)
                            FIXED = (1, 2, 3, d)
                            MEAN = carsize
                            COV = carsize*carsize')

# Create a function that fits a mirt model and extracts the factor score for the Volare
mirt_fx <- function(df, mod) {
  df_mirt <- df %>%
    select(starts_with("u")) %>%
    mutate(u_hdroom = as.numeric(u_hdroom),
           u_trunk = as.numeric(u_trunk),
           u_weight = as.numeric(u_weight))
  x <- mirt(df_mirt, mod)
  f <- fscores(x) %>%
    as_tibble() %>%
    bind_cols(df)
  f %>%
    filter(id==46) %>%
    pull(carsize)
}
# Fit the three mirt models to the jackknife data
jackknife_df <- jackknife_df %>%
  mutate(mirt_m1 = purrr::map_dbl(df, mirt_fx, mod = mirt_model_1),
         mirt_m2 = purrr::map_dbl(df, mirt_fx, mod = mirt_model_2),
         mirt_m3 = purrr::map_dbl(df, mirt_fx, mod = mirt_model_3))
# Summarize the variability in factor scores for the Volare
jackknife_df %>%
  select(starts_with("m")) %>%
  gtsummary::tbl_summary(type = c(mm1, mm2, mm3, mirt_m1, mirt_m2, mirt_m3) ~ "continuous",
                         statistic = gtsummary::all_continuous() ~ "{mean} ({sd})")
Results

fin