Rich Jones
2025-07-12
Overview
This note walks through the definition and calculation of McDonald's Omega ($\omega$) and factor score determinacy ($\rho$), and illustrates a special case in which $\omega = \rho^2$: a single-factor model in which all items are continuous and all factor loadings are equal (so-called tau-equivalence; with equal residual variances as well, parallel tests).
Omega ($\omega$) reflects the proportion of variance in a unit-weighted observed composite score that is attributable to the common factor (McDonald, 1999, page 89).
Factor determinacy ($\rho$) reflects the correlation between a regression-based factor score estimate and the underlying common factor (Muthén, Mplus Technical Appendices, page 47). Therefore, $\rho^2$ would reflect the proportion of variance in a regression-based factor score estimate that is attributable to the underlying common factor.
Some authors and programs refer to determinacy as $\rho$, others as $\rho^2$, so read carefully.
Latent variables, often referred to as factors, represent theoretical constructs that cannot be directly measured but are inferred from a set of observed indicators. In some SEM applications, researchers aim to obtain individual scores on these latent variables, known as factor scores. These scores are essentially estimates of an individual's standing on the unobserved construct. However, estimated factor scores do not inherently possess the exact properties of the true, underlying latent factors. Rather, they serve as approximations, and the accuracy of these approximations directly influences the validity and reliability of any subsequent analyses in which these scores are utilized, whether as predictors, dependent variables, or for classification purposes.
Methodologists have developed various ways to express the quality of these factor score estimates, and this note concerns two of these: factor reliability via coefficient omega, and factor determinacy.
"Omega is the ratio of the true-score variance of $Y$ to the total variance of $Y$. Here the true-score variance is interpreted as the variance due to the (common) attribute" (McDonald 1999, page 89), and $Y$ is the unit-weighted total score (i.e., the sum of item scores), The “common attribute” is the latent factor in a common factor model. McDonald’s omega quantifies:
$$ \omega = \frac{\operatorname{Var}(\text{true score of } Y)}{\operatorname{Var}(Y)} $$

And in the context of a single-factor model, the true-score variance of $Y$ is the variance due to the common factor, not just any shared variance among items.
Given a single-factor model for $p$ standardized items with loadings $\lambda_1, \dots, \lambda_p$ and residual variances $\theta_i = 1 - \lambda_i^2$: the numerator is the squared sum of loadings, $(\lambda_1 + \lambda_2 + \cdots + \lambda_p)^2$, and the denominator is that same squared sum plus the sum of residual variances.
from sympy import symbols, Matrix, sqrt, eye, simplify
# Step 1: Define parameters
p = 5 # number of indicators
lambda_val = 1 / sqrt(2) # standardized loading (0.707...)
theta_val = 1 - lambda_val**2 # residual variance for each indicator
# Step 2: Create the lambda vector
lambda_vec = Matrix([lambda_val] * p)
# Step 3: Construct the observed covariance matrix Σ = λλ' + Θ
lambda_outer = lambda_vec * lambda_vec.T # λλ'
theta_matrix = eye(p) * theta_val # Diagonal matrix with residual variances
Sigma = lambda_outer + theta_matrix # Total observed covariance matrix
#Sigma
lambda_vec
# Factor loadings in this example are 1/sqrt(2) = 0.707...
loading = sqrt(2) / 2
loading.evalf()
lambda_outer
theta_matrix
Sigma
# Step 4: Compute omega
sum_lambda = sum(lambda_vec)
numerator_omega = sum_lambda**2
denominator_omega = numerator_omega + p * theta_val
omega = simplify(numerator_omega / denominator_omega)
# omega
sum_lambda
numerator_omega
denominator_omega
omega
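As a numeric check (with $p = 5$ and $\lambda^2 = 1/2$, the expression reduces to $2.5/3$):

omega.evalf()  # 0.8333...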
In a single-factor model, for the unit-weighted total score $Y = \mathbf{1}^\top \mathbf{y}$, omega can be written as:

$$ \omega = \frac{\mathbf{1}^\top \boldsymbol{\Lambda} \boldsymbol{\Lambda}^\top \mathbf{1}}{\mathbf{1}^\top \boldsymbol{\Sigma} \mathbf{1}} $$

where $\mathbf{1}$ is a $p \times 1$ vector of ones, $\boldsymbol{\Lambda}$ is the $p \times 1$ vector of factor loadings, and $\boldsymbol{\Sigma} = \boldsymbol{\Lambda} \boldsymbol{\Lambda}^\top + \boldsymbol{\Theta}$ is the model-implied covariance matrix of the observed items, with $\boldsymbol{\Theta}$ the diagonal matrix of residual variances.
So, the simplified omega becomes:
$$ \omega = \frac{(\mathbf{1}^\top \lambda)^2}{\mathbf{1}^\top (\lambda \lambda^\top + \Theta) \mathbf{1}} $$

The numerator can be further simplified:

$$ \mathbf{1}^\top \lambda = \sum_{i=1}^p \lambda_i \quad \Rightarrow \quad (\mathbf{1}^\top \lambda)^2 = \left( \sum_{i=1}^p \lambda_i \right)^2 $$

As can the denominator:

$$ \mathbf{1}^\top (\lambda \lambda^\top + \Theta) \mathbf{1} = \mathbf{1}^\top \lambda \lambda^\top \mathbf{1} + \mathbf{1}^\top \Theta \mathbf{1} = \left(\sum \lambda_i\right)^2 + \sum \theta_i $$

where $\theta_i = 1 - \lambda_i^2$ if variables are standardized. This leaves:

$$ \omega = \frac{\left( \sum_{i=1}^p \lambda_i \right)^2}{\left( \sum_{i=1}^p \lambda_i \right)^2 + \sum_{i=1}^p (1 - \lambda_i^2)} $$

# Computation using matrix form
# Step 2: Construct Lambda * Lambda'
# lambda_outer = lambda_vec * lambda_vec.T <- already created
# Step 3: Residual variance matrix
# theta_matrix = eye(p) * theta_val <- already created
# Step 4: Sigma = Lambda*Lambda' + Theta
Sigma = lambda_outer + theta_matrix
Sigma
# Step 5: 1' * Sigma * 1 and 1' * Lambda * Lambda' * 1
ones = Matrix([1] * p)
ones
numerator = (ones.T * lambda_outer * ones)[0]
numerator
denominator = (ones.T * Sigma * ones)[0]
denominator
# Step 6: Omega
omega = simplify(numerator / denominator)
omega
omega.evalf()
Factor determinacy ($\rho$) reflects the correlation between a regression-based factor score estimate and the underlying common factor (Muthén, Mplus Technical Appendices, page 47). Beauducel & Hilger (2017) provide the following formula (rewritten here using Mplus notation):
$$ \rho^2 = \operatorname{diag} \left( \boldsymbol{\Psi} \boldsymbol{\Lambda}^\top \boldsymbol{\Sigma}^{-1} \boldsymbol{\Lambda} \boldsymbol{\Psi} \right) $$

where:
Meaning | Mplus Notation |
---|---|
Factor loading matrix | $\boldsymbol{\Lambda}$ |
Factor covariance matrix | $\boldsymbol{\Psi}$ |
Observed variable covariance matrix | $\boldsymbol{\Sigma}$ |
Factor determinacy coefficient squared | $\rho^2$ |
This gives the squared correlation between the latent variable $\eta$ and its regression-based factor score estimate $\hat{\eta}$. Note that Mplus output provides $\rho$ as the factor determinacy estimate.
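To make the general formula concrete, here is a small NumPy sketch for a hypothetical two-factor model (the loadings and factor correlation below are made up for illustration):

import numpy as np

# Hypothetical two-factor example (made-up values for illustration)
Lambda = np.array([[0.7, 0.0],
                   [0.6, 0.0],
                   [0.5, 0.0],
                   [0.0, 0.8],
                   [0.0, 0.6]])          # 5 items, 2 factors
Psi = np.array([[1.0, 0.3],
                [0.3, 1.0]])             # factor covariance (unit variances)
Theta = np.diag(1 - np.diag(Lambda @ Psi @ Lambda.T))  # standardized residuals
Sigma = Lambda @ Psi @ Lambda.T + Theta  # model-implied covariance matrix

# rho^2 = diag(Psi Lambda' Sigma^{-1} Lambda Psi), one value per factor
rho2 = np.diag(Psi @ Lambda.T @ np.linalg.inv(Sigma) @ Lambda @ Psi)
print(np.sqrt(rho2))  # factor determinacies (rho), one per factor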
In the case of a single-factor model with unit variance assumed for the common factor, we have:

$$ \rho^2 = \operatorname{diag} \left( \boldsymbol{\Lambda}^\top \boldsymbol{\Sigma}^{-1} \boldsymbol{\Lambda} \right) = \lambda^\top \Sigma^{-1} \lambda $$

where $\lambda^\top \Sigma^{-1} \lambda$ is a scalar quantity that can be interpreted as a generalized squared length of the factor loading vector $\lambda$ in the space defined by the inverse of the observed variables' covariance matrix, $\Sigma^{-1}$. It is scalar because $\lambda^\top$ is $1 \times p$, $\Sigma^{-1}$ is $p \times p$, and $\lambda$ is $p \times 1$, so the product is $1 \times 1$.

Therefore, $\lambda^\top \Sigma^{-1} \lambda$ is a single number (a scalar) that tells you how well the factor can be recovered from the observed indicators. It equals the squared correlation between the latent factor and its optimal (regression-based) score estimate.
# Step 5: Compute factor determinacy: ρ² = λ' Σ⁻¹ λ
Sigma_inv = simplify(Sigma.inv())
rho_squared = simplify((lambda_vec.T * Sigma_inv * lambda_vec)[0])
rho_squared
rho_squared.evalf()
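This equals the omega computed above (0.8333...), the special case flagged in the Overview. The equality does not depend on the particular loading value; here is a minimal symbolic check, assuming the same five-item structure but a free common loading $\lambda$:

from sympy import symbols, Matrix, eye, simplify
# Free common loading: omega and rho^2 reduce to the same expression
lam = symbols('lambda', positive=True)
lam_vec_sym = Matrix([lam] * 5)
Sigma_sym = lam_vec_sym * lam_vec_sym.T + eye(5) * (1 - lam**2)
ones_sym = Matrix([1] * 5)
omega_sym = (ones_sym.T * lam_vec_sym * lam_vec_sym.T * ones_sym)[0] / (ones_sym.T * Sigma_sym * ones_sym)[0]
rho2_sym = (lam_vec_sym.T * Sigma_sym.inv() * lam_vec_sym)[0]
simplify(omega_sym - rho2_sym)  # 0: omega = rho^2 for any common loading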
On the Mplus Discussion Board, Bengt Muthén states:
“If you have categorical outcomes you don’t get factor determinacy because that is a concept valid only for continuous outcomes. With categorical outcomes you would instead consider ‘item information’ which Mplus provides.” (https://www.statmodel.com/discussion/messages/9/533.html?1576705397)
Factor determinacy -- the correlation between true latent factor scores and estimated regression-based factor scores -- only applies when items are continuous, as the regression-based factor scores depend on linear relationships between indicators and factor. When indicators are binary or ordinal, Mplus does not provide this coefficient. Instead, Muthén suggests looking at item and test information functions, closely aligned with IRT (Item Response Theory) concepts, which reflect how well each item informs the underlying latent trait.
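For intuition about that alternative, here is a minimal sketch of the standard two-parameter logistic (2PL) item information function; the discrimination (a) and difficulty (b) values are made up for illustration and are not Mplus output:

import numpy as np

# 2PL item information: I(theta) = a^2 * P(theta) * (1 - P(theta))
def item_information(theta, a, b):
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))  # endorsement probability
    return a**2 * p * (1 - p)

theta = np.linspace(-3, 3, 7)
print(item_information(theta, a=1.5, b=0.0))  # information peaks at theta = b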
See Appendix 2 for discussion of an idea for using Mplus Bayesian plausible values to obtain a factor determinacy-like statistic, applicable when some of the indicators are categorical.
Coefficient alpha (Cronbach's alpha) is conceptually similar to coefficient omega. However, alpha assumes that all loadings are equal (tau-equivalence). In our example we have set all loadings to be equal, but this is not usually the case with real data. Omega can be calculated with varying factor loadings; alpha does not consider the loadings at all, and is instead defined from the variances and covariances among the items (cf. Mplus Discussion via Statmodel.com):

$$ \alpha = \frac{p}{p-1} \left( 1 - \frac{\operatorname{tr}(\boldsymbol{\Sigma})}{\mathbf{1}^\top \boldsymbol{\Sigma} \mathbf{1}} \right) $$

where $p$ is the number of items, $\operatorname{tr}(\boldsymbol{\Sigma})$ is the sum of the item variances, and $\mathbf{1}^\top \boldsymbol{\Sigma} \mathbf{1}$ is the variance of the unit-weighted total score.
# Cronbach's alpha from the model-implied covariance matrix
from sympy import Rational
p = Sigma.shape[0] # number of items (dimension of Sigma)
ones = Matrix([1] * p) # column vector of ones
numerator = Sigma.trace() # sum of item variances
denominator = (ones.T * Sigma * ones)[0] # variance of the total score
alpha = simplify(Rational(p, p - 1) * (1 - numerator / denominator))
alpha.evalf()
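Because the loadings (and residual variances) in this example are equal, the items are parallel, and alpha coincides with omega; a quick symbolic check:

simplify(alpha - omega)  # 0: alpha equals omega under equal loadings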
Reliability Measure | Meaning |
---|---|
alpha | Common variance of the sum score and the latent trait, if the items were tau-equivalent |
omega | Common variance of a sum score and modeled latent trait |
determinacy ($\rho^2$) | Common variance of a regression-based factor score estimate and a modeled latent trait |
See Appendix 1 for definitions of congeneric, tau-equivalent, and parallel model types.
Beauducel, A., & Hilger, N. (2017). On the bias of factor score determinacy coefficients based on different estimation methods of the exploratory factor model. Communications in Statistics-Simulation and Computation, 46(8), 6144-6154.
Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4), 430-450.
Lord, F. M., & Novick, M. R. (1968). Statistical Theories of Mental Test Scores. Reading, MA: Addison-Wesley.
McDonald, R. P. (1999). Test Theory: A Unified Treatment. Mahwah, NJ: Lawrence Erlbaum.
Muthén, B. (1998-2004). Mplus Technical Appendices (Version 3, March 2004 ed.). Muthén & Muthén. https://statmodel.com/download/techappen.pdf
Raykov, T., & Marcoulides, G. A. (2011). Introduction to Psychometric Theory. New York: Routledge.
Zinbarg, R. E., Revelle, W., Yovel, I., & Li, W. (2005). Cronbach’s α, Revelle’s β, and McDonald’s ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1), 123-133.
In measurement theory there is a hierarchy of models in terms of the restrictions they impose on the items:
Congeneric model: Each indicator is allowed its own factor loading and unique error variance. In other words, the items may differ in how strongly they relate to the latent construct and in their measurement error.
Tau-equivalent model: Here the indicators are assumed to have equal factor loadings, meaning they all scale the latent variable in the same way. However, the error variances (unique variances) can differ across items.
Parallel model: Both the factor loadings and the unique variances are equal across all indicators. This is the most restrictive form, implying that the items are essentially interchangeable, differing only by random error that is identical in variance across items.
The term "tau-equivalent" comes from classical test theory (CTT), particularly from the foundational work by Lord and Novick (1968) in Statistical Theories of Mental Test Scores. The Greek letter τ (tau) was used to denote the true score of an individual on a psychological or educational test.
In CTT, the observed score $Y_i$ on item $i$ is modeled as:

$$ Y_i = \tau_i + \varepsilon_i $$

where $\tau_i$ is the true score contributing to item $i$ and $\varepsilon_i$ is random measurement error. The tau-equivalent model assumes that every item measures the same true score, $\tau_i = \tau$ for all $i$, while the error variances $\operatorname{Var}(\varepsilon_i)$ may differ across items.
In other words, the items are equally sensitive to the latent trait (same slope or loading) but may differ in measurement error. That’s what distinguishes tau-equivalence from other models in the hierarchy:
Model Type | Factor Loadings | Error Variances |
---|---|---|
Congeneric | Free | Free |
Tau-equivalent | Equal | Free |
Parallel | Equal | Equal |
The use of τ (tau) was conventional in earlier psychometric theory to represent true scores, much like we use $\theta$ for latent traits in modern IRT or factor models. So "tau-equivalent" refers to items that are equally related to the same true score (or latent factor), hence having equal factor loadings.
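To see why the distinction matters, here is a small numeric sketch with made-up congeneric (unequal) loadings for standardized items; alpha falls below omega once tau-equivalence is violated:

import numpy as np

# Hypothetical congeneric loadings (unequal) for five standardized items
lam = np.array([0.9, 0.7, 0.5, 0.4, 0.3])
Sigma = np.outer(lam, lam) + np.diag(1 - lam**2)  # model-implied covariance
p = len(lam)

omega = lam.sum()**2 / Sigma.sum()                        # model-based reliability
alpha = p / (p - 1) * (1 - np.trace(Sigma) / Sigma.sum())
print(f"omega = {omega:.3f}, alpha = {alpha:.3f}")        # alpha < omega here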
This appendix discusses using Bayesian plausible values (PVs) from Mplus to approximate a measure similar to factor score determinacy $\rho^2$. Factor determinacy $\rho$ is:
$$ \rho = \operatorname{corr}(\hat{\eta}, \eta) $$

and $\rho^2$ is:

$$ \rho^2 = \operatorname{Var}(\mathbb{E}[\eta \mid \mathbf{x}]) / \operatorname{Var}(\eta) $$

This quantifies how much information the observed indicators provide about the latent variable: it is the $R^2$ for predicting the factor from the indicators.
In Mplus's Bayesian framework, plausible values (PVs) are posterior draws from:
$$ p(\eta_j \mid \mathbf{x}_j) $$

So for person $j$, Mplus gives you multiple samples $\eta_j^{(1)}, \eta_j^{(2)}, \dots, \eta_j^{(M)}$, representing uncertainty about their factor given their observed data.
We can estimate:
$$ \rho^2 = \frac{\operatorname{Var}(\mathbb{E}[\eta_j \mid \mathbf{x}_j])}{\operatorname{Var}(\eta_j)} $$

via these steps:
Let $N$ be the number of persons and $M$ the number of plausible value draws per person.
Obtain an $N \times M$ matrix of plausible values: each row is a person, each column is one PV draw.
For each person $j$, compute:
$$ \bar{\eta}_j = \frac{1}{M} \sum_{m=1}^{M} \eta_j^{(m)} $$

Then compute:

$$ \operatorname{Var}(\bar{\eta}_j) \quad \text{(across persons)} $$

This is the numerator: the variance of posterior means.
The total variance of $\eta$ is approximated by:
$$ \operatorname{Var}(\eta_j^{(m)}) = \operatorname{Var}(\bar{\eta}_j) + \mathbb{E}_j[\operatorname{Var}(\eta_j^{(m)} \mid \mathbf{x}_j)] $$

Compute this as:
total_var = np.var(pvs, ddof=1) # flattened matrix
Or manually as:
$$ \text{Total variance} = \text{Between-person variance} + \text{Average within-person variance} $$

This aligns with the interpretation of $R^2$: the fraction of variance in the latent factor "explained" by the observed indicators.
Example code in Python (if the PVs are in an $N \times M$ matrix pvs):
import numpy as np
# pvs: N x M matrix of plausible values
posterior_means = np.mean(pvs, axis=1)
between_var = np.var(posterior_means, ddof=1)
# within-person variance
within_vars = np.var(pvs, axis=1, ddof=1)
avg_within_var = np.mean(within_vars)
# total variance of factor
total_var = between_var + avg_within_var
# approximate factor determinacy squared
rho2 = between_var / total_var
print(f"Estimated rho^2 ≈ {rho2:.3f}")