Reliability and factor models: Omega and Factor Determinacy (and alpha)

Rich Jones
2025-07-12

Overview

This note walks through the definition and calculation of McDonald’s omega ($\omega$) and factor score determinacy ($\rho$), and illustrates a special case in which $\omega = \rho^2$. The special case is a single-factor model in which all items are continuous and all factor loadings are equal (so-called tau equivalence or, when the residual variances are also equal, parallel tests).

Omega ($\omega$) reflects the proportion of variance in a unit-weighted observed composite score that is attributable to the common factor (McDonald, 1999, page 89).

Factor determinacy ($\rho$) reflects the correlation between a regression-based factor score estimate and the underlying common factor (Muthén, Mplus Technical Appendices, page 47). Therefore, $\rho^2$ would reflect the proportion of variance in a regression-based factor score estimate that is attributable to the underlying common factor.

Some authors/programs will refer to determinacy as $\rho$, others $\rho^2$: so read carefully.

Latent variables and their quality

Latent variables, often referred to as factors, represent theoretical constructs that cannot be directly measured but are inferred from a set of observed indicators. In some SEM applications, researchers aim to obtain individual scores on these latent variables, known as factor scores. These scores are essentially estimates of an individual's standing on the unobserved construct. However, estimated factor scores do not inherently possess the exact properties of the true, underlying latent factors. Rather, they serve as approximations, and the accuracy of these approximations directly influences the validity and reliability of any subsequent analyses in which these scores are utilized, whether as predictors, dependent variables, or for classification purposes.

Methodologists have developed various ways to express the quality of these factor score estimates, and this note concerns two of these: factor reliability via coefficient omega, and factor determinacy.

Omega

"Omega is the ratio of the true-score variance of $Y$ to the total variance of $Y$. Here the true-score variance is interpreted as the variance due to the (common) attribute" (McDonald 1999, page 89), and $Y$ is the unit-weighted total score (i.e., the sum of item scores). The “common attribute” is the latent factor in a common factor model. McDonald’s omega quantifies:

$$ \omega = \frac{\operatorname{Var}(\text{true score of } Y)}{\operatorname{Var}(Y)} $$

In the context of a single-factor model, the true-score variance of $Y$ is the variance due to the common factor, not just any shared variance among items.

For standardized items with loadings $\lambda_i$ on a factor with unit variance, this ratio is:

$$ \omega = \frac{\left( \sum_{i=1}^p \lambda_i \right)^2}{\left( \sum_{i=1}^p \lambda_i \right)^2 + \sum_{i=1}^p (1 - \lambda_i^2)} $$

The numerator is the squared sum of loadings: $(\lambda_1 + \lambda_2 + \cdots + \lambda_p)^2$. The denominator is that same squared sum plus the sum of residual variances (i.e., $1 - \lambda_i^2$ for each item).
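As a quick numerical illustration of this formula, the sketch below computes omega from a vector of standardized loadings. The function name `omega_from_loadings` and the example loadings are illustrative assumptions, not values from any dataset.

```python
import numpy as np

def omega_from_loadings(loadings):
    """Omega for a single-factor model with standardized items.

    Assumes unit factor variance, so each residual variance is 1 - lambda_i^2.
    """
    lam = np.asarray(loadings, dtype=float)
    common = lam.sum() ** 2             # squared sum of loadings
    residual = np.sum(1.0 - lam ** 2)   # sum of residual variances
    return common / (common + residual)

# Hypothetical four-item scale with equal loadings of 0.7
omega_example = omega_from_loadings([0.7, 0.7, 0.7, 0.7])
print(f"omega = {omega_example:.3f}")
```

With these hypothetical loadings the squared sum of loadings is 7.84 and the summed residual variance is 2.04, so omega is 7.84 / 9.88.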

Matrix Expression for Omega

In a single-factor model, where $\boldsymbol{\Lambda}$ is the $p \times 1$ matrix of factor loadings and the factor variance is fixed at 1, omega for the unit-weighted total score $Y = \mathbf{1}^\top \mathbf{y}$ can be written as:

$$ \omega = \frac{\mathbf{1}^\top \boldsymbol{\Lambda} \boldsymbol{\Lambda}^\top \mathbf{1}}{\mathbf{1}^\top \boldsymbol{\Sigma} \mathbf{1}} $$

where $\mathbf{1}$ is a $p \times 1$ vector of ones and $\boldsymbol{\Sigma}$ is the model-implied covariance matrix of the items $\mathbf{y}$.

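Because there is a single factor, $\boldsymbol{\Lambda}$ is just a loading vector $\lambda$, and the model-implied covariance matrix is $\Sigma = \lambda \lambda^\top + \Theta$, with $\Theta$ the diagonal matrix of residual variances.

So, the simplified omega becomes: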

$$ \omega = \frac{(\mathbf{1}^\top \lambda)^2}{\mathbf{1}^\top (\lambda \lambda^\top + \Theta) \mathbf{1}} $$

The numerator can be further simplified:

$$ \mathbf{1}^\top \lambda = \sum_{i=1}^p \lambda_i \quad \Rightarrow \quad (\mathbf{1}^\top \lambda)^2 = \left( \sum_{i=1}^p \lambda_i \right)^2 $$

As can the denominator:

$$ \mathbf{1}^\top (\lambda \lambda^\top + \Theta) \mathbf{1} = \mathbf{1}^\top \lambda \lambda^\top \mathbf{1} + \mathbf{1}^\top \Theta \mathbf{1} = (\sum \lambda_i)^2 + \sum \theta_i $$

where $\theta_i = 1 - \lambda_i^2$ if variables are standardized. This leaves:

$$ \omega = \frac{\left( \sum_{i=1}^p \lambda_i \right)^2}{\left( \sum_{i=1}^p \lambda_i \right)^2 + \sum_{i=1}^p (1 - \lambda_i^2)} $$
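The equivalence of the matrix and summation forms can be checked numerically. This is a minimal sketch assuming standardized items and a unit-variance factor; the loadings are hypothetical values chosen only for illustration.

```python
import numpy as np

lam = np.array([0.8, 0.7, 0.6, 0.5])    # hypothetical standardized loadings
theta = np.diag(1.0 - lam ** 2)         # diagonal residual covariance matrix
sigma = np.outer(lam, lam) + theta      # model-implied covariance matrix
ones = np.ones(lam.size)

# Matrix form: 1' (lambda lambda') 1 / 1' Sigma 1
omega_matrix = (ones @ np.outer(lam, lam) @ ones) / (ones @ sigma @ ones)

# Summation form: (sum lambda)^2 / ((sum lambda)^2 + sum(1 - lambda^2))
omega_scalar = lam.sum() ** 2 / (lam.sum() ** 2 + np.sum(1.0 - lam ** 2))

print(np.isclose(omega_matrix, omega_scalar))
```

Because the items are standardized, the diagonal of `sigma` is all ones, so `sigma` is also the model-implied correlation matrix.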

Factor determinacy

Factor determinacy ($\rho$) reflects the correlation between a regression-based factor score estimate and the underlying common factor (Muthén, Mplus Technical Appendices, page 47). Beauducel & Hilger (2017) provide (rewritten to use Mplus notation):

$$ \rho^2 = \operatorname{diag} \left( \boldsymbol{\Psi} \boldsymbol{\Lambda}^\top \boldsymbol{\Sigma}^{-1} \boldsymbol{\Lambda} \boldsymbol{\Psi} \right) $$

where:

| Meaning | Mplus Notation |
|---|---|
| Factor loading matrix | $\boldsymbol{\Lambda}$ |
| Factor covariance matrix | $\boldsymbol{\Psi}$ |
| Observed variable covariance matrix | $\boldsymbol{\Sigma}$ |
| Factor determinacy coefficient squared | $\rho^2$ |

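This gives the squared correlation between the latent variable $\eta$ and its regression-based factor score estimate $\hat{\eta}$ when the factors have unit variance (otherwise each diagonal element must be divided by the corresponding factor variance). Note that Mplus output reports $\rho$, not $\rho^2$, as the factor determinacy estimate.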

In the case of a single factor model, and unit variance assumed for the common factor, we have:

$$ \rho^2 = \operatorname{diag} \left( \boldsymbol{\Lambda}^\top \boldsymbol{\Sigma}^{-1} \boldsymbol{\Lambda} \right) = \lambda^\top \Sigma^{-1} \lambda $$

where $\lambda^\top \Sigma^{-1} \lambda$ is a scalar quantity that can be interpreted as a generalized squared length of the factor loading vector $\lambda$ in the metric defined by $\Sigma^{-1}$, the inverse of the observed variables' covariance matrix. It is scalar because $\lambda^\top$ is $1 \times p$, $\Sigma^{-1}$ is $p \times p$, and $\lambda$ is $p \times 1$, so the product is $1 \times 1$.

Therefore, $\lambda^\top \Sigma^{-1} \lambda$ is a single number (a scalar) that tells you how well the factor can be recovered from the observed indicators. It equals the squared correlation between the latent factor and its optimal (regression-based) score estimate.
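The special case $\omega = \rho^2$ can be illustrated numerically. The sketch below assumes a single factor with unit variance and standardized items; the helper names and loading values are hypothetical. With equal loadings the two quantities coincide, while with unequal loadings the regression-based score recovers the factor better than the unit-weighted sum.

```python
import numpy as np

def rho2_single_factor(lam, theta):
    """Squared determinacy lambda' Sigma^{-1} lambda (unit factor variance)."""
    lam = np.asarray(lam, dtype=float)
    sigma = np.outer(lam, lam) + np.diag(theta)
    return float(lam @ np.linalg.solve(sigma, lam))

def omega_single_factor(lam, theta):
    """Omega for the unit-weighted sum score."""
    lam = np.asarray(lam, dtype=float)
    return float(lam.sum() ** 2 / (lam.sum() ** 2 + np.sum(theta)))

# Equal loadings (standardized items, so residual variances are also equal)
lam_equal = np.full(4, 0.7)
theta_equal = 1.0 - lam_equal ** 2
rho2_equal = rho2_single_factor(lam_equal, theta_equal)
omega_equal = omega_single_factor(lam_equal, theta_equal)

# Unequal loadings
lam_mixed = np.array([0.9, 0.7, 0.5, 0.3])
theta_mixed = 1.0 - lam_mixed ** 2
rho2_mixed = rho2_single_factor(lam_mixed, theta_mixed)
omega_mixed = omega_single_factor(lam_mixed, theta_mixed)

print(f"equal loadings:   rho^2 = {rho2_equal:.3f}, omega = {omega_equal:.3f}")
print(f"unequal loadings: rho^2 = {rho2_mixed:.3f}, omega = {omega_mixed:.3f}")
```

Using `np.linalg.solve` rather than an explicit inverse is a numerical convenience only; the quantity computed is the same $\lambda^\top \Sigma^{-1} \lambda$.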

Closing comments

Mplus does not give factor determinacy when there are categorical indicators

In the Mplus Discussion Board, Bengt Muthén states:

“If you have categorical outcomes you don’t get factor determinacy because that is a concept valid only for continuous outcomes. With categorical outcomes you would instead consider ‘item information’ which Mplus provides.” (https://www.statmodel.com/discussion/messages/9/533.html?1576705397)

Factor determinacy -- the correlation between true latent factor scores and estimated regression-based factor scores -- only applies when items are continuous, as the regression-based factor scores depend on linear relationships between indicators and factor. When indicators are binary or ordinal, Mplus does not provide this coefficient. Instead, Muthén suggests looking at item and test information functions, closely aligned with IRT (Item Response Theory) concepts, which reflect how well each item informs the underlying latent trait.

See Appendix 2 for discussion of an idea for using Mplus Bayesian plausible values to obtain a factor determinacy-like statistic, which would also apply when some or all of the indicators are categorical.

Cronbach's alpha

Coefficient alpha (Cronbach's alpha) is conceptually similar to coefficient omega. However, alpha assumes that all loadings are equal. In our example, we have set all loadings to be equal, but this is not usually the case with real data. Omega can be calculated with varying factor loadings; alpha does not consider the loadings at all, and is instead defined from the variances and covariances among the items (cf. Mplus Discussion via Statmodel.com).

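$$ \alpha = \frac{p}{p - 1} \left(1 - \frac{\operatorname{tr}(\Sigma)}{\mathbf{1}^\top \Sigma \mathbf{1}} \right) $$

where $p$ is the number of items, $\operatorname{tr}(\Sigma)$ is the sum of the item variances, and $\mathbf{1}^\top \Sigma \mathbf{1}$ is the variance of the unit-weighted sum score.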
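The alpha formula needs only a covariance matrix, which the following sketch makes concrete. The function name `cronbach_alpha` is an illustrative assumption, and the covariance matrix is built from hypothetical equal loadings rather than estimated from data.

```python
import numpy as np

def cronbach_alpha(sigma):
    """Coefficient alpha from an item covariance matrix."""
    sigma = np.asarray(sigma, dtype=float)
    p = sigma.shape[0]              # number of items
    item_var = np.trace(sigma)      # sum of item variances
    total_var = sigma.sum()         # variance of the unit-weighted sum score
    return p / (p - 1) * (1.0 - item_var / total_var)

# Hypothetical model-implied covariance matrix with equal loadings of 0.7
lam = np.full(4, 0.7)
sigma = np.outer(lam, lam) + np.diag(1.0 - lam ** 2)

alpha_example = cronbach_alpha(sigma)
print(f"alpha = {alpha_example:.3f}")
```

With equal loadings, the value matches omega computed from the same loadings, 7.84 / 9.88.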

Summary

| Reliability Measure | Meaning |
|---|---|
| alpha | Common variance of sum score and a latent trait, if the latent trait were tau-equivalent |
| omega | Common variance of a sum score and modeled latent trait |
| determinacy ($\rho^2$) | Common variance of a regression-based factor score estimate and a modeled latent trait |

See Appendix 1 for definitions of congeneric, tau-equivalent, and parallel model types.

References

Beauducel, A., & Hilger, N. (2017). On the bias of factor score determinacy coefficients based on different estimation methods of the exploratory factor model. Communications in Statistics-Simulation and Computation, 46(8), 6144-6154.

Lord, F. M., & Novick, M. R. (1968). Statistical Theories of Mental Test Scores. Reading, MA: Addison-Wesley.

McDonald, R. P. (1999). Test Theory: A Unified Treatment. Mahwah, NJ: Lawrence Erlbaum.

Muthén, B. (1998-2004). Mplus Technical Appendices (Version 3, March 2004 ed.). Muthén & Muthén. https://statmodel.com/download/techappen.pdf

Additional Reading

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4), 430-450.

Raykov, T., & Marcoulides, G. A. (2011). Introduction to Psychometric Theory. New York: Routledge.

Zinbarg, R. E., Revelle, W., Yovel, I., & Li, W. (2005). Cronbach’s α, Revelle’s β, and McDonald’s ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1), 123-133.

Appendix 1. Some terminology

In measurement theory there is a hierarchy of models in terms of the restrictions they impose on the items:

When your factor analysis model specifies that both the factor loadings and the unique variances are equal for all indicators, the items are considered parallel. This is the most restrictive form, implying that each item is essentially interchangeable, differing only by random error that is identical in variance across items.

The term "tau-equivalent" comes from classical test theory (CTT), particularly from the foundational work by Lord and Novick (1968) in Statistical Theories of Mental Test Scores. The Greek letter τ (tau) was used to denote the true score of an individual on a psychological or educational test.

Origin and Meaning of Tau-Equivalent

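In CTT, the observed score $Y_i$ on item $i$ is modeled as:

$$ Y_i = \tau_i + \varepsilon_i $$

where $\tau_i$ is the true score on item $i$ and $\varepsilon_i$ is random measurement error.

The tau-equivalent model assumes that every item measures the same true score, $\tau_i = \tau$ for all $i$, while the error variances $\operatorname{Var}(\varepsilon_i)$ are free to differ across items.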

In other words, the items are equally sensitive to the latent trait (same slope or loading) but may differ in measurement error. That’s what distinguishes tau-equivalence from other models in the hierarchy:

| Model Type | Factor Loadings | Error Variances |
|---|---|---|
| Congeneric | Free | Free |
| Tau-equivalent | Equal | Free |
| Parallel | Equal | Equal |
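To make the hierarchy concrete, the sketch below builds a model-implied covariance matrix for each model type and compares alpha with omega. All loadings and error variances are hypothetical, the items are not standardized here, and errors are assumed uncorrelated. Under these assumptions alpha equals omega for the tau-equivalent and parallel models and falls below omega for the congeneric model.

```python
import numpy as np

def implied_cov(lam, theta):
    """Single-factor model-implied covariance matrix (unit factor variance)."""
    lam = np.asarray(lam, dtype=float)
    return np.outer(lam, lam) + np.diag(theta)

def alpha(sigma):
    p = sigma.shape[0]
    return p / (p - 1) * (1.0 - np.trace(sigma) / sigma.sum())

def omega(lam, theta):
    lam = np.asarray(lam, dtype=float)
    return lam.sum() ** 2 / (lam.sum() ** 2 + np.sum(theta))

# (loadings, error variances) for each model type; values are illustrative
models = {
    "congeneric":     ([0.9, 0.7, 0.5, 0.3], [0.3, 0.5, 0.6, 0.8]),
    "tau-equivalent": ([0.7, 0.7, 0.7, 0.7], [0.3, 0.5, 0.6, 0.8]),
    "parallel":       ([0.7, 0.7, 0.7, 0.7], [0.5, 0.5, 0.5, 0.5]),
}

results = {}
for name, (lam, theta) in models.items():
    a = alpha(implied_cov(lam, theta))
    w = omega(lam, theta)
    results[name] = (a, w)
    print(f"{name:15s} alpha = {a:.3f}  omega = {w:.3f}")
```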

Why "tau"?

The use of τ (tau) was conventional in earlier psychometric theory to represent true scores, much like we use $\theta$ for latent traits in modern IRT or factor models. So "tau-equivalent" refers to items that are equally related to the same true score (or latent factor), hence having equal factor loadings.

Appendix 2. Determinacy using Bayesian Plausible Values

This appendix discusses using Bayesian plausible values (PVs) from Mplus to approximate a measure similar to factor score determinacy $\rho^2$. Factor determinacy $\rho$ is:

$$ \rho = \text{corr}(\hat{\eta}, \eta) $$

and $\rho^2$ is:

$$ \rho^2 = \text{Var}(\mathbb{E}[\eta \mid \mathbf{x}]) / \text{Var}(\eta) $$

This quantifies how much information the observed indicators provide about the latent variable; it is the $R^2$ for predicting the factor from the indicators.

In Mplus's Bayesian framework, plausible values (PVs) are posterior draws from:

$$ p(\eta_j \mid \mathbf{x}_j) $$

So for person $j$, Mplus gives you multiple samples $\eta_j^{(1)}, \eta_j^{(2)}, \dots, \eta_j^{(M)}$, representing uncertainty about their factor given their observed data.

We can estimate:

$$ \rho^2 = \frac{\text{Var}(\mathbb{E}[\eta_j \mid \mathbf{x}_j])}{\text{Var}(\eta_j)} $$

via these steps:

1. Get the plausible values matrix

Let $N$ be the number of persons and $M$ the number of plausible-value draws per person.

Obtain an $N \times M$ matrix of plausible values: each row is a person, each column is one PV draw.

2. Compute variance of person-level posterior means

For each person $j$, compute:

$$ \bar{\eta}_j = \frac{1}{M} \sum_{m=1}^{M} \eta_j^{(m)} $$

Then compute:

$$ \text{Var}(\bar{\eta}_j) \quad \text{(across persons)} $$

This is the numerator: variance of posterior means.

3. Compute total variance (within + between)

The total variance of $\eta$ is approximated by:

$$ \text{Var}(\eta_j^{(m)}) = \text{Var}(\bar{\eta}_j) + \mathbb{E}_j[\text{Var}(\eta_j^{(m)} \mid \mathbf{x}_j)] $$

Compute this as:

total_var = np.var(pvs, ddof=1)  # flattened matrix

Or manually as:

$$ \text{Total variance} = \text{Between-person variance} + \text{Average within-person variance} $$

4. Compute $\rho^2$

$$ \rho^2 \approx \frac{\text{Between-person variance}}{\text{Total variance}} $$

This aligns with the interpretation of $R^2$ — fraction of variance in the latent factor "explained" by observed indicators.

Example Code in Python (if PVs are in a matrix pvs):

import numpy as np

# pvs: N x M matrix of plausible values
posterior_means = np.mean(pvs, axis=1)
between_var = np.var(posterior_means, ddof=1)

# within-person variance
within_vars = np.var(pvs, axis=1, ddof=1)
avg_within_var = np.mean(within_vars)

# total variance of factor
total_var = between_var + avg_within_var

# approximate factor determinacy squared
rho2 = between_var / total_var
print(f"Estimated rho^2 ≈ {rho2:.3f}")
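One way to sanity-check this procedure is to apply it to simulated data where the answer is known. The sketch below does not use Mplus output. It assumes a single-factor model with continuous, standardized indicators and hypothetical loadings, for which the posterior of $\eta$ given $\mathbf{x}$ is normal with mean $\lambda^\top \Sigma^{-1} \mathbf{x}$ and variance $1 - \lambda^\top \Sigma^{-1} \lambda$. Plausible values are drawn from that posterior and the between/total variance ratio is compared with the analytic $\rho^2$.

```python
import numpy as np

rng = np.random.default_rng(2025)

lam = np.array([0.8, 0.7, 0.6, 0.5])    # hypothetical standardized loadings
theta = 1.0 - lam ** 2                  # residual variances
sigma = np.outer(lam, lam) + np.diag(theta)

N, M = 5000, 200                        # persons, plausible values per person
eta = rng.standard_normal(N)            # true factor scores (unit variance)
x = eta[:, None] * lam + rng.standard_normal((N, lam.size)) * np.sqrt(theta)

# Normal-theory posterior of eta given x
w = np.linalg.solve(sigma, lam)         # regression weights Sigma^{-1} lambda
rho2_analytic = float(lam @ w)          # lambda' Sigma^{-1} lambda
post_mean = x @ w
post_sd = np.sqrt(1.0 - rho2_analytic)

# Simulated plausible values: N x M posterior draws
pvs = post_mean[:, None] + post_sd * rng.standard_normal((N, M))

# Same steps as above: between-person and average within-person variance
between_var = np.var(pvs.mean(axis=1), ddof=1)
avg_within_var = np.mean(np.var(pvs, axis=1, ddof=1))
rho2_pv = between_var / (between_var + avg_within_var)

print(f"analytic rho^2 = {rho2_analytic:.3f}, PV-based rho^2 = {rho2_pv:.3f}")
```

With a finite number of draws, the variance of the person-level means also contains a small Monte Carlo component of roughly the average within-person variance divided by $M$, so the PV-based ratio is slightly inflated when $M$ is small.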