last update: October 31 2023
THIS IS A WORK IN PROGRESS
This essay is about effect size measures and how to obtain and report them using Mplus software. Mplus fits a wide variety of models; here I will discuss single-group models with categorical or continuous dependent (including latent) and independent variables. I will discuss Mplus output, including unstandardized and standardized (STDYX, STDY, STD) parameter estimates, and how these relate to effect sizes commonly reported in the behavioral sciences, especially those popularized by Jacob Cohen (1969, 1988).
Here is the tl;dr:
Following is a table showing a variety of effect sizes one might choose for different kinds of data situations, with hints on how to find and report them from Mplus output. Mplus produces unstandardized estimates and three flavors of standardized estimates (see Appendix A), and some of these map onto commonly used effect size statistics. Note well: Mplus does not care about the measurement scale of the explanatory variables. All explanatory variables are handled the same way, as continuous variables, regardless of their scale. The Mplus framework is regression based: dependent variables are regressed on independent variables, and unstandardized parameter estimates always indicate the difference in the dependent variable for a 1-unit difference in the explanatory variable. It is up to the user to correctly identify and interpret the meaningful parts of the output.
Explanatory variable | Dependent variable: 2 categories | Dependent variable: 3+ categories | Dependent variable: continuous |
---|---|---|---|
2 categories | Cohen’s h, Cohen’s w (\(\phi\) coefficient, STDYX), odds ratio, risk ratio, unstandardized regression parameter (probit) (note 11) | Cohen’s w, odds ratio | Cohen’s d, total sample standardized mean difference (STDY), point-biserial correlation (\(r_{pb}\), STDYX) (note 13) |
3+ categories | Cohen’s w, odds ratio | Cohen’s w, odds ratio | Pearson’s r (STDYX), or standardized mean difference per unit difference in \(x\) (STDY) (note 23) |
Continuous | Odds ratio, consider standardizing \(x\) to 2 standard deviation units (note 31) | Odds ratio, Pearson’s r (STDYX) | Pearson’s r (STDYX) |
Notes:
note 11. With a categorical dependent variable, in Mplus you are doing (ordered) logistic regression (using ML) or (ordered) probit† regression (using weighted least squares or Bayes). Natural effect size estimates are odds ratios if logistic regression is used (obtained by exponentiating the unstandardized regression parameter). A special case arises when both the explanatory variable and the dependent variable are binary (\(x \in \{0,1\}\), \(y \in \{0,1\}\)): the STDYX estimate of the regression of y ON x is on the scale of a Pearson correlation coefficient, which for two binary variables is the \(\phi\) coefficient. This is one of Cohen’s effect size statistics (w) and is interpreted on the r scale (.1/.3/.5 for S/M/L).
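To make note 11 concrete, here is a minimal base-R sketch on simulated data (the variables `x` and `y` and all values are made up, not the dropout data). It shows that for two binary variables the Pearson correlation is the \(\phi\) coefficient, that Cohen’s w computed as \(\sqrt{\chi^2/N}\) equals its absolute value, and that the odds ratio is the exponent of the logistic regression coefficient. It uses ordinary R statistics, so it illustrates the quantities rather than reproducing Mplus’s STDYX computation.

```r
# Simulated binary explanatory and dependent variables (made up for illustration)
set.seed(1)
n <- 500
x <- rbinom(n, 1, 0.5)
y <- rbinom(n, 1, ifelse(x == 1, 0.6, 0.4))

# phi coefficient: a Pearson correlation computed on the two 0/1 variables
phi <- cor(x, y)

# Cohen's w for the 2 x 2 table: sqrt(chi-square / N), no continuity correction
tab <- table(x, y)
chi2 <- unname(chisq.test(tab, correct = FALSE)$statistic)
w <- sqrt(chi2 / n)

# odds ratio: exponent of the unstandardized logistic regression coefficient
fit <- glm(y ~ x, family = binomial)
or_glm <- exp(unname(coef(fit)["x"]))

# the same odds ratio from the cross-tabulation: (n11 * n00) / (n10 * n01)
or_tab <- (tab["1", "1"] * tab["0", "0"]) / (tab["1", "0"] * tab["0", "1"])

c(phi = phi, w = w, or_glm = or_glm, or_tab = or_tab)
```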
note 13. Although the Mplus Users Guide indicates that the STDYX-scaled effect should not be used when the explanatory variable is categorical (see Appendix A, below), when the explanatory variable is binary the STDYX effect is a point-biserial correlation coefficient (\(r_{pb}\)), an effect size statistic in the r family. Perhaps more usefully, \(r_{pb}\) can be converted to a proper Cohen’s d, as explained below.
If the explanatory variable is binary (\(x \in \{0,1\}\)), then either or both of the effects under the STD or STDY standardization provide effect size estimates in the d family. Use STD or STDY if the dependent variable is a latent variable; use STDY if the dependent variable is a manifest (i.e., observed) variable. When the explanatory variable is binary, the STD and STDY scaled effects are mean differences standardized to the total sample standard deviation. Note that Cohen’s d is the mean difference standardized to the within-group (pooled) standard deviation, which is generally smaller than the total sample standard deviation. The STD and STDY effects can be reported as “standardized mean differences”, specifying in the methods that they are “standardized with respect to the total sample standard deviation”. Do not describe them as Cohen’s d, but feel free to use the usual Cohen’s d thresholds of .2, .5, and .8 as defining small, medium, and large effects.
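A base-R sketch of the two denominators contrasted in note 13, on simulated data (all values are made up). It uses `sd(y)` as a stand-in for the total sample standard deviation; Mplus’s STDY uses the model-estimated standard deviation of y, which for this bivariable model should differ only by an n versus n − 1 denominator.

```r
# Simulated groups: x = 0 (n0 cases) and x = 1 (n1 cases); values are made up
set.seed(2)
n0 <- 300; n1 <- 200
x <- rep(c(0, 1), times = c(n0, n1))
y <- c(rnorm(n0, mean = 50, sd = 10), rnorm(n1, mean = 55, sd = 10))

# unstandardized effect: the mean difference
b <- unname(coef(lm(y ~ x))["x"])

# pooled within-group SD (Cohen's d denominator) and total sample SD (STDY-like denominator)
s0 <- sd(y[x == 0]); s1 <- sd(y[x == 1])
sp <- sqrt(((n0 - 1) * s0^2 + (n1 - 1) * s1^2) / (n0 + n1 - 2))
sy <- sd(y)

d_cohen <- b / sp   # Cohen's d
d_stdy  <- b / sy   # mean difference standardized to the total sample SD

c(d_cohen = d_cohen, d_stdy = d_stdy)
```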
note 23. When the explanatory variable has many categories (i.e., more than 2), I would consider reporting either the STDY effect or the STDYX effect. The STDY effect is the difference in the dependent variable, in total sample standard deviation units, per 1-unit difference in the explanatory variable. There is no comparable effect size statistic in Cohen’s taxonomy. The STDYX-scaled effect is a Pearson-scaled correlation between the dependent variable (or, for a categorical dependent variable, its latent response variable) and the manifest explanatory variable, treating the manifest variable as continuous. I would interpret this as a correlation coefficient (which is in Cohen’s taxonomy with S/M/L thresholds of .1/.3/.5). Here is how to decide between STDY and STDYX: if you would be comfortable interpreting a Pearson correlation between the many-category explanatory variable and the continuous dependent variable, use STDYX. Otherwise, use something else, such as STDY, or break the many-category explanatory variable into a set of dummies and use Cohen’s d relative to a reference level of the categorical variable (or STDY as a Cohen’s d family [read on] effect size statistic). Still another option would be to “bring the explanatory variable into the model” as a categorical variable and estimate a polyserial correlation with STDYX (see Appendix E).
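The dummy-variable option in note 23 can be sketched in base R on simulated data (a made-up three-level factor `g` with reference level `"a"`). One assumption to flag: the denominator here is the pooled within-group standard deviation across all three groups, which equals the residual standard deviation of the dummy regression; a pairwise pooled SD (reference group and comparison group only) would be another defensible choice.

```r
# Simulated three-group data; "a" is the reference level (values are made up)
set.seed(3)
g <- factor(rep(c("a", "b", "c"), times = c(150, 150, 200)))
group_means <- c(a = 50, b = 52, c = 55)
y <- rnorm(length(g), mean = group_means[as.character(g)], sd = 10)

# lm() builds dummy variables for levels "b" and "c" (treatment contrasts)
fit <- lm(y ~ g)
b <- coef(fit)[c("gb", "gc")]   # mean differences from the reference level

# pooled within-group SD across all groups, computed by hand
ss_within <- sum(tapply(y, g, function(v) sum((v - mean(v))^2)))
sp <- sqrt(ss_within / (length(y) - nlevels(g)))

d_vs_ref <- b / sp
d_vs_ref
```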
note 31, 32. With a categorical dependent variable, in Mplus you are doing (ordered) logistic regression (using ML) or (ordered) probit regression (using weighted least squares or Bayes). Natural effect size estimates are odds ratios if logistic regression is used (obtained by exponentiating the unstandardized regression parameter). Consider standardizing your explanatory variable to some meaningful unit (e.g., age per 10 years) or to one or two standard deviation units (two following Gelman, 2008). If doing probit regression, the regression parameter estimates describe the difference, on the probit (standard normal, Z-score) scale of the latent response variable, per 1-unit difference in the explanatory variable.
† I discuss an effect size interpretation of probit regression coefficients using the Educational Testing Service A/B/C differential item functioning categories in this blog post. To save you a click, in that post I argue that when the Mplus estimator is MLR/probit, WLSMV/theta, or Bayes, negligible effects are less than .1 (A DIF), large effects are greater than .375 (C DIF), and the remainder are slight to moderate effects (B DIF). This interpretation is worked out for binary dependent variables and binary independent variables.
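The rescaling suggested in note 31 can be sketched with R’s `glm()` on simulated data (the variable `age` and all coefficients are made up). The sketch shows the odds ratio per 1 unit, per 10 units, and per 2 standard deviations (centering and dividing by two standard deviations, following Gelman, 2008); the last is just the per-unit log odds ratio multiplied by \(2 s_x\).

```r
# Simulated logistic data with a hypothetical continuous explanatory variable "age"
set.seed(4)
n <- 1000
age <- rnorm(n, mean = 40, sd = 12)
y <- rbinom(n, 1, plogis(-2 + 0.04 * age))

# odds ratio per 1-unit (1 year) difference
fit_raw <- glm(y ~ age, family = binomial)
or_1 <- exp(coef(fit_raw)["age"])

# odds ratio per 10-year difference, a "meaningful unit"
or_10 <- exp(10 * coef(fit_raw)["age"])

# odds ratio per 2 SD difference: center and divide by two standard deviations
age_2sd <- (age - mean(age)) / (2 * sd(age))
fit_2sd <- glm(y ~ age_2sd, family = binomial)
or_2sd <- exp(coef(fit_2sd)["age_2sd"])

c(or_per_year = unname(or_1), or_per_10_years = unname(or_10), or_per_2sd = unname(or_2sd))
```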
A standard set of effect size measures in psychology and other applied fields is the set described by Jacob Cohen in his Statistical power analysis for the behavioral sciences (1988, second edition, LEA, Hillsdale, NJ). I will presume the reader has some familiarity with Cohen’s effect size framework; the bare minimum is familiarity with the Cohen’s d effect size. His book can be found online in PDF form, and there are countless review articles and websites summarizing this material. Cohen’s book is a classic and worth obtaining and at least thumbing through for any data analyst.
In a regression framework, differences in means imply we have a continuous dependent variable \(Y\) and a binary independent variable \(X\) (X=0, X=1). We are interested in characterizing the difference between the mean of \(Y\) when \(X=0\) (\(\overline{Y}_{0}\)) and when \(X=1\) (\(\overline{Y}_{1}\)), i.e., \(\overline{Y}_{1}-\overline{Y}_{0}\). This difference in means is captured by the linear regression model of \(Y\) on \(X\)
\(y = a + bx + e\)
where parameter \(b\) captures the difference in \(y\) across levels of – or per 1-unit difference in – variable \(x\).
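A quick base-R check of this claim on simulated data (values are made up): with a 0/1 explanatory variable, the least squares slope \(b\) equals \(\overline{Y}_{1}-\overline{Y}_{0}\), and the intercept \(a\) equals \(\overline{Y}_{0}\).

```r
# Simulated data with a binary explanatory variable (made up for illustration)
set.seed(5)
x <- rbinom(200, 1, 0.4)
y <- 10 + 3 * x + rnorm(200, sd = 2)

fit <- lm(y ~ x)
a <- unname(coef(fit)["(Intercept)"])   # mean of y when x = 0
b <- unname(coef(fit)["x"])             # difference in means

mean_diff <- mean(y[x == 1]) - mean(y[x == 0])
c(a = a, mean_y0 = mean(y[x == 0]), b = b, mean_diff = mean_diff)
```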
Statistic | Referred to as | Estimator | Description (and S/M/L thresholds) |
---|---|---|---|
\(b\) | Unstandardized effect size | \(\left(\overline{Y}_1-\overline{Y}_0\right)\) | Unstandardized regression coefficient |
d | Cohen’s d | \(\left(\overline{Y}_{1}-\overline{Y}_{0}\right)\frac{1}{s_p}\) | Mean difference divided by the pooled standard deviation (.2/.5/.8) |
STD, STDY | Mean difference standardized to the total sample standard deviation | \(\left(\overline{Y}_{1}-\overline{Y}_{0}\right)\frac{1}{s_Y}\) | Mean difference divided by the total sample standard deviation (\(s_Y\)). |
\(r_{pb}\) | Point-biserial correlation | \(\left(\overline{Y}_{1}-\overline{Y}_{0}\right)\frac{s_X}{s_{Y}}\) | where \(s_{X}= \sqrt{pq}\), \(p\) being the proportion in group 1 and \(q= 1-p\); this is also the STDYX scaled regression coefficient from Mplus output of a continuous outcome on a dummy variable |
\(r_b\) | Biserial correlation | \(\left(\overline{Y}_{1}-\overline{Y}_{0}\right)\frac{pq/a}{s_Y}\) | where \(a\) is the height (ordinate) of the standard normal density at \(z_p = \Phi^{-1}(p)\), i.e., \(a = e^{-z_p^2/2}/\sqrt{2\pi}\). In Stata, use normalden(invnormal(<p>),0,1) and in R, use dnorm(qnorm(<p>),0,1). See Appendix E |
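The \(r_{pb}\) and \(r_b\) rows of the table can be computed by hand in base R, as in this sketch on simulated data (values are made up). One assumption matters for exact agreement: \(s_Y\) here uses an n (not n − 1) denominator, matching \(s_X=\sqrt{pq}\); with that choice the hand-computed \(r_{pb}\) equals `cor(x, y)`, and \(r_b\) is always larger in magnitude than \(r_{pb}\).

```r
# Simulated data: binary x, continuous y (made up for illustration)
set.seed(6)
n <- 400
x <- rbinom(n, 1, 0.3)
y <- 20 + 4 * x + rnorm(n, sd = 5)

p <- mean(x); q <- 1 - p
mdiff <- mean(y[x == 1]) - mean(y[x == 0])
sy_pop <- sqrt(mean((y - mean(y))^2))   # SD of y with an n denominator

# point-biserial correlation: mean difference times s_X / s_Y, with s_X = sqrt(pq)
r_pb <- mdiff * sqrt(p * q) / sy_pop

# biserial correlation: a is the standard normal ordinate at the p/q split
a <- dnorm(qnorm(p))
r_b <- mdiff * (p * q / a) / sy_pop

c(r_pb = r_pb, cor = cor(x, y), r_b = r_b)
```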
The most commonly used effect size statistic for a continuous dependent variable and binary independent variable for describing mean differences is Cohen’s d.
There is only one circumstance in which it is relatively straightforward to get a Cohen’s d from Mplus output: the bivariable case, where a single \(Y\) is regressed on a single \(X\). In this model, the MODEL RESULTS regression parameter for y on x is the difference in means, and the square root of the residual variances result for y is a good estimate of the pooled standard deviation. Cohen’s d can then be estimated as y_on_x/sqrt(residual_variances_y).
However, this situation rarely arises in real data analysis, because we usually have multiple explanatory variables.
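The bivariable shortcut can be checked in base R on simulated data (values are made up). In R, `sigma()` of the dummy regression uses an n − 2 denominator and so equals the pooled SD exactly. The sketch also shows a maximum likelihood residual variance with an n denominator, which is my assumption about what an ML estimator such as Mplus’s reports; it inflates d by the factor \(\sqrt{n/(n-2)}\), negligible in large samples.

```r
# Simulated two-group data (made up for illustration)
set.seed(7)
n0 <- 250; n1 <- 250
x <- rep(c(0, 1), times = c(n0, n1))
y <- c(rnorm(n0, 50, 10), rnorm(n1, 53, 10))

fit <- lm(y ~ x)
b <- unname(coef(fit)["x"])
sp <- sqrt(((n0 - 1) * var(y[x == 0]) + (n1 - 1) * var(y[x == 1])) / (n0 + n1 - 2))

d_from_fit <- b / sigma(fit)   # residual SD of the bivariable model (n - 2 denominator)
d_direct   <- b / sp           # Cohen's d with the pooled SD

# ML-style residual variance with an n denominator (assumed analogue of Mplus output)
resvar_ml <- mean(residuals(fit)^2)
d_ml <- b / sqrt(resvar_ml)

c(d_from_fit = d_from_fit, d_direct = d_direct, d_ml = d_ml)
```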
A data analyst could compute the pooled standard deviation before the analysis and use it as a constant in a MODEL CONSTRAINT command in Mplus. I will illustrate using the example data set dropout.dat used by Muthén, Muthén and Asparouhov in their text Regression and Mediation Analysis Using Mplus (2016). Hereafter I will refer to this book as MMA16. The data are described in Appendix B.
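# load data from MMA16, as processed. See Appendix B.
dod <- haven::read_dta(here::here("dropout.dta"))
# dummy variable: 1 = male (gender == 2), 0 = otherwise
dod$male <- as.numeric(dod$gender == 2)
# ci95() is not defined elsewhere in this post; assumed here to be a
# normal-theory 95% interval (est +/- 1.96*se), which reproduces the intervals printed below
ci95 <- function(est, se) est + c(-1.96, 1.96) * se
# Compute Cohen's d for `math7` on `male` using effsize package
# install.packages("effsize")
effsize_d <- effsize::cohen.d(formula = math7 ~ male , data = dod)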
## Warning in cohen.d.formula(formula = math7 ~ male, data = dod): Cohercing rhs
## of formula to factor
effsize_d
##
## Cohen's d
##
## d estimate: 0.1315 (negligible)
## 95 percent confidence interval:
## lower upper
## 0.06056 0.20249
# Save some results from observed data to use in Mplus
# total sample standard deviation of math7
sy <- sd(dod$math7[!is.na(dod$math7)])
# group-specific standard deviations and sample sizes
s0 <- sd(dod$math7[dod$male==0 & !is.na(dod$math7)])
s1 <- sd(dod$math7[dod$male==1 & !is.na(dod$math7)])
n0 <- length(dod$math7[dod$male==0 & !is.na(dod$math7)])
n1 <- length(dod$math7[dod$male==1 & !is.na(dod$math7)])
# the pooled standard deviation
sp <- sqrt(((n0-1)*s0^2 + (n1-1)*s1^2) / (n0 + n1 - 2))
# Define a Mplus model regressing math7 on male
model1.model <- paste("math7 on male (b); ","\nMODEL CONSTRAINT: new (d); ", "\nd = b/",sp,";")
cat(model1.model)
## math7 on male (b);
## MODEL CONSTRAINT: new (d);
## d = b/ 10.1785151926642 ;
model1 <- MplusAutomation::mplusModeler(
MplusAutomation::mplusObject(
MODEL = model1.model ,
OUTPUT = "STANDARDIZED;" ,
rdata = dod) ,
"dod.dat" ,
run = 1L)
model1_new <- model1$results$parameters$unstandardized[model1$results$parameters$unstandardized$paramHeader == "New.Additional.Parameters",]
model1_d <- model1_new[model1_new$param == "D",c("est","se")]
model1_d
## est se
## 4 -0.131 0.036
ci95(model1_d[[1]],model1_d[[2]])
## [1] -0.20156 -0.06044
# cat( readLines( "dod.out" ) , sep = "\n" )
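Similar examples of using known constants in the MODEL CONSTRAINT command are illustrated in MMA16 (see Table 1.11, page 38).
Alternatively, the dependent variable can be standardized to the pooled standard deviation before the analysis, so that the unstandardized regression coefficient of the rescaled variable on male is itself a Cohen’s d: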
dod$zspmath7 <- ( dod$math7 - mean(dod$math7[!is.na(dod$math7)]) ) / sp
sd(dod$zspmath7[!is.na(dod$zspmath7)])
## [1] 1.002
model2 <- MplusAutomation::mplusModeler(
MplusAutomation::mplusObject(
MODEL = "zspmath7 on male ;" ,
rdata = dod) ,
"dod.dat" ,
run = 1L)
model2_unstd <- as.data.frame(model2$results$parameters$unstandardized)
model2_unstd
## paramHeader param est se est_se pval
## 1 ZSPMATH7.ON MALE -0.131 0.036 -3.636 0.000
## 2 Means MATH7 50.395 0.184 273.605 0.000
## 3 Intercepts ZSPMATH7 0.068 0.026 2.620 0.009
## 4 Variances MATH7 103.981 2.656 39.148 0.000
## 5 Residual.Variances ZSPMATH7 0.999 0.026 39.143 0.000
model2_d <- model2_unstd[model2_unstd$paramHeader == "ZSPMATH7.ON" & model2_unstd$param == "MALE", c("est","se")]
model2_d
## est se
## 1 -0.131 0.036
ci95(model2_d[[1]],model2_d[[2]])
## [1] -0.20156 -0.06044
# cat( readLines( "dod.out" ) , sep = "\n" )
The limitation of this approach is that the data analyst is probably interested in several Cohen’s d-scaled effects, one for each binary explanatory variable, but the dependent variable can be standardized with respect to only one binary explanatory variable per analysis.
A more general and extensible approach is to approximate Cohen’s d by converting the point-biserial correlation (\(r_{pb}\)) to \(d\). A conversion that works well is
\(d = \frac{r_{pb}}{\sqrt{1-r_{pb}^2}}\left(\frac{1}{s_X}\right)\)
This expression may be encountered in the literature with 2 in place of \(1/s_X\), because \(1/s_X = 2\) when \(p=.5\), a common situation especially in experimental designs. Computation still requires knowing the proportion of the sample with \(X=1\) and the total variance of \(Y\), which are not part of the output unless the \(X\) variable is “brought into the model”. In the example code below I do this by specifying that the mean of x (here, male) is to be estimated and labeled (p).
model3 <- MplusAutomation::mplusModeler(
MplusAutomation::mplusObject(
MODEL = "math7 on male (b);
math7 (resvar);
[male] (p);
model constraint: new (vx vy rpb d);
vx = p*(1-p) ;
vy = b^2*vx + resvar ;
rpb = b*sqrt(vx)/sqrt(vy) ;
d = rpb/(sqrt(1-rpb**2)*(sqrt(vx))) ;" ,
OUTPUT = "STANDARDIZED;" ,
rdata = dod) ,
"dod.dat" ,
run = 1L)
## Warning in i1:i2: numerical expression has 3 elements: only the first used
model3_new <- model3$results$parameters$unstandardized[model3$results$parameters$unstandardized$paramHeader == "New.Additional.Parameters",]
model3_d <- model3_new[model3_new$param == "D",c("est","se")]
model3_d
## est se
## 23 -0.131 0.036
ci95(model3_d[[1]],model3_d[[2]])
## [1] -0.20156 -0.06044
#cat( readLines( "dod.out" ) , sep = "\n" )
It is also worth noting:
- This model produces nonidentification warnings, but standard errors are produced, and this is not an issue for the type of model being estimated.
- The “hand calculation” of the total variance of y will be more complicated with more explanatory variables and with interactions among the explanatory variables. But we have seen from the first example (model1) that we can easily obtain numerical summaries in R and paste them into a Mplus model statement. This could be done for vx and vy to save coding in Mplus.
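The MODEL CONSTRAINT arithmetic in model3 (vx, vy, rpb, d) can be mirrored in base R as a sanity check, sketched here on simulated data (values are made up). With n-denominator variances, vy reproduces the total variance of y exactly, rpb equals `cor(x, y)`, and the converted d differs from the pooled-SD Cohen’s d only by the factor \(\sqrt{n/(n-2)}\), which explains why the conversion “works well” in large samples.

```r
# Simulated data: binary x, continuous y (made up for illustration)
set.seed(8)
n <- 600
x <- rbinom(n, 1, 0.35)
y <- 100 + 6 * x + rnorm(n, sd = 15)

fit <- lm(y ~ x)
b <- unname(coef(fit)["x"])
p <- mean(x)
vx <- p * (1 - p)                  # variance of x (n denominator)
resvar <- mean(residuals(fit)^2)   # residual variance (n denominator)
vy <- b^2 * vx + resvar            # total variance of y, as in model3

rpb <- b * sqrt(vx) / sqrt(vy)
d_conv <- rpb / (sqrt(1 - rpb^2) * sqrt(vx))

d_cohen <- b / sigma(fit)          # pooled-SD Cohen's d (n - 2 denominator)

c(rpb = rpb, cor = cor(x, y), d_conv = d_conv, d_cohen = d_cohen)
```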
A data analyst could also compute a Cohen’s d-scaled effect size statistic outside of Mplus, taking advantage of the fact that Mplus reports the point-biserial correlation in the STDYX output:
model4 <- MplusAutomation::mplusModeler(
MplusAutomation::mplusObject(
MODEL = "math7 on male ;" ,
OUTPUT = "STANDARDIZED;" ,
rdata = dod) , "dod.dat" , run = 1L)
model4_stdyx <- as.data.frame(model4$results$parameters$stdyx.standardized)
model4_rpb <- model4_stdyx[model4_stdyx$paramHeader=="MATH7.ON" & model4_stdyx$param=="MALE",c("est","se")]
p <- mean(dod$male)
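# convert r_pb to the d scale; applying the same formula to the se column is only a rough
# approximation (a delta-method SE would be se_r / ((1 - r_pb^2)^(3/2) * sqrt(p*(1-p))))
model4_stdyx_d <- model4_rpb/sqrt(1-model4_rpb^2)*(1/sqrt(p*(1-p)))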
model4_stdyx_d
## est se
## 1 -0.1324 0.03604
ci95(model4_stdyx_d[[1]],model4_stdyx_d[[2]])
## [1] -0.20305 -0.06178
Reporting the STDY effect instead of computing a d
The STDY-scaled regression of a continuous manifest variable on a binary explanatory variable (MATH7 ON MALE, effect \(b/s_Y\)) is a conservative estimate (i.e., an underestimate) of Cohen’s d (\(b/s_P\)). Here \(b\) is the unstandardized regression parameter capturing the mean difference in \(Y\) (MATH7) across levels of \(X\) (MALE), \(s_Y\) is the total sample standard deviation, and \(s_P\) is the pooled standard deviation. Reporting the STDY-scaled effect and interpreting it with thresholds of .2/.5/.8 for small, medium, and large will lead to underestimating the size of the effect relative to what would be concluded had Cohen’s d been used. But STDY-scaled effects are not wrong; they are just summaries of \(b\) standardized to a different standard deviation than d. It would be wrong to say that the STDY effects are Cohen’s d, but one could say that they are in the Cohen’s d family. After all, Cohen described many different standardizations of the mean difference in his 1988 text (notably, none of these involved standardizing to the total sample standard deviation), and what we now know as “the” Cohen’s d is only one of those.
model4_stdy <- as.data.frame(model4$results$parameters$stdy.standardized)
model4_stdy_d <- model4_stdy[model4_stdy$paramHeader=="MATH7.ON" & model4_stdy$param=="MALE",c("est","se")]
model4_stdy_d
## est se
## 1 -0.131 0.036
ci95(model4_stdy_d[[1]],model4_stdy_d[[2]])
## [1] -0.20156 -0.06044
effsize_d
##
## Cohen's d
##
## d estimate: 0.1315 (negligible)
## 95 percent confidence interval:
## lower upper
## 0.06056 0.20249
The STDY standardization of the MATH7 ON MALE effect (\(b/s_Y\)) closely approximates Cohen’s d (\(b/s_P\)) in this case because \(b\) is small (close to 0) and the sample size is very large; these are the factors that determine the similarity of \(s_P\) and \(s_Y\). The pooled standard deviation is a kind of average within-group standard deviation:
\(s_p = \sqrt{ \frac{(n_0-1)s_0^2 + (n_1-1)s_1^2}{n_0+n_1-2} } = \sqrt{q's_0^2 + p's_1^2}\)
where \(p' = (n_1-1)/(n_0+n_1-2)\) and \(q' = (n_0-1)/(n_0+n_1-2)\); when the sample size is large, \(p'\) approaches \(p\) and \(q'\) approaches \(q\). The within-group standard deviations (\(s_0, s_1\)) summarize the deviations of members of each group defined by \(x\) from the mean of their own group. The total sample standard deviation is based on deviations from the total sample mean. It reflects total variability, which includes both within-group and between-group variability (the decomposition is exact when all variances use n rather than n − 1 denominators):
\(s_y = \sqrt{qs_0^2 + ps_1^2 + pqb^2}\)
The point is, the main difference between the pooled variance (\(s^2_p\)) and the total variance (\(s^2_Y\)) is the squared mean difference times the variance in \(x\): the between-group variance component \(pqb^2\). This is why, apart from small-sample degrees-of-freedom differences, the STDY effect will be smaller in magnitude than Cohen’s d, unless \(b=0\), in which case they will be equal but also equally uninteresting.
# compare sp and sY
# need parameter estimate for "b", extract b from last model
model4_unstd <- model4$results$parameters$unstandardized
model4_unstd
## paramHeader param est se est_se pval
## 1 MATH7.ON MALE -1.339 0.368 -3.639 0
## 2 Intercepts MATH7 51.091 0.265 192.640 0
## 3 Residual.Variances MATH7 103.538 2.645 39.146 0
b <- model4_unstd[model4_unstd$paramHeader=="MATH7.ON" & model4_unstd$param=="MALE","est"]
# compute sY as function of within and between components
sy2 <- sqrt((1-p)*s0^2 + p*s1^2 + p*(1-p)*b^2)
# compare to what was obtained earlier
# reported to 4 significant digits because b only has
# 4 significant digits
options(digits=3)
c(s0, s1, sp, sy, sy2)
## [1] 9.64 10.65 10.18 10.20 10.20
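The comparison above uses n − 1 sample SDs, so `sy` and `sy2` agree only approximately. This base-R sketch on simulated data (values are made up) shows that the decomposition \(s_Y^2 = qs_0^2 + ps_1^2 + pqb^2\) is exact when every variance uses an n denominator.

```r
# Simulated two-group data with unequal group sizes and SDs (made up)
set.seed(9)
x <- rep(c(0, 1), times = c(700, 300))
y <- c(rnorm(700, 50, 9), rnorm(300, 48, 11))

# variance with an n (not n - 1) denominator
pop_var <- function(v) mean((v - mean(v))^2)

p <- mean(x); q <- 1 - p
b <- mean(y[x == 1]) - mean(y[x == 0])
within  <- q * pop_var(y[x == 0]) + p * pop_var(y[x == 1])  # within-group component
between <- p * q * b^2                                      # between-group component

c(total = pop_var(y), within_plus_between = within + between)
```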
Come back later for more. I’ll add discussion of odds ratios and probit regression coefficients.
Statistic | Use | Formula | Description (and S/M/L thresholds) |
---|---|---|---|
h | Difference in proportions | \(h = 2 \text{asin}(p_1^.5) - 2 \text{asin}(p_2^.5)\) | The difference in arcsin transformed proportions. Useful for binary outcomes compared across two groups (.2/.5/.8) |
w | Association of two categorical variables | \(w = \sqrt{\chi^2/N}\) | The same as the \(\phi\) (phi) coefficient for 2×2 tables, but also used for cross-tabulations of higher dimensions. The \(\phi\) coefficient is what you would get if you calculated a Pearson correlation coefficient on two binary variables, treating them as continuous variables (.1/.3/.5) |
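The h and w formulas in this table can be written as small R helpers; a sketch using base R only (the function names `cohen_h` and `cohen_w` and the example tables are hypothetical, made up for illustration). `correct = FALSE` turns off the continuity correction, which `chisq.test()` would otherwise apply to 2 × 2 tables.

```r
# Cohen's h: difference in arcsine-transformed proportions
cohen_h <- function(p1, p2) 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))

# Cohen's w: sqrt(chi-square / N) for a table of counts
cohen_w <- function(tab) {
  chi2 <- chisq.test(tab, correct = FALSE)$statistic
  unname(sqrt(chi2 / sum(tab)))
}

cohen_h(0.65, 0.45)

# 2 x 2 table of made-up counts: w equals the absolute phi coefficient
tab2 <- matrix(c(30, 20, 10, 40), nrow = 2)
cohen_w(tab2)

# 3 x 3 table of made-up counts: w generalizes to larger cross-tabulations
tab3 <- matrix(c(20, 15, 10, 12, 18, 25, 30, 20, 14), nrow = 3)
cohen_w(tab3)
```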
Come back later for more. This is
Statistic | Use | Formula | Description (and S/M/L thresholds) |
---|---|---|---|
r | Association of two continuous variables | \(r = \frac{s_{XY}}{s_X s_Y}\) | Pearson’s correlation coefficient (.1/.3/.5) |
q | Difference in two correlation coefficients | \(q = \text{atanh}(r_1)-\text{atanh}(r_2)\) | Computed as difference in two Fisher’s z-transformed correlation coefficients (.1/.3/.5) |
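Cohen’s q is a one-liner in R, since `atanh()` is Fisher’s z transformation; a minimal sketch (the helper name `cohen_q` is hypothetical).

```r
# Cohen's q: difference between two Fisher z-transformed correlations
cohen_q <- function(r1, r2) atanh(r1) - atanh(r2)

# Fisher's z is also 0.5 * log((1 + r) / (1 - r))
cohen_q(0.5, 0.3)
```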
Cohen, J. (1969). Statistical power analysis for the behavioral sciences. Academic Press.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Second Edition. Lawrence Erlbaum Associates.
Gelman, A. (2008). Scaling regression inputs by dividing by two standard deviations. Statistics in Medicine, 27(15), 2865–2873.
Muthén, L., & Muthén, B. (1998-2017). Mplus Users Guide (Eighth ed.). Muthén & Muthén.
Muthén, B. O., Muthén, L. K., & Asparouhov, T. (2017). Regression and mediation analysis using Mplus. Los Angeles, CA: Muthén & Muthén.
Flavor | Standardized with respect to the … |
---|---|
STDYX | variances of the continuous latent variables, and the variances of the background (or explanatory) variables and the dependent variables. |
STDY | variances of the continuous latent variables as well as the variances of the outcome variables for standardization. |
STD | variances of the continuous latent variables |
This is excerpted from the Mplus Users’ Guide, Chapter 18, which describes OUTPUT command options:
The STANDARDIZED option is used to request standardized parameter estimates and their standard errors and R-square…Three types of standardizations are provided as the default.
The first type of standardization is shown under the heading StdYX in the output. StdYX uses the variances of the continuous latent variables as well as the variances of the background and outcome variables for standardization. The StdYX standardization is the one used in the linear regression of y on x,
bStdYX = b × SD(x)/SD(y),
where b is the unstandardized linear regression coefficient, SD(x) is the sample standard deviation of x, and SD(y) is the model estimated standard deviation of y. The standardized coefficient bStdYX is interpreted as the change in y in y standard deviation units for a standard deviation change in x.
The second type of standardization is shown under the heading StdY in the output. StdY uses the variances of the continuous latent variables as well as the variances of the outcome variables for standardization. The StdY standardization for the linear regression of y on x is
bStdY = b/SD(y).
StdY should be used for binary covariates because a standard deviation change of a binary variable is not meaningful. The standardized coefficient bStdY is interpreted as the change in y in y standard deviation units when x changes from zero to one.
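The two formulas quoted above can be sketched in base R for an observed-variable regression on simulated data (the many-category variable `x`, coded 1 to 5 and treated as continuous, is made up). As an assumption standing in for Mplus’s model-estimated SD(y), the sketch uses the model-implied SD of y from this saturated regression with n denominators; with that choice the StdYX coefficient equals `cor(x, y)`.

```r
# Simulated data: a many-category x treated as continuous (made up)
set.seed(10)
n <- 500
x <- sample(1:5, n, replace = TRUE)
y <- 30 + 2 * x + rnorm(n, sd = 6)

fit <- lm(y ~ x)
b <- unname(coef(fit)["x"])

# SD with an n denominator; model-implied SD(y) = sqrt(b^2 var(x) + residual variance)
sd_ml <- function(v) sqrt(mean((v - mean(v))^2))
sd_y_model <- sqrt(b^2 * sd_ml(x)^2 + mean(residuals(fit)^2))

b_stdy  <- b / sd_y_model              # StdY: SD(y) units per 1-unit difference in x
b_stdyx <- b * sd_ml(x) / sd_y_model   # StdYX: SD(y) units per 1 SD(x) difference

c(b = b, stdy = b_stdy, stdyx = b_stdyx, cor = cor(x, y))
```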
The example data set comes from the online examples and data on Statmodel.com used by Muthén, Muthén and Asparouhov in their text Regression and Mediation Analysis Using Mplus (2016). The data set dropout.dat was downloaded from https://statmodel.com/mplusbook/chapter1.shtml and munged in Excel and Stata to produce dropout.dta. Get a copy of my version at https://quantsci.s3.amazonaws.com/BlogPosts/dropout.dta.
Stata data munging code:
insheet using dropout.csv
replace mothed=. if mothed==8
replace fathed=. if fathed==8
replace fathsei=. if fathsei==996
replace fathsei=. if fathsei==998
replace ethnic=. if ethnic==8
replace homeres=. if homeres==98
foreach x of varlist math7-math12 {
replace `x'=. if inlist(`x',996,998)
}
saveold dropout.dta , version(13)
\(s_p = \sqrt{ \frac{(n_0-1) \cdot s_0^2 + (n_1-1) \cdot s_1^2}{n_0+n_1-2} }\)
Statistic | Use | Formula | Description (and S/M/L thresholds) |
---|---|---|---|
\(\Delta\) | Differences in means | \(\Delta = \left(\overline{Y}_{1}-\overline{Y}_{0}\right)\frac{1}{s_c}\) | Mean difference divided by the control group standard deviation; Glass’ delta |
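Glass’s \(\Delta\) differs from Cohen’s d only in its denominator, the control group standard deviation instead of the pooled standard deviation. A base-R sketch on simulated data (the `control` and `treated` values are made up); when the treated group is more variable, as here, \(\Delta\) comes out larger in magnitude than d.

```r
# Simulated control and treated groups with unequal SDs (made up)
set.seed(11)
control <- rnorm(120, mean = 50, sd = 8)
treated <- rnorm(100, mean = 54, sd = 12)

mdiff <- mean(treated) - mean(control)

# Glass's delta: standardize to the control group SD only
glass_delta <- mdiff / sd(control)

# Cohen's d for comparison: standardize to the pooled SD
n0 <- length(control); n1 <- length(treated)
sp <- sqrt(((n0 - 1) * var(control) + (n1 - 1) * var(treated)) / (n0 + n1 - 2))
cohen_d <- mdiff / sp

c(glass_delta = glass_delta, cohen_d = cohen_d)
```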
Add Hedges’ g
Show an example of computing the biserial correlation (by hand) and the polyserial correlation (bringing the explanatory variable into the model).