Do factorial designs really provide the same power for interactions as for main effects?
Abstract
In factorial designs with effect coding, textbook theory sometimes claims that interaction effects have the same statistical power as main effects when standardized contrasts are equal. But is that really true?
In this post, we simulate a factorial design to show the condition under which equal power is obtained for the marginal (regression) effects of factors A and B and their interaction AB. This condition requires a very specific pattern of population cell means. In our “corner case,” neither A nor B produces any effect on its own, and all of the effect appears only when both treatments are administered together. We explain the theory behind effect coding, present our simulation code and results, and conclude that while equal power is possible in theory, it arises only in a rare and contrived situation in practice.
The implication is this: if you are designing a factorial experiment in the hope of finding interactions among treatments, and you chose the factorial design because you once read that factorial designs analyzed with effect coding have the same power to detect interaction effects as main effects, understand that the “same power” result presupposes the “same effect” size for interactions and main effects. This situation is unlikely to be observed in nature and, in fact, implies that the independent effects of the treatments are null. A more likely situation is that interaction effects are much smaller than main effects.
Therefore, designing a study to detect interaction effects requires careful thought about how large those effects are likely to be, and planning for enough power to detect them. The sample size needed for adequate power to detect interactions may be substantially larger than a sample size that is more than adequate for detecting main effects.
1 Introduction
Factorial designs are celebrated for their efficiency and the ability to estimate multiple effects simultaneously. With effect coding (using –1 and +1), main effects and interaction effects are estimated from orthogonal contrasts. Textbooks sometimes state that if the standardized (contrast) effect sizes for the main effects and the interaction are equal, then the power to detect them will be equal in a balanced design. But does this imply that the interaction is always as detectable as the main effects?
In our investigation, we simulated a factorial design with treatments A and B. We set up the cell means so that the marginal (regression) coefficients for A, B, and the interaction AB would all be equal. As it turns out, achieving equal coefficients requires that the individual treatments (A or B) have no effect when given alone. Instead, the entire effect is concentrated in the joint condition (A = +1, B = +1). This post explains the implications of this design choice and demonstrates that while the theoretical equality in power holds for this special case, it is not typical in most experimental situations.
2 The Simulation Setup and Theory
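When using effect coding (“-1” for control, “+1” for treatment), the regression model for a factorial design is:
$$Y = \beta_0 + \beta_A A + \beta_B B + \beta_{AB} AB + \varepsilon$$
Here, the coefficient for a main effect is given by
$$\beta_A = \frac{\mu_{A=+1} - \mu_{A=-1}}{2},$$
where $\mu_{A=a}$ is the mean outcome at level $a$ of A, averaged over the levels of B. Writing $\mu_{a,b}$ for the mean of the cell with A = a and B = b, this is, given the four cells of the balanced design, also
$$\beta_A = \frac{(\mu_{+1,+1} + \mu_{+1,-1}) - (\mu_{-1,+1} + \mu_{-1,-1})}{4}.$$
The interaction effect is subtly different:
$$\beta_{AB} = \frac{(\mu_{+1,+1} - \mu_{+1,-1}) - (\mu_{-1,+1} - \mu_{-1,-1})}{4}.$$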
If the independent effects of A and B are null and the joint effect of A and B is non-null, the marginal effects for A and B and their interaction are equal.
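A short derivation shows why. The notation here is an assumption introduced for illustration: $\mu_{a,b}$ is the population mean of the cell with A = a and B = b, the three cells that receive at most one treatment share a common mean $\mu$, and the joint cell has mean $\mu + \delta$.

```latex
\begin{align*}
\beta_A    &= \tfrac{1}{4}\bigl[(\mu_{+1,+1} + \mu_{+1,-1}) - (\mu_{-1,+1} + \mu_{-1,-1})\bigr]
            = \tfrac{1}{4}\bigl[(\mu + \delta + \mu) - (\mu + \mu)\bigr] = \tfrac{\delta}{4}, \\
\beta_B    &= \tfrac{1}{4}\bigl[(\mu_{+1,+1} + \mu_{-1,+1}) - (\mu_{+1,-1} + \mu_{-1,-1})\bigr]
            = \tfrac{1}{4}\bigl[(\mu + \delta + \mu) - (\mu + \mu)\bigr] = \tfrac{\delta}{4}, \\
\beta_{AB} &= \tfrac{1}{4}\bigl[(\mu_{+1,+1} - \mu_{+1,-1}) - (\mu_{-1,+1} - \mu_{-1,-1})\bigr]
            = \tfrac{1}{4}\bigl[(\mu + \delta - \mu) - (\mu - \mu)\bigr] = \tfrac{\delta}{4}.
\end{align*}
```

All three coefficients equal $\delta/4$, whatever the values of $\mu$ and $\delta$. With $\delta = 2$, as in the simulation code in the Appendix, each coefficient is 0.5.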
In our “corner-case” simulation, we engineered the cell means as follows:
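• Cell (A = -1, B = -1): mean 0.5
• Cell (A = +1, B = -1): mean 0.5
• Cell (A = -1, B = +1): mean 0.5
• Cell (A = +1, B = +1): mean 2.5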
This arrangement yields marginal effects:
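• For A, the difference between the average of the A = +1 cells and the A = -1 cells is $(0.5 + 2.5)/2 - (0.5 + 0.5)/2 = 1$, so $\beta_A = 1/2 = 0.5$.
• The same holds for B.
• The interaction difference (the difference-in-differences) is $(2.5 - 0.5) - (0.5 - 0.5) = 2$, so $\beta_{AB} = 2/4 = 0.5$.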
Notice that in this setup neither A nor B produces any effect in isolation (i.e., if only treatment A were offered with B held constant at –1, you would see no difference compared to the baseline). The entire effect appears only when both treatments are delivered together. Under these conditions, the marginal regression effects (and hence power to detect them) are equal. But this is a contrived scenario—a “corner case” that rarely occurs in practice.
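The arithmetic can be checked directly from the population cell means, without simulating any data. The sketch below is illustrative only: it assumes the cell means used in the Appendix code (0.5 in three cells, 2.5 in the joint cell), and the names `cells`, `X`, and `beta` are ours. It also checks that the effect-coded columns are orthogonal.

```r
# Population cell means for the corner case (values taken from the Appendix code).
cells <- data.frame(
  A  = c(-1,  1, -1, 1),
  B  = c(-1, -1,  1, 1),
  mu = c(0.5, 0.5, 0.5, 2.5)
)

# Effect-coded design matrix: intercept, A, B, and the product AB.
X <- cbind(Intercept = 1, A = cells$A, B = cells$B, AB = cells$A * cells$B)

# The columns are mutually orthogonal: t(X) %*% X is 4 times the identity.
stopifnot(all(crossprod(X) == 4 * diag(4)))

# Four cells and four parameters make the model saturated, so the
# coefficients solve X %*% beta = mu exactly.
beta <- setNames(drop(solve(X, cells$mu)), colnames(X))
print(beta)  # intercept 1.0; A, B, and AB each 0.5
```

Changing any one of the three baseline means breaks the equality, which is one way to see how narrow the corner case is.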
The Appendix contains Stata and R code that demonstrates this equal power in such a corner case.
3 Discussion
In typical experiments, treatments tend to have some independent effect. In our corner case they do not: if you were to run a single-treatment experiment using only treatment A (with B absent or fixed at –1), you would not see the 0.5 effect that you observe in the factorial experiment. Instead, the independent effect would be zero, with the observed effect in the factorial design coming entirely from the joint condition in which both A and B are administered.
This means that while in a perfectly balanced factorial design the power to detect an interaction might be as high as for the main effects when the standardized coefficients are equal, such a condition requires that neither treatment has an effect on its own. In most real-world settings, main effects are easier to detect, and interactions are underpowered in comparison.
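A rough normal-approximation sketch makes the sample-size consequence concrete. This is standard power arithmetic rather than anything taken from the simulation, and the symbols are introduced here for illustration. In a balanced design with -1/+1 coding, each coefficient has standard error $\sigma/\sqrt{N}$, where $N$ is the total sample size and $\sigma$ the residual standard deviation. The $N$ needed to detect a coefficient $\beta$ at two-sided level $\alpha$ with power $1 - \gamma$ is then approximately:

```latex
\[
N \;\approx\; \frac{\left(z_{1-\alpha/2} + z_{1-\gamma}\right)^{2}\,\sigma^{2}}{\beta^{2}}
\]
```

Because $N$ scales with $1/\beta^{2}$, an interaction coefficient half the size of a main-effect coefficient needs about four times the total sample for the same power. For example, with $\sigma = 1$, $\alpha = 0.05$, and 80% power, $\beta = 0.5$ gives $N \approx 31$, while a hypothetical $\beta = 0.25$ gives $N \approx 126$.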
Critics such as Stephen Senn have noted that the “equal power” result is more a mathematical artifact of the coding scheme and balance assumptions than a practical reality. In everyday experimental practice, the underlying cell frequencies and effect magnitudes tend to differ, and interactions are notoriously difficult to detect.
4 Conclusion
Our simulation shows that equal standardized (regression) effects for A, B, and AB—and hence equal power—are achievable only in a very specific situation: when the individual treatments have no effect in isolation and only yield a substantial effect when combined. In practice, this “corner case” is unlikely to occur. As such, while the textbook result holds under ideal conditions, experimenters should be cautious when interpreting power calculations for interactions in real-world factorial designs.
Appendix: Simulation Code and Results
Simulation code provided in Stata and R.
Stata code
    * Factorial design power with significant interaction
    cap program drop myboot
    program define myboot, rclass
        drop _all
        set obs 4
        gen A = -1
        gen B = -1
        replace A = 1 in 2
        replace A = 1 in 4
        replace B = 1 in 3/4
        gen k = 9 // N/4 per cell
        expand k
        // Generate an error term; mean 0, variance 1
        gen y = invnorm(uniform())
        // Add a constant shift to all observations so that baseline
        // cell becomes 0.5
        replace y = y + 0.5
        // Now add the extra shifts: for cell A = +1, B = +1 add an
        // additional 2 to bring it to 2.5.
        replace y = y + 2 if A == 1 & B == 1
        // Optionally, view the cell summaries:
        // table A B, c(n y mean y)
        gen AB = A * B
        reg y A B AB
        // Predicted cell means from the effect-coded coefficients
        return scalar EyA0B0 = _b[_cons] + _b[A]*(-1) + _b[B]*(-1) + _b[AB]*(-1)*(-1)
        return scalar EyA1B0 = _b[_cons] + _b[A]*(+1) + _b[B]*(-1) + _b[AB]*(+1)*(-1)
        return scalar EyA0B1 = _b[_cons] + _b[A]*(-1) + _b[B]*(+1) + _b[AB]*(-1)*(+1)
        return scalar EyA1B1 = _b[_cons] + _b[A]*(+1) + _b[B]*(+1) + _b[AB]*(+1)*(+1)
        return scalar marginA = _b[A]
        return scalar marginB = _b[B]
        return scalar marginAB = _b[AB]
        // p-values for each coefficient
        test A
        return scalar PA = `r(p)'
        test B
        return scalar PB = `r(p)'
        test AB
        return scalar PAB = `r(p)'
    end

    set seed 3481
    simulate EyA0B0=r(EyA0B0) EyA0B1=r(EyA0B1) ///
        EyA1B0=r(EyA1B0) EyA1B1=r(EyA1B1) ///
        PA=r(PA) PB=r(PB) PAB=r(PAB) ///
        marginA=r(marginA) marginB=r(marginB) ///
        marginAB=r(marginAB), reps(10001) nodots: myboot
    // Hit indicators: 1 if the effect is significant at the 5% level
    gen hitA = PA < .05
    gen hitB = PB < .05
    gen hitAB = PAB < .05
    su
R code
    # Set the seed for reproducibility
    set.seed(3481)

    # Function to run one replication of the simulation
    simulate_run <- function() {
      # Number of observations per cell
      n_per_cell <- 9

      # Create a data frame with 4 cells, in this order:
      #   Cell 1: A = -1, B = -1
      #   Cell 2: A = -1, B = +1
      #   Cell 3: A = +1, B = -1
      #   Cell 4: A = +1, B = +1
      df <- data.frame(
        A = rep(c(-1, -1, 1, 1), each = n_per_cell),
        B = rep(c(-1, 1, -1, 1), each = n_per_cell)
      )

      # Generate the error term ~ N(0,1)
      df$y <- rnorm(nrow(df))

      # Add a constant shift so that the baseline cell (A = -1, B = -1) has mean ~0.5.
      df$y <- df$y + 0.5

      # For the cell with A = +1 and B = +1, add an extra 2 to bring its mean to ~2.5.
      joint <- df$A == 1 & df$B == 1
      df$y[joint] <- df$y[joint] + 2

      # Create the interaction term
      df$AB <- df$A * df$B

      # Fit the linear model
      mod <- lm(y ~ A + B + AB, data = df)

      # Extract the coefficient table (estimates, standard errors, t, p)
      mod_summary <- coef(summary(mod))

      # Get regression coefficients (marginal effects)
      marginA  <- mod_summary["A", "Estimate"]
      marginB  <- mod_summary["B", "Estimate"]
      marginAB <- mod_summary["AB", "Estimate"]

      # Get p-values
      pA  <- mod_summary["A", "Pr(>|t|)"]
      pB  <- mod_summary["B", "Pr(>|t|)"]
      pAB <- mod_summary["AB", "Pr(>|t|)"]

      # Compute predicted cell means using effect coding:
      #   pred = intercept + A * a + B * b + AB * (a * b)
      # `[[` drops the coefficient names, so the returned vector keeps the
      # clean names given below (e.g. "EyA0B0", not "EyA0B0.(Intercept)").
      coefs <- coef(mod)
      b0  <- coefs[["(Intercept)"]]
      bA  <- coefs[["A"]]
      bB  <- coefs[["B"]]
      bAB <- coefs[["AB"]]
      EyA0B0 <- b0 + bA * (-1) + bB * (-1) + bAB * (+1)
      EyA0B1 <- b0 + bA * (-1) + bB * (+1) + bAB * (-1)
      EyA1B0 <- b0 + bA * (+1) + bB * (-1) + bAB * (-1)
      EyA1B1 <- b0 + bA * (+1) + bB * (+1) + bAB * (+1)

      # Return the key statistics as a named vector
      return(c(
        EyA0B0 = EyA0B0, EyA0B1 = EyA0B1, EyA1B0 = EyA1B0, EyA1B1 = EyA1B1,
        marginA = marginA, marginB = marginB, marginAB = marginAB,
        pA = pA, pB = pB, pAB = pAB
      ))
    }

    # Number of replications
    nrep <- 10001

    # Run the simulation and store results in a matrix
    sim_results <- replicate(nrep, simulate_run())
    sim_results <- t(sim_results)  # transpose to get rows as replications
    sim_df <- as.data.frame(sim_results)

    # Summarize the simulation results: mean of cell means and marginal effects
    summary_stats <- sapply(sim_df, mean)
    print("Average values over replications:")
    print(summary_stats)

    # Calculate "hit" rates: proportion of replications with p-value < 0.05 for each effect
    hitA  <- mean(sim_df$pA < 0.05)
    hitB  <- mean(sim_df$pB < 0.05)
    hitAB <- mean(sim_df$pAB < 0.05)
    hit_rates <- c(hitA = hitA, hitB = hitB, hitAB = hitAB)
    print("Hit rates (power) for p < 0.05:")
    print(hit_rates)
Key Results (averaged over 10,001 replications):
• Cell Means:
– EyA0B0 (A = -1, B = -1): 0.50
– EyA0B1 (A = -1, B = +1): 0.50
– EyA1B0 (A = +1, B = -1): 0.50
– EyA1B1 (A = +1, B = +1): 2.50
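• Regression Coefficients (Marginal Effects): because the model is saturated, the averaged cell means above imply (2.50 + 0.50 - 0.50 - 0.50)/4 = 0.50 for each of marginA, marginB, and marginAB.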
• Detection (Hit Rates at α = 0.05):
– Power for A: 83%
– Power for B: 83%
– Power for AB: 82%
These results confirm that, under these contrived conditions, the marginal effects (and hence the statistical power) are equal for A, B, and AB.
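The simulated hit rates can be cross-checked analytically. The sketch below is our own illustration, not part of the original simulation: the function name `coef_power` and its arguments are hypothetical. It relies on the fact that in a balanced design with -1/+1 coding each coefficient has standard error sigma / sqrt(N), so its t statistic follows a noncentral t distribution with N - 4 degrees of freedom.

```r
# Analytic power for one effect-coded coefficient in a balanced 2x2 design.
#   beta:       coefficient on the -1/+1 scale
#   n_per_cell: observations in each of the four cells
#   sigma:      residual standard deviation
#   alpha:      two-sided significance level
coef_power <- function(beta, n_per_cell, sigma = 1, alpha = 0.05) {
  N     <- 4 * n_per_cell           # total sample size
  df    <- N - 4                    # residual df for the saturated model
  ncp   <- beta * sqrt(N) / sigma   # noncentrality: beta / (sigma / sqrt(N))
  tcrit <- qt(1 - alpha / 2, df)    # two-sided critical value
  # Probability that |t| exceeds the critical value under the alternative.
  pt(tcrit, df, ncp = ncp, lower.tail = FALSE) + pt(-tcrit, df, ncp = ncp)
}

# Corner case from the simulation: beta = 0.5, 9 observations per cell.
power_corner <- coef_power(beta = 0.5, n_per_cell = 9)
print(power_corner)

# Hypothetical comparison (not from the simulation): an interaction
# coefficient half as large, same sample size.
power_half <- coef_power(beta = 0.25, n_per_cell = 9)
print(power_half)
```

The first value should land close to the simulated hit rates above, since the same coefficient and the same standard error apply to A, B, and AB. The second shows how quickly power falls once the interaction is smaller than the main effects.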