The American Association for Public Opinion Research (AAPOR) defines the response rate as the proportion of eligible people approached for a survey (or study) who participate in it.

\[ \text{response rate} = \frac{a}{a + b + e \cdot c} \]

or, under the common assumption that \(e = (a+b)/(a+b+d)\),

\[\text{response rate} = \frac{a \left(a + b + d\right)}{\left(a + b\right) \left(a + b + c + d\right)}\]

where

\(a\) is the number of complete interviews
\(b\) is the number eligible but without a complete interview
\(c\) is the number with unknown eligibility
\(d\) is the number ineligible
\(e\) is the expected proportion of those with unknown eligibility who would be eligible
rr = a*(a + b + d)/((a + b)*(a + b + c + d))
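As a quick numeric check, the sketch below evaluates both forms of the response rate with made-up disposition counts. The counts and the function names are illustrative assumptions, not AAPOR values; exact fractions are used so the two forms can be compared without rounding error.

```python
from fractions import Fraction


def response_rate(a, b, c, e):
    """General form: a / (a + b + e*c)."""
    return Fraction(a) / (a + b + e * c)


def response_rate_crude(a, b, c, d):
    """Closed form that results from setting e = (a+b)/(a+b+d)."""
    return Fraction(a * (a + b + d), (a + b) * (a + b + c + d))


# Hypothetical counts: complete, eligible not complete, unknown, ineligible
a, b, c, d = 400, 200, 300, 100

e_hat = Fraction(a + b, a + b + d)           # 600/700 = 6/7
rr_general = response_rate(a, b, c, e_hat)   # 400 / (600 + 6/7 * 300)
rr_closed = response_rate_crude(a, b, c, d)  # 400*700 / (600*1000)

print(rr_general, rr_closed)  # both are 7/15, about 0.467
```

With these counts the two expressions agree exactly, as the algebra says they should.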

Or viewed as a CONSORT or STROBE study flow diagram:

\(e\) cannot be known by definition

\(e\) is the expected proportion of those with unknown eligibility who would have been eligible if eligibility determination could have been completed. This quantity can never be known, unless it is known to be 0 (everyone with unknown eligibility is in fact ineligible) or 1 (everyone with unknown eligibility is in fact eligible). When \(e=1\), AAPOR identifies the response rate as “RR2”; when \(e=0\) we have “RR6”; and when \(e \in (0,1)\) we have “RR4”. AAPOR also defines RR1, RR3, and RR5, but these flavors distinguish between “complete” interviews and “partial” interviews, and in this note I am not interested in making that distinction.
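The mapping from \(e\) to the AAPOR labels can be sketched as follows. This is a minimal illustration under the convention of this note (no complete/partial distinction); the function names and the counts are hypothetical.

```python
def response_rate(a, b, c, e):
    """Response rate a / (a + b + e*c) for an assumed eligibility proportion e."""
    if not 0 <= e <= 1:
        raise ValueError("e must lie in [0, 1]")
    return a / (a + b + e * c)


def aapor_label(e):
    """Label used in this note for a given e (partials not distinguished)."""
    if e == 1:
        return "RR2"
    if e == 0:
        return "RR6"
    return "RR4"


# Hypothetical counts: complete, eligible not complete, unknown eligibility
a, b, c = 400, 200, 300

for e in (1.0, 0.5, 0.0):
    print(aapor_label(e), round(response_rate(a, b, c, e), 3))
# RR2 0.444
# RR4 0.533
# RR6 0.667
```

Because the rate falls as \(e\) rises, the \(e=1\) and \(e=0\) values bracket every rate obtainable with an intermediate \(e\).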

Crude guess for \(e\)

If you have no information on the people of unknown eligibility, or you are in a hurry, use the following for \(e\):

\[\hat{e} = \frac{a+b}{a+b+d} \]

which is the proportion eligible among those with known eligibility. For a little more on this, and an alternative expression for \(e\) that turns out to be equivalent to the one above, see the Appendix below.

Best practice

A really good strategy is to use your data to model the probability of being eligible. Fit the model among those with known eligibility, using important participant factors (e.g., age, sex, race/ethnicity, other socioeconomic variables, geography, clinical factors) that were collected when eligibility was assessed or were available when the sampling frame was formed. Then take the mean of the predicted probability of being eligible among those with unknown eligibility and use that value for \(e\).
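One simple way to sketch this idea without a regression library is a saturated model on a single categorical covariate: estimate the proportion eligible within each stratum among cases with known eligibility, then average those predicted probabilities over the cases with unknown eligibility. The records, the age-group strata, and the function name below are hypothetical, and a real analysis would more likely use logistic regression with several covariates.

```python
from collections import defaultdict


def estimate_e(known, unknown):
    """Mean predicted eligibility among unknown-eligibility cases.

    known:   list of (stratum, eligible) pairs for cases with known eligibility
    unknown: list of strata for cases with unknown eligibility
    """
    counts = defaultdict(lambda: [0, 0])  # stratum -> [n eligible, n total]
    for stratum, eligible in known:
        counts[stratum][0] += int(eligible)
        counts[stratum][1] += 1

    # Fallback for strata never seen among known cases: overall proportion
    overall = sum(n[0] for n in counts.values()) / sum(n[1] for n in counts.values())

    preds = []
    for stratum in unknown:
        if stratum in counts:
            n_eligible, n_total = counts[stratum]
            preds.append(n_eligible / n_total)
        else:
            preds.append(overall)
    return sum(preds) / len(preds)


# Hypothetical data: eligibility is more common in the younger stratum
known = (
    [("18-49", True)] * 8 + [("18-49", False)] * 2
    + [("50+", True)] * 3 + [("50+", False)] * 7
)
unknown = ["18-49", "18-49", "18-49", "50+"]

e_model = estimate_e(known, unknown)
e_crude = sum(eligible for _, eligible in known) / len(known)
print(round(e_model, 3), round(e_crude, 3))  # 0.675 0.55
```

In this made-up example the unknown-eligibility cases are concentrated in the stratum where eligibility is more common, so the model-based value exceeds the crude proportion.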

References

The American Association for Public Opinion Research, Standard Definitions: Final Dispositions of Case Codes and Outcome Rates for Surveys. 5th Edition. 2008, Lenexa, KS: AAPOR.

Rich Jones Brown University

July 28, 2022

Image is SmartDraw drawing on GDrive

Appendix: SymPy and Alternative \(e\)

Response rate (AAPOR)

The response rate (\(RR\)) for a survey, from the American Association for Public Opinion Research, is

\[ RR = \frac{a}{a + b + e \cdot c} \]

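# Import only the names used below instead of a star import
from sympy import simplify, symbols
# e, f, and r start as plain symbols and are reassigned to expressions below
a, b, c, d, e, f, r = symbols('a b c d e f r')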

where \[ \begin{aligned} a & = \text{the number of complete interviews} \\ b & = \text{the number eligible but without a complete interview} \\ c & = \text{the number with unknown eligibility} \\ d & = \text{the number ineligible} \\ e & = \text{the expected proportion eligible among those with unknown eligibility} \end{aligned} \]

\(e\) is not defined by AAPOR. The best estimate for \(e\) is one based on a statistical model: among those with known eligibility, we model the probability of being eligible as a function of what limited information we have for everyone approached (it might be address, maybe sex and age, or it might be more detailed information, depending on how the sampling frame was constructed). This can be a complicated process if eligibility is determined in multiple stages.

Often, an initial value for \(e\) is the number eligible divided by the number with known eligibility. I will call this \(f\).

f = (a+b)/(a+b+d)
f

\(\displaystyle \frac{a + b}{a + b + d}\)

It might seem like a very strong assumption to use the proportion eligible among those with known eligibility as an estimate of the proportion eligible among those with unknown eligibility. It is probably the case that people who refuse to comply with eligibility determination are considerably different from those who do comply.

It might be a weaker assumption to estimate the proportion eligible among those with unknown eligibility using a denominator of all persons approached \((a+b+c+d)\) and a numerator of those known or assumed to be eligible \((a+b+f \cdot c)\), or

e = (a+b+f*c)/(a+b+d+c)
e

\(\displaystyle \frac{a + b + \frac{c \left(a + b\right)}{a + b + d}}{a + b + c + d}\)

However, the above expression for \(e\) simplifies to the expression we previously defined for \(f\).

simplify(e)

\(\displaystyle \frac{a + b}{a + b + d}\)
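To see by hand why the simplification holds, write \(n = a + b\) for the number known to be eligible and \(k = a + b + d\) for the number with known eligibility, so that \(f = n/k\). The symbols \(n\) and \(k\) are shorthand introduced only for this derivation.

\[ \begin{aligned} \frac{a + b + f \cdot c}{a + b + c + d} & = \frac{n + \frac{n}{k} c}{k + c} \\ & = \frac{n \left(k + c\right)}{k \left(k + c\right)} \\ & = \frac{n}{k} = \frac{a + b}{a + b + d} \end{aligned} \]

The count with unknown eligibility, \(c\), cancels, so assuming that the unknown cases are eligible at rate \(f\) and then recomputing the proportion over everyone approached returns \(f\) itself.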

Therefore, it is efficient to simply use

e = (a+b)/(a+b+d)
e

\(\displaystyle \frac{a + b}{a + b + d}\)

This makes the response rate

r = a/(a+b+e*c)
r

\(\displaystyle \frac{a}{a + b + \frac{c \left(a + b\right)}{a + b + d}}\)

simplify(r)

\(\displaystyle \frac{a \left(a + b + d\right)}{\left(a + b\right) \left(a + b + c + d\right)}\)

print(simplify(r))
a*(a + b + d)/((a + b)*(a + b + c + d))