Talking about change: Measuring change in longitudinal studies

A commonly encountered problem in longitudinal studies is that there are multiple ways to measure change. In a two-wave study, one can either regress the change in $y$ on some variable, or regress $y_t$ on that variable while including $y_{t-1}$ as a covariate. For the purposes of this article, I’ll refer to the first as the change score method, and the second as the lagged dependent variable (LDV) method. (Another option is percent change, but as far as I’m aware, it has no advantage over LDV or simple change scores, and you lose power.)
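
To make the two specifications concrete, here is a minimal sketch in Python using simulated data; the column names (`y1`, `y2`, `x`, `dy`) and coefficient values are purely illustrative.

```python
# A sketch of the two ways to measure change in a two-wave study.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y1 = rng.normal(size=n)                        # wave 1 outcome
y2 = 0.5 * y1 + 0.3 * x + rng.normal(size=n)   # wave 2 outcome
df = pd.DataFrame({"x": x, "y1": y1, "y2": y2, "dy": y2 - y1})

change_score = smf.ols("dy ~ x", data=df).fit()  # change score method
ldv = smf.ols("y2 ~ y1 + x", data=df).fit()      # lagged dependent variable method
print(change_score.params["x"], ldv.params["x"])
```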

Both of these approaches have their advantages and disadvantages, but it seems to me that some people argue for the general superiority of one method. This helps nobody, and neither method is adequate on its own.

Note: The focus of this post will be on observational studies, not randomized trials.

Change scores as a fixed effects model

A lot of people get hung up on things such as the power or reliability of the methods. However, this is irrelevant if one of the methods doesn’t produce valid results in the first place. Hence, I’ll focus on the validity of each method here.

Generally, the use of the LDV method arises from the model it implies – a standard autoregressive model. Using standardized variables, and representing the LDV coefficients with $^L$, we have:

$$ y_{i,t} = \beta_1 y_{i,{t-1}} + \beta^L_2 x_i + \epsilon^L_{i,t} $$

We can obviously get the change score method by simply fixing $\beta_1$ to 1 and moving $y_{i,{t-1}}$ to the left-hand side. This leads us to a common criticism of change scores: that they are simply the LDV approach with $\beta_1$ fixed to 1.
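
Making that step explicit, setting $\beta_1 = 1$ in the model above and subtracting $y_{i,{t-1}}$ from both sides gives:

$$ y_{i,t} - y_{i,{t-1}} = \beta^L_2 x_i + \epsilon^L_{i,t} $$

which has the same form as the change score regression.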

However, it’s not that simple. There are other models that imply the change score method where the LDV method would lead to bias.

Suppose we instead assume a random intercept model. Using $^C$ to represent the change-score version of a coefficient:

$$ y_{i,t-1} = \mu_i + \epsilon^C_{i,t-1} $$

$$ y_{i,t} = \mu_i + \beta_2^C x_i + \epsilon^C_{i,t} $$

If we want to remove the effect of $\mu_i$ (which confounds the relationship with $x_i$), an obvious way to do so is to take differences:

$$ y_{i,t} - y_{i,t-1} = \mu_i - \mu_i + \beta_2^C x_i + \epsilon^C_{i,t} - \epsilon^C_{i,t-1} $$

$$ \Delta y_{i,t} = \beta_2^C x_i + \Delta\epsilon^C_{i,t} $$

Notice that the effect of $\mu_i$ is removed. Our null hypothesis, $\beta_2^C = 0$, allows $\mu_i$ to vary. If we could assume that $\mathrm{cov}\left(\mu, x\right) = 0$, as in randomized trials, then there would be no point to this. However, in observational studies, we typically can’t assume this. For instance, suppose that $y_i$ is a measure of aggression for child $i$. In this case, $\mu_i$ is the stable component of aggression – the child’s overall tendency to be aggressive. (If we had differenced $x$ as well, the model would include $x_{i,t-1}$ in the first equation. Note that $x$’s time scale may be lagged, such that $x_{i,t-1}$ is measured at the same time as $y_{i,t}$, but this setup is uncommon, as it requires time points for $x$ from before $y$ was first measured.) Using this method, we clearly get an unbiased estimate of $\beta_2^C$.

An alternative option would be to center each $y_{i,t}$ around child $i$'s mean. This is the standard approach to fixed-effects regression. However, it yields the same results as the change score method! The two are simply different ways of estimating a fixed-effects model.
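
Here is a quick sketch demonstrating this equivalence on simulated two-wave data (all variable names and parameter values are hypothetical): regressing the person-mean-centered $y$ on the centered predictors reproduces the change score estimate exactly.

```python
# With two waves, person-mean centering (the within/fixed-effects estimator)
# and the change score regression give the same coefficient on x.
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
mu = 0.8 * x + rng.normal(size=n)          # stable trait, correlated with x
y1 = mu + rng.normal(size=n)               # wave 1
y2 = mu + 0.3 * x + rng.normal(size=n)     # wave 2: true effect of x is 0.3

# Change score method: regress y2 - y1 on x (with intercept).
X = np.column_stack([np.ones(n), x])
b_change = np.linalg.lstsq(X, y2 - y1, rcond=None)[0][1]

# Fixed effects via person-mean centering: stack both waves, then subtract
# each child's mean from y, from the wave dummy d, and from d*x
# (x only "acts" at wave 2, as in the random intercept model above).
y = np.concatenate([y1, y2])
d = np.concatenate([np.zeros(n), np.ones(n)])   # wave indicator
dx = d * np.concatenate([x, x])
demean = lambda v: v - np.tile((v[:n] + v[n:]) / 2, 2)
Xw = np.column_stack([demean(d), demean(dx)])
b_fe = np.linalg.lstsq(Xw, demean(y), rcond=None)[0][1]

print(b_change, b_fe)  # equal up to floating point error
```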

The difference between this approach and the LDV approach goes beyond the value of $\beta_1$; it concerns whether $y_{t-1}$ is allowed to be correlated with the error term. In the fixed-effects model, $y_{t-1}$ is necessarily correlated with the error term (it contains $\mu_i$), whereas the LDV model assumes the two are uncorrelated. Or, more simply, the change score method controls for $\mu_i$ while the LDV method does not.
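
To make this explicit, under the random intercept model above the lagged outcome contains the stable trait:

$$ \mathrm{cov}\left(y_{i,t-1}, \mu_i\right) = \mathrm{cov}\left(\mu_i + \epsilon^C_{i,t-1}, \mu_i\right) = \mathrm{var}\left(\mu\right) $$

so whenever $\mathrm{var}\left(\mu\right) > 0$, the LDV regression’s assumption that $y_{t-1}$ is uncorrelated with the error term fails.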

For more details on this interpretation, see Allison (1990).

The return of autoregression, and the death of two-wave data

Should we always use change scores then? Well, what if the LDV model is true?

In this case, the change score method will attribute all change in $y$ to an effect of $x$. In contrast, the LDV method only attributes change that differs from the expected level, $\beta_1 y_{i,t-1}$. Hence, if there is regression towards the mean ($\beta_1 < 1$), the change score method will be biased.
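
A small simulation sketch illustrates the bias (parameter values are illustrative): data are generated from the LDV model with no stable trait, and $x$ is correlated with baseline $y$.

```python
# When the LDV model is true, change scores are biased by regression to the mean.
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
y1 = rng.normal(size=n)
x = 0.8 * y1 + rng.normal(size=n)                 # x is related to baseline y
y2 = 0.5 * y1 + 0.3 * x + rng.normal(size=n)      # true LDV model: beta_1 = 0.5

ones = np.ones(n)
b_change = np.linalg.lstsq(np.column_stack([ones, x]), y2 - y1, rcond=None)[0][1]
b_ldv = np.linalg.lstsq(np.column_stack([ones, y1, x]), y2, rcond=None)[0][2]
print(b_change, b_ldv)  # change score badly biased (~0.06); LDV recovers ~0.3
```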

Each method controls for stability in a different way. The change-score method controls for trait-like stability, while the LDV approach controls for simple autoregression. However, in most psychological studies, we want to control for both of these stabilities (cf. Curran et al., 2012; Hamaker et al., 2015; Berry & Willoughby, 2016).

So, why not do that? Well, if you have three or more waves of data, you can do so rather straightforwardly in a structural equation model (again, Curran et al., 2012; Hamaker et al., 2015; Berry & Willoughby, 2016). In fact, both of these models are nested under a random intercept autoregressive model:
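
$$ y_{i,t} = \mu_i + \beta_1 y_{i,t-1} + \beta_2 x_i + \epsilon_{i,t} $$

The LDV model is the special case with $\mathrm{var}\left(\mu\right) = 0$, and the change score model is the special case with $\beta_1 = 1$.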

However, with only two waves of data, such a model isn’t identified, as the two types of stability can’t be differentiated. Hence, we arrive at our conclusion:

Stop arguing over which method is right; they’re probably both wrong.

If you don’t have randomization, then you can’t assume that $\mathrm{var}\left(\mu\right) = 0$ (or at least $\mathrm{cov}\left(\mu,x\right) = 0$; if you do have randomization, use the LDV method). However, you also can’t assume that $\beta_1 = 1$. What should you do? I suggest simply estimating both models. If you get differing results, interpret them conditionally on each model’s assumptions. If the results don’t substantively differ from one another, then you can give a stronger interpretation. If $\beta_1$ turns out to be close to 1, then give a stronger focus to the change score results. Similarly, if $\mathrm{var}\left(\mu\right) \approx 0$ (testable in a structural equation model), then you can give a stronger focus to the LDV approach.