
The model we are interested in is

$y_t = \beta_0 + \beta_1 t + u_t$, where $t$ refers to time and $u_t$ is either

  1. $u_t = \phi u_{t-1} + e_t$, an AR(1) process, or
  2. $u_t = \phi_1 u_{t-1} + \phi_2 u_{t-2} + e_t$, an AR(2) process,

with $e_t \sim N(0,\sigma^2)$.
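To make the setup concrete, here is a minimal simulation sketch of the model with AR(1) errors (the parameter values are illustrative, not from the text). The later sketches reuse this simulated `y` and design matrix `X`.

```python
# Simulate y_t = beta0 + beta1 * t + u_t with AR(1) errors
# u_t = phi * u_{t-1} + e_t, e_t ~ N(0, sigma^2).
import numpy as np

rng = np.random.default_rng(42)
n, beta0, beta1, phi, sigma = 100, 2.0, 0.1, 0.6, 1.0

e = rng.normal(0.0, sigma, n)
u = np.zeros(n)
u[0] = e[0] / np.sqrt(1 - phi**2)   # start from the stationary distribution
for t in range(1, n):
    u[t] = phi * u[t - 1] + e[t]

t = np.arange(1, n + 1)
y = beta0 + beta1 * t + u
X = np.column_stack([np.ones(n), t])   # design matrix [1, t]
```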

Likelihood for AR(p) errors

From Hamilton (1994)

$$
\begin{align}
logL\left( \underline{\beta}, \underline{\phi},\sigma; \underline{y} \right) &= -\frac{n}{2}log(2\pi) -\frac{n}{2}log(\sigma^2) +\frac{1}{2}log \left|V_p^{-1} \right| \\
&-\frac{1}{2 \sigma^2} (\underline{y_p}-\underline{\mu_p})^T V_p^{-1}(\underline{y_p}-\underline{\mu_p}) \\
&- \frac{1}{2\sigma^2}\sum^n_{t=p+1} (y_t - c - \phi_1 y_{t-1} - \dots - \phi_p y_{t-p})^2
\end{align}
$$

where

$\left|V_p^{-1}\right|$ is the determinant of the inverse of the matrix $V_p$,

$\sigma^2 V_p$ is the variance-covariance matrix of order $p$,

$\underline{\mu_p} = X_p\underline{\beta}$, and

$X_p$ holds the first $p$ rows of the design matrix, corresponding to times $t = 1, \dots, p$,

$c$ = a function of the fitted terms $X_t\underline{\beta}$.

Likelihood for AR(1) errors

Setting p = 1,

$\sigma^2 V_1 = \frac{\sigma^2}{1-\phi_1^2}$ is the variance of the process $y$,

$\left|V_1^{-1}\right| = 1-\phi_1^2$,

$\mu_1 = X_1\underline{\beta} = \beta_0 + \beta_1$,

$c = X_t\underline{\beta} - \phi_1 X_{t-1}\underline{\beta}$, and

$y_1$ is the first observation.

The log likelihood,

$$
\begin{align}
logL\left( \underline{\beta}, \phi_1, \sigma; \underline{y} \right) &= -\frac{n}{2}log(2\pi) -\frac{n}{2}log(\sigma^2) +\frac{1}{2}log(1-\phi_1^2) \\
&-\frac{1}{2 \sigma^2} (y_1-X_1\underline{\beta})^2 (1-\phi_1^2) \\
&- \frac{1}{2\sigma^2}\sum^n_{t=2} (y_t - X_t\underline{\beta} - \phi_1 (y_{t-1} -X_{t-1}\underline{\beta}))^2
\end{align}
$$
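As a sketch (not the package's implementation), the expression above translates directly into code; `y` is the series and `X` the design matrix with columns $[1, t]$, as simulated earlier:

```python
import numpy as np

def loglik_ar1(beta, phi, sigma, y, X):
    """Exact AR(1) log-likelihood, transcribing the expression above."""
    n = len(y)
    r = y - X @ beta                                  # y_t - X_t beta
    ll = -(n / 2) * np.log(2 * np.pi) - (n / 2) * np.log(sigma**2)
    ll += 0.5 * np.log(1 - phi**2)
    ll -= (1 - phi**2) * r[0]**2 / (2 * sigma**2)     # first-observation term
    ll -= np.sum((r[1:] - phi * r[:-1])**2) / (2 * sigma**2)
    return ll
```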

Differentiating $logL\left( \underline{\beta}, \phi_1, \sigma; \underline{y} \right)$ with respect to $\sigma$ and equating to zero yields the maximum likelihood estimator,

$$
\begin{align}
\hat{\sigma}^2 = \frac{1}{n}\left[ (y_1-X_1\underline{\beta})^2(1-\phi_1^2) + \sum^n_{t=2} (y_t - X_t\underline{\beta} - \phi_1 (y_{t-1} -X_{t-1}\underline{\beta}))^2 \right]
\end{align}
$$
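To spell the step out, write $S$ for the bracketed sum of squares, so that $logL = const. - n\,log\,\sigma - \frac{S}{2\sigma^2}$; then

$$
\begin{align}
\frac{\partial\, logL}{\partial \sigma} = -\frac{n}{\sigma} + \frac{S}{\sigma^3} = 0 \quad \Rightarrow \quad \hat{\sigma}^2 = \frac{S}{n}
\end{align}
$$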

Substituting $\hat{\sigma}^2$ back into the log likelihood yields (Beach and MacKinnon (1978a))

$$
\begin{align}
logL\left( \underline{\beta}, \phi_1; \underline{y} \right) &= const. +\frac{1}{2}log(1-\phi_1^2) \\
&-\frac{n}{2}log\left( (y_1-X_1\underline{\beta})^2(1-\phi_1^2) + \sum^n_{t=2} (y_t - X_t\underline{\beta} - \phi_1 (y_{t-1} -X_{t-1}\underline{\beta}))^2 \right)
\end{align}
$$

Two additional terms appear in this likelihood that are absent from the conditional likelihood used in GLS procedures. The term $(y_1-X_1\underline{\beta})^2(1-\phi_1^2)$ ensures that the initial value has an effect on the estimates, and $\frac{1}{2}log(1-\phi_1^2)$ constrains the stationarity condition to hold.

Maximization of the likelihood

Maximization of the likelihood requires iterative or numerical procedures.

The estimate of $\underline{\beta}$ which maximizes the log-likelihood conditional on $\phi_1$ is

$$
\begin{align}
\underline{\hat{\beta}} = (X^{*T}X^*)^{-1}X^{*T}y^*
\end{align}
$$

where $X^* = QX$ and $y^* = Qy$, and $Q$ is the Prais-Winsten transformation

$$
Q = \begin{bmatrix}
(1-\phi_1^2)^{1/2} & 0 & \dots & & \\
-\phi_1 & 1 & 0 & \dots & \\
 & & \dots & & \\
0 & \dots & 0 & -\phi_1 & 1
\end{bmatrix}
$$

So a procedure which searches over $\phi_1$ is all that is required to find maximum likelihood estimates of $\underline{\beta}$, $\sigma$, and $\phi_1$.
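A minimal sketch of this search (illustrative only, reusing the simulated `y` and `X` from earlier; not the package's implementation). For each candidate $\phi_1$, $\underline{\hat\beta}$ is OLS on the Prais-Winsten transformed data, and the concentrated log-likelihood is evaluated there:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def prais_winsten_Q(phi, n):
    """Build the n x n Prais-Winsten transformation matrix shown above."""
    Q = np.eye(n)
    Q[0, 0] = np.sqrt(1 - phi**2)
    for t in range(1, n):
        Q[t, t - 1] = -phi
    return Q

def concentrated_loglik_ar1(phi, y, X):
    """logL(beta-hat(phi), phi; y), up to an additive constant."""
    n = len(y)
    Q = prais_winsten_Q(phi, n)
    ys, Xs = Q @ y, Q @ X
    beta = np.linalg.lstsq(Xs, ys, rcond=None)[0]  # beta-hat conditional on phi
    rss = np.sum((ys - Xs @ beta)**2)              # includes the first-obs term
    return 0.5 * np.log(1 - phi**2) - (n / 2) * np.log(rss)

# One-dimensional search over phi_1 in the stationary region (-1, 1)
res = minimize_scalar(lambda p: -concentrated_loglik_ar1(p, y, X),
                      bounds=(-0.999, 0.999), method="bounded")
phi1_hat = res.x
```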

Likelihood for AR(2) errors

Following the same method as for AR(1) errors, set p = 2,

$$
V_2^{-1} = \begin{bmatrix}
(1-\phi_2^2) & -(\phi_1 + \phi_1\phi_2) \\
-(\phi_1 + \phi_1\phi_2) & (1-\phi_2^2)
\end{bmatrix}
$$

$\left|V_2^{-1}\right| = (1+\phi_2)^2\left[(1-\phi_2)^2 -\phi_1^2 \right]$

$\underline{\mu_2} = (\mu_1, \mu_2) = (X_1\underline{\beta}, X_2\underline{\beta})$ is the vector of means for $t = 1, 2$, and $\underline{y_2} = (y_1, y_2)$ is the corresponding vector of observations,

$$
\begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} =
\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}
$$

$c = X_t\underline{\beta} - \phi_1 X_{t-1}\underline{\beta} - \phi_2 X_{t-2}\underline{\beta}$

The log likelihood,

$$
\begin{align}
logL\left( \underline{\beta}, \underline{\phi}, \sigma; \underline{y} \right) &= -\frac{n}{2}log(2\pi) -\frac{n}{2}log(\sigma^2) +\frac{1}{2}log\left((1+\phi_2)^2\left[(1-\phi_2)^2 -\phi_1^2 \right]\right) \\
&-\frac{1}{2 \sigma^2} (\underline{y_2}-\underline{\mu_2})^T V_2^{-1}(\underline{y_2}-\underline{\mu_2}) \\
&- \frac{1}{2\sigma^2}\sum^n_{t=3} (y_t - X_t\underline{\beta} - \phi_1 (y_{t-1} -X_{t-1}\underline{\beta}) - \phi_2 (y_{t-2} -X_{t-2}\underline{\beta}))^2
\end{align}
$$

Differentiating $logL\left( \underline{\beta}, \underline{\phi}, \sigma; \underline{y} \right)$ with respect to $\sigma$ and equating to zero yields the maximum likelihood estimator,

$$
\begin{align}
\hat{\sigma}^2 = \frac{1}{n}\left[ (\underline{y_2}-\underline{\mu_2})^T V_2^{-1}(\underline{y_2}-\underline{\mu_2}) + \sum^n_{t=3} (y_t - X_t\underline{\beta} - \phi_1 (y_{t-1} -X_{t-1}\underline{\beta}) - \phi_2 (y_{t-2} -X_{t-2}\underline{\beta}))^2\right]
\end{align}
$$

Substituting $\hat{\sigma}^2$ back into the log likelihood yields

$$
\begin{align}
logL\left( \underline{\beta}, \underline{\phi}; \underline{y} \right) &= const. +\frac{1}{2}log\left((1+\phi_2)^2\left[(1-\phi_2)^2 -\phi_1^2 \right]\right) \\
&-\frac{n}{2}log\left[ (\underline{y_2}-\underline{\mu_2})^T V_2^{-1}(\underline{y_2}-\underline{\mu_2}) + \sum^n_{t=3} (y_t - X_t\underline{\beta} - \phi_1 (y_{t-1} -X_{t-1}\underline{\beta}) - \phi_2 (y_{t-2} -X_{t-2}\underline{\beta}))^2\right]
\end{align}
$$

Simplifying further,

$$
\begin{align}
\frac{1}{2}log\left((1+\phi_2)^2\left[(1-\phi_2)^2 -\phi_1^2 \right]\right) = log(1+\phi_2) + \frac{1}{2}log(1-\phi_1-\phi_2)+\frac{1}{2}log(1+\phi_1-\phi_2)
\end{align}
$$

and

$$
\begin{align}
(\underline{y_2}-\underline{\mu_2})^T V_2^{-1}(\underline{y_2}-\underline{\mu_2}) =
\begin{bmatrix} (y_1 - \mu_1) & (y_2 -\mu_2) \end{bmatrix}
\begin{bmatrix} (1-\phi_2^2) & -(\phi_1 + \phi_1\phi_2) \\ -(\phi_1 + \phi_1\phi_2) & (1-\phi_2^2) \end{bmatrix}
\begin{bmatrix} (y_1 - \mu_1) \\ (y_2 - \mu_2) \end{bmatrix}
\end{align}
$$

$$
\begin{align}
= & (y_1 - \mu_1)^2(1-\phi_2^2)-2(y_1 - \mu_1)(y_2 - \mu_2)(\phi_1 + \phi_1\phi_2)+(y_2 - \mu_2)^2(1-\phi_2^2)\\
= & (y_1 - \mu_1)^2(1-\phi_2^2)-2(y_1 - \mu_1)(y_2 - \mu_2)\phi_1(1 + \phi_2)+(y_2 - \mu_2)^2(1-\phi_2^2) \\
= & (y_1 - X_1\underline{\beta})^2(1-\phi_2^2)-2(y_1-X_1\underline{\beta})(y_2-X_2\underline{\beta})\phi_1(1 + \phi_2)+(y_2 -X_2\underline{\beta})^2(1-\phi_2^2)
\end{align}
$$
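The determinant factorisation used above is easy to verify symbolically; a quick check (a sketch using sympy):

```python
# Symbolic check that det(V_2^{-1}) = (1 + phi2)^2 * [(1 - phi2)^2 - phi1^2]
import sympy as sp

p1, p2 = sp.symbols("phi1 phi2")
V2inv = sp.Matrix([[1 - p2**2, -(p1 + p1 * p2)],
                   [-(p1 + p1 * p2), 1 - p2**2]])
det = sp.factor(V2inv.det())
assert sp.simplify(det - (1 + p2)**2 * ((1 - p2)**2 - p1**2)) == 0
```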

The likelihood now has the form of that found in Beach and MacKinnon (1978b)

$$
\begin{align}
logL\left( \underline{\beta}, \underline{\phi}; \underline{y} \right) &= const. + log(1+\phi_2) + \frac{1}{2}log(1-\phi_1-\phi_2)+\frac{1}{2}log(1+\phi_1-\phi_2) \\
&-\frac{n}{2}log\Big[ (y_1 - X_1\underline{\beta})^2(1-\phi_2^2)-2(y_1-X_1\underline{\beta})(y_2-X_2\underline{\beta})\phi_1(1 + \phi_2)+(y_2 -X_2\underline{\beta})^2(1-\phi_2^2) \\
&\qquad + \sum^n_{t=3} (y_t - X_t\underline{\beta} - \phi_1 (y_{t-1} -X_{t-1}\underline{\beta}) - \phi_2 (y_{t-2} -X_{t-2}\underline{\beta}))^2\Big]
\end{align}
$$

Maximization of the likelihood

As in the AR(1) case, maximizing the likelihood requires numerical methods.

The estimate of $\underline{\beta}$ which maximizes the log-likelihood conditional on $\underline{\phi}$ (Beach and MacKinnon (1978b)) is

$$
\begin{align}
\underline{\hat{\beta}} = (X^{*T}X^*)^{-1}X^{*T}y^*
\end{align}
$$

where $X^* = QX$ and $y^* = Qy$, and $Q$ is the Prais-Winsten transformation

$$
Q = \begin{bmatrix}
\left[(1-\phi_2^2)-\phi_1^2(1+\phi_2)/(1-\phi_2)\right]^{1/2} & 0 & \dots & & & \\
-\phi_1\left[(1+\phi_2)/(1-\phi_2)\right]^{1/2} & (1-\phi_2^2)^{1/2} & 0 & \dots & & \\
-\phi_2 & -\phi_1 & 1 & 0 & \dots & \\
0 & -\phi_2 & -\phi_1 & 1 & 0 & \dots \\
 & & \dots & & & \\
0 & \dots & 0 & -\phi_2 & -\phi_1 & 1
\end{bmatrix}
$$

So a procedure which searches over $\phi_1$ and $\phi_2$ is all that is required to find maximum likelihood estimates of $\underline{\beta}$, $\sigma$, and $\underline{\phi}$.
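A sketch of the two-parameter search (again illustrative, reusing the simulated `y` and `X`; a real implementation would handle edge cases more carefully):

```python
import numpy as np
from scipy.optimize import minimize

def prais_winsten_Q2(phi1, phi2, n):
    """AR(2) transformation matrix shown above."""
    Q = np.eye(n)
    Q[0, 0] = np.sqrt((1 - phi2**2) - phi1**2 * (1 + phi2) / (1 - phi2))
    Q[1, 0] = -phi1 * np.sqrt((1 + phi2) / (1 - phi2))
    Q[1, 1] = np.sqrt(1 - phi2**2)
    for t in range(2, n):
        Q[t, t - 2], Q[t, t - 1] = -phi2, -phi1
    return Q

def neg_concentrated_loglik_ar2(par, y, X):
    phi1, phi2 = par
    # stationarity region: phi1 + phi2 < 1, phi2 - phi1 < 1, |phi2| < 1
    if not (phi1 + phi2 < 1 and phi2 - phi1 < 1 and abs(phi2) < 1):
        return np.inf
    n = len(y)
    Q = prais_winsten_Q2(phi1, phi2, n)
    ys, Xs = Q @ y, Q @ X
    beta = np.linalg.lstsq(Xs, ys, rcond=None)[0]  # beta-hat conditional on phi
    rss = np.sum((ys - Xs @ beta)**2)              # reproduces the bracketed term
    ll = (np.log(1 + phi2) + 0.5 * np.log(1 - phi1 - phi2)
          + 0.5 * np.log(1 + phi1 - phi2) - (n / 2) * np.log(rss))
    return -ll

res = minimize(neg_concentrated_loglik_ar2, x0=[0.3, 0.1], args=(y, X),
               method="Nelder-Mead")
phi1_hat, phi2_hat = res.x
```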

Likelihood from first principles

AR(1)

$$
L(\underline{\theta};\underline{y})=\prod_{t=2}^n p(Y_t = y_t \mid Y_{t-1}=y_{t-1}) \times p(Y_1=y_1)
$$

where $\underline{\theta} = (\beta_0,\beta_1,\phi,\sigma^2)$, $p(Y_t = y_t \mid Y_{t-1}=y_{t-1})$ is the conditional distribution of $y_t$ given $y_{t-1}$, and $p(Y_1=y_1)$ is the distribution of the first point.

This is the exact likelihood for an AR(1) process: the conditional likelihood multiplied by the marginal likelihood of the first point.

Density of $p(Y_1=y_1)$

$p(Y_1=y_1)$ is normally distributed with mean $X_1\underline{\beta}$, which equates to $\beta_0 + \beta_1$, and variance $\frac{\sigma^2}{1-\phi^2}$, and has density:

$$
p(Y_1=y_1) = \frac{1}{\sqrt{2\pi}\sqrt{\sigma^2/(1-\phi^2)}}exp \left( -\frac{1}{2}\left( \frac{ y_1 - \beta_0-\beta_1 }{\sigma/\sqrt{1-\phi^2}}\right)^2\right)
$$

Density of $p(Y_t = y_t \mid Y_{t-1}=y_{t-1})$

The conditional distribution $p(Y_t = y_t \mid Y_{t-1}=y_{t-1})$ is also normal, but we get to it in a roundabout way:

Recall $y_t = \beta_0 + \beta_1 t + u_t$; then $\phi y_{t-1} = \phi \beta_0 + \phi \beta_1 (t-1) + \phi u_{t-1}$.

So

$$
y_t - \phi y_{t-1} = \beta_0 + \beta_1 t + u_t - \phi \beta_0 - \phi\beta_1(t-1) -\phi u_{t-1}
$$

$$
y_t = \phi y_{t-1} + \beta_0(1-\phi) + \beta_1 (t - \phi t+\phi) + e_t
$$

Rearrange to obtain

$y_t - \phi y_{t-1} - \beta_0(1-\phi) - \beta_1 (t - \phi t+\phi) = e_t$, which has a normal distribution with mean 0 and variance $\sigma^2$.

This results in

$$
\prod_{t=2}^n p(Y_t = y_t \mid Y_{t-1}=y_{t-1}) = \prod_{t=2}^n \frac{1}{\sigma \sqrt{2\pi}}exp \left( -\frac{1}{2}\left( \frac{ y_t - \phi y_{t-1} - \beta_0(1-\phi) - \beta_1 (t - \phi t+\phi) }{\sigma}\right)^2\right)
$$

Exact likelihood for AR(1)

The likelihood is therefore:

$$
L(\underline{\theta};\underline{y}) = \prod_{t=2}^n \frac{1}{\sigma \sqrt{2\pi}}exp \left( -\frac{1}{2}\left( \frac{ y_t - \phi y_{t-1} - \beta_0(1-\phi) - \beta_1 (t - \phi t+\phi) }{\sigma}\right)^2\right) \times \frac{1}{\sqrt{2\pi}\sqrt{\sigma^2/(1-\phi^2)}}exp \left( -\frac{1}{2}\left( \frac{ y_1 - \beta_0 -\beta_1 }{\sigma/\sqrt{1-\phi^2}}\right)^2\right)
$$

Taking logs and simplifying results in:

$$
logL(\underline{\theta};\underline{y}) = -n\,log\,\sigma -\frac{n}{2}log(2\pi)+\frac{1}{2}log(1-\phi^2) -\frac{1}{2\sigma^2} \left( (y_1-\beta_0 -\beta_1)^2(1-\phi^2) + \sum_{t=2}^n (y_t-\phi y_{t-1}-\beta_0(1-\phi)-\beta_1(t-\phi t + \phi))^2 \right)
$$

Note that

$$
\begin{align}
&\beta_0(1-\phi) + \beta_1 (t - \phi t+\phi) \\
&= \beta_0 -\beta_0\phi + \beta_1 t - \beta_1\phi t + \beta_1\phi \\
&= \beta_0 + \beta_1 t - \phi (\beta_0 + \beta_1(t -1)) \\
&= X_t\underline{\beta} - \phi X_{t-1}\underline{\beta}
\end{align}
$$

Using the notation of Beach and MacKinnon (1978a) we can simplify the log likelihood,

$$
\begin{align}
logL(\underline{\theta};\underline{y}) &= -n\,log\,\sigma -\frac{n}{2}log(2\pi)+\frac{1}{2}log(1-\phi^2) \\
&-\frac{1}{2\sigma^2} \left( (y_1-X_1\underline{\beta})^2(1-\phi^2) + \sum_{t=2}^n (y_t- X_t\underline{\beta} - \phi( y_{t-1}- X_{t-1}\underline{\beta}))^2 \right)
\end{align}
$$

and substituting $\hat\sigma^2$ we get

$$
\begin{align}
logL\left( \underline{\beta}, \phi; \underline{y} \right) &= const. +\frac{1}{2}log(1-\phi^2) \\
&-\frac{n}{2}log\left( (y_1-X_1\underline{\beta})^2(1-\phi^2) + \sum^n_{t=2} (y_t - X_t\underline{\beta} - \phi (y_{t-1} -X_{t-1}\underline{\beta}))^2 \right)
\end{align}
$$
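As a numerical cross-check (a sketch), the exact likelihood can also be evaluated term by term from its first-principles form; it should agree with the `loglik_ar1` function above to rounding error:

```python
import numpy as np
from scipy.stats import norm

def loglik_ar1_first_principles(beta0, beta1, phi, sigma, y):
    """Exact AR(1) log-likelihood built from the marginal and conditional densities."""
    n = len(y)
    t = np.arange(1, n + 1)
    mu = beta0 + beta1 * t                       # X_t beta
    # marginal density of the first observation
    ll = norm.logpdf(y[0], loc=mu[0], scale=sigma / np.sqrt(1 - phi**2))
    # conditional densities of y_t given y_{t-1}, t = 2..n
    ll += norm.logpdf(y[1:], loc=mu[1:] + phi * (y[:-1] - mu[:-1]),
                      scale=sigma).sum()
    return ll
```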

AR(2)

$$
L(\underline{\theta};\underline{y})=\prod_{t=3}^n p(Y_t=y_t \mid Y_{t-1}=y_{t-1},Y_{t-2}=y_{t-2}) \times p(Y_2=y_2 \mid Y_1=y_1) \times p(Y_1=y_1)
$$

Following the same reasoning we can obtain the densities of each of the three components of the likelihood. The exact likelihood is the product of the conditional likelihood, the conditional distribution of $y_2 \mid y_1$, and the marginal distribution of $y_1$.

Density of $p(Y_1=y_1)$

$p(Y_1=y_1)$ is normally distributed with mean $X_1\underline{\beta}$, which equates to $\beta_0 + \beta_1$, and variance $\frac{(1-\phi_2)\sigma^2}{(1+\phi_2)\left[(1-\phi_2)^2-\phi_1^2\right]}$, the (1,1) element of $\sigma^2 V_2$.

Density of $p(Y_2=y_2 \mid Y_1=y_1)$

$p(Y_2=y_2 \mid Y_1=y_1)$ is more complicated and still needs to be worked out in full.
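One route, sketched here as an assumption to be checked: treat $(Y_1, Y_2)$ as bivariate normal with correlation $\rho_1 = \phi_1/(1-\phi_2)$, the lag-1 autocorrelation of a stationary AR(2) process. The standard conditional-normal result then gives

$$
Y_2 \mid Y_1 = y_1 \;\sim\; N\left( X_2\underline{\beta} + \frac{\phi_1}{1-\phi_2}\left(y_1 - X_1\underline{\beta}\right),\; \frac{\sigma^2}{1-\phi_2^2} \right)
$$

which is consistent with the second row of the transformation $Q$ above.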

Density of $p(Y_t = y_t \mid Y_{t-1}=y_{t-1},Y_{t-2}=y_{t-2})$

Recall $y_t = \beta_0 + \beta_1 t + u_t$; then $\phi_1 y_{t-1} = \phi_1 \beta_0 + \phi_1 \beta_1 (t-1) + \phi_1 u_{t-1}$ and $\phi_2 y_{t-2} = \phi_2 \beta_0 + \phi_2 \beta_1 (t-2) + \phi_2 u_{t-2}$.

So

$$
y_t - \phi_1 y_{t-1} - \phi_2 y_{t-2} = \beta_0 + \beta_1 t + u_t - \phi_1 \beta_0 - \phi_1\beta_1(t-1) -\phi_1 u_{t-1} - \phi_2 \beta_0 - \phi_2\beta_1(t-2) -\phi_2 u_{t-2}
$$

which simplifies to

$$
y_t - \phi_1 y_{t-1} - \phi_2 y_{t-2} - \beta_0 (1-\phi_1-\phi_2) - \beta_1 (t - \phi_1(t-1)- \phi_2(t-2)) = e_t
$$

which results in

$$
\prod_{t=3}^n p(Y_t = y_t \mid Y_{t-1}=y_{t-1},Y_{t-2}=y_{t-2}) = \prod_{t=3}^n \frac{1}{\sigma \sqrt{2\pi}}exp \left( -\frac{1}{2}\left( \frac{ y_t - \phi_1 y_{t-1} - \phi_2 y_{t-2} - \beta_0 (1-\phi_1-\phi_2) - \beta_1 (t - \phi_1(t-1)- \phi_2(t-2)) }{\sigma}\right)^2\right)
$$

Exact likelihood for AR(2)

The likelihood is therefore:
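Assembling the three components above gives (a sketch; the conditional density of $y_2$ given $y_1$ is taken from the bivariate-normal result sketched earlier):

$$
L(\underline{\theta};\underline{y}) = \left[\prod_{t=3}^n p(Y_t = y_t \mid Y_{t-1}=y_{t-1}, Y_{t-2}=y_{t-2})\right] \times p(Y_2=y_2 \mid Y_1=y_1) \times p(Y_1=y_1)
$$

Taking logs and simplifying, as in the AR(1) case, should then recover the Beach and MacKinnon (1978b) form given above.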

References

Beach, Charles M., and James G. MacKinnon. 1978a. “A Maximum Likelihood Procedure for Regression with Autocorrelated Errors.” Econometrica 46 (1): 51. https://doi.org/10.2307/1913644.
———. 1978b. “Full Maximum Likelihood Estimation of Second-Order Autoregressive Error Models.” Journal of Econometrics 7 (2): 187–98. https://doi.org/10.1016/0304-4076(78)90068-4.
Hamilton, James D. 1994. Time Series Analysis. Princeton, N.J.: Princeton University Press.