15  Spatial auto-regressive models

15.1 Introduction

We now turn attention to the problem of building joint probability distributions over a lattice of sites. In particular, we would like to extend the concept of an auto-regressive model from time series to spatial data. Unfortunately, due to the lack of a natural ordering on spatial locations, this is not completely straightforward. Indeed, there is more than one way to do it, and the different ways are not equivalent. Regardless, we would like to make some kind of specification of the distribution at individual sites, and then use these local specifications to build a full joint distribution. Specifying conditional distributions is one natural way to do this, so we begin by thinking about how and when conditional distributions specify a joint distribution.

15.1.1 Compatible distributions

Given a joint distribution for a pair of random quantities \(X\) and \(Y\), we know that we can factor it into a product of a marginal and a conditional in two different ways, \[ f_{X,Y}(x, y) = f_X(x)f_{Y|X}(y|x) = f_Y(y)f_{X|Y}(x|y), \] and that both of these factorisations can be useful. But now suppose that we are only given the two conditional distributions, \(f_{X|Y}(x|y)\) and \(f_{Y|X}(y|x)\). Do these two conditional distributions uniquely determine the joint distribution, and if so, can we deduce it? Furthermore, given two arbitrary conditional distributions, can we tell if they are consistent with any joint distribution at all? In general, not all pairs of conditional distributions determine a well-defined joint distribution. When they do, we say that the conditionals are compatible.

Proposition (Compatible distributions)

Conditional densities \(f_{X|Y}(x|y)\) and \(f_{Y|X}(y|x)\) are compatible if they have the same support and there exist functions \(u(x)\) and \(v(y)\) such that \[ \frac{f_{X|Y}(x|y)}{f_{Y|X}(y|x)} = u(x)v(y), \quad \text{where} \quad \int_\mathbb{R} u(x)\,\mathrm{d}x < \infty. \] That is, the ratio of densities factorises into separate functions of \(x\) and \(y\), and the function of \(x\) is integrable.

Proof. (sketch) If \(f_{X,Y}(x, y)\) exists with marginals \(f_X(x)\) and \(f_Y(y)\), then \(f_X(x)f_{Y|X}(y|x) = f_Y(y)f_{X|Y}(x|y)\) implies \[ \frac{f_{X|Y}(x|y)}{f_{Y|X}(y|x)} = \frac{f_X(x)}{f_Y(y)} = u(x)v(y), \] where \(u(x)=f_X(x)\) and \(v(y) = 1/f_Y(y)\). Clearly \(u\) and \(v\) are determined only up to a multiplicative constant, but since \(f_X(x)\) integrates to one, \(u(x)\) must be integrable.

Note that normalising \(u(x)\) will give \(f_X(x)\), and together with \(f_{Y|X}(y|x)\) this gives the joint distribution.

So this proposition gives us a way of checking compatibility, and in the case of compatibility, gives us a way to compute the associated joint distribution. We will later see an easier way to construct the joint in situations where we know the conditionals are compatible.

15.1.1.1 Example: AR(1)

Let \(X\) and \(Y\) be two consecutive values of an AR(1) model (eg. \(X_t\) and \(X_{t+1}\)). We know that the conditional density for \(Y|X\) is \[ f_{Y|X}(y|x) = \mathcal{N}(y;\alpha x, \sigma^2), \] but in the stationary case (\(|\alpha|<1\)) we know that the process is time-reversible, so we must also have \[ f_{X|Y}(x|y) = \mathcal{N}(x;\alpha y, \sigma^2). \] In this case we know that the conditionals are compatible, and determine the joint distribution of a pair of values from a stationary AR(1) model, but let’s check, anyway. We form the ratio of conditional densities and simplify to find \[ \frac{f_{X|Y}(x|y)}{f_{Y|X}(y|x)} = u(x)v(y), \] where \[ u(x) = \exp\left\{-\frac{1-\alpha^2}{2\sigma^2}x^2\right\}, \quad \text{and} \quad v(y) = \exp\left\{\frac{1-\alpha^2}{2\sigma^2}y^2\right\}. \] So the ratio does indeed factorise into separate functions of \(x\) and \(y\), and further, \(u(x)\) is integrable, since it normalises to \[ f_X(x) = \mathcal{N}\left(x;0,\frac{\sigma^2}{1-\alpha^2}\right), \] the stationary distribution of the AR(1), as we would expect.
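
We can also check this factorisation numerically. Below is a minimal R sketch (the values \(\alpha=0.8\) and \(\sigma=1\) are chosen arbitrarily for illustration) confirming that the log-ratio of the two conditional densities separates into the sum of a function of \(x\) and a function of \(y\), as claimed.

alpha = 0.8; sigma = 1
# log ratio of the two conditional densities
lr = function(x, y)
    dnorm(x, alpha*y, sigma, log=TRUE) - dnorm(y, alpha*x, sigma, log=TRUE)
lu = function(x) -(1 - alpha^2)*x^2/(2*sigma^2)   # log u(x)
lv = function(y) (1 - alpha^2)*y^2/(2*sigma^2)    # log v(y)
x = rnorm(5); y = rnorm(5)
max(abs(lr(x, y) - (lu(x) + lv(y))))   # zero, up to rounding error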

Before moving on it is worth briefly considering the case \(\alpha=1\). This corresponds to the (non-stationary) random walk model. Here the ratio of conditionals simplifies to 1. We can write this in the form \(u(x)=1\), \(v(y)=1\), but \(u(x)\) is not integrable in this case. So the conditionals for the random walk model are not compatible, and do not correspond to a (proper) joint distribution. However, it turns out that they do in fact correspond to an improper joint distribution, analogous to the intrinsically stationary distributions we considered in continuous space. Such improper random walk priors are actually used quite often in spatial statistics, but the technical details will not concern us.

15.1.2 Conditional versus simultaneous specification

We have just seen in the above example that the conditional specifications \[\begin{align*} f_{X|Y}(x|y) &= \mathcal{N}(x;\alpha y, \sigma^2),\\ f_{Y|X}(y|x) &= \mathcal{N}(y;\alpha x, \sigma^2), \end{align*}\] correspond to a pair of consecutive values from a stationary AR(1) model with stationary variance \(\sigma^2/(1-\alpha^2)\). In the time series part of the course we regularly moved back-and-forth between conditional specifications like this and equational representations. We might therefore be tempted to write down an equational form of this conditionally specified model as the simultaneous system \[\begin{align*} X &= \alpha Y + \varepsilon_1 \\ Y &= \alpha X + \varepsilon_2, \end{align*}\] where \(\varepsilon_1\) and \(\varepsilon_2\) are independent mean zero Gaussians with variance \(\sigma^2\). We will see shortly that this is a perfectly reasonable model, and does correspond to a perfectly reasonable joint distribution over \(X\) and \(Y\). However, it is very important to understand that this is not the same as the conditional model above, and in particular, does not correspond to a pair of values from an AR(1) process. Great care should therefore be used in the interpretation of simultaneously specified models.

We can solve for the joint distribution by writing the model in matrix form as \[ \begin{pmatrix}1&-\alpha\\-\alpha&1\end{pmatrix}\binom{X}{Y} = \binom{\varepsilon_1}{\varepsilon_2}, \] from which we deduce \[ \binom{X}{Y} = \begin{pmatrix}1&-\alpha\\-\alpha&1\end{pmatrix}^{-1}\binom{\varepsilon_1}{\varepsilon_2} = \frac{1}{1-\alpha^2}\begin{pmatrix}1&\alpha\\\alpha&1\end{pmatrix}\binom{\varepsilon_1}{\varepsilon_2}. \] It is now clear that \(X\) and \(Y\) are jointly Gaussian with mean zero and variance matrix \[ \frac{\sigma^2}{(1-\alpha^2)^2}\begin{pmatrix}1&\alpha\\\alpha&1\end{pmatrix}^2 = \frac{\sigma^2}{(1-\alpha^2)^2}\begin{pmatrix}1+\alpha^2&2\alpha\\2\alpha&1+\alpha^2\end{pmatrix}. \] In particular, note that the marginal variance of \(X\) (and \(Y\)) is \[ \frac{1+\alpha^2}{(1-\alpha^2)^2}\sigma^2, \] which is not the stationary variance of an AR(1) model. There are many ways to understand why the two different approaches to specification are different (they just are!), but it may be helpful to note that in the simultaneously specified model, both errors are correlated with both \(X\) and \(Y\), which is not the case for the conditional specification.
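
The difference is also easy to confirm by simulation. The following sketch (with illustrative values \(\alpha=0.6\), \(\sigma=1\)) draws from the simultaneous system and compares the empirical marginal variance of \(X\) against the formula above and against the AR(1) stationary variance.

set.seed(42)
alpha = 0.6; sigma = 1; N = 1e5
eps = matrix(rnorm(2*N, 0, sigma), nrow=2)    # independent errors
M = matrix(c(1, -alpha, -alpha, 1), 2, 2)
xy = solve(M) %*% eps                         # solve the simultaneous system
var(xy[1, ])                                  # empirical marginal variance
(1 + alpha^2)*sigma^2/(1 - alpha^2)^2         # simultaneous model variance
sigma^2/(1 - alpha^2)                         # AR(1) stationary variance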

15.2 Simultaneous auto-regressive (SAR) models

Consider the development of a joint model for \(n\) spatial locations. Inspired by the equational form of an auto-regressive time series model, we could model each location as a linear combination of the others plus some residual error. \[ Z_i = \sum_{j\not= i} b_{ij} Z_j + \varepsilon_i, \quad \varepsilon_i\sim\mathcal{N}(0,\sigma_i^2),\quad i=1,2,\ldots,n. \] This kind of simultaneous specification is very closely related to the simultaneous equations models that are widely used in econometrics. Just as an auto-regressive time series model does not depend on all previous observations, we would expect that most \(b_{ij}\) will be 0. The coefficient \(b_{ij}\) will be non-zero only when \(Z_j\) has a “direct” influence on \(Z_i\). We can write the model in matrix form as \[ \mathbf{Z} = \mathsf{B}\mathbf{Z} + \pmb{\varepsilon},\quad \pmb{\varepsilon}\sim\mathcal{N}(\mathbf{0},\mathsf{\Lambda}),\quad \mathsf{\Lambda}=\operatorname{diag}\{\sigma_i^2\}, \] where \(b_{ii}=0\ \forall i\). The non-zero elements of \(\mathsf{B}\) represent some kind of “neighbourhood” structure for the spatial locations, but given our previous discussion about the difference between conditional and simultaneous specification, we should be cautious about over-interpreting what this neighbourhood structure represents. Re-writing the matrix form as \[ (\mathbb{I}-\mathsf{B})\mathbf{Z} = \pmb{\varepsilon}, \] we see that \[ \mathbf{Z} \sim \mathcal{N}\left(\mathbf{0}, (\mathbb{I}-\mathsf{B})^{-1}\mathsf{\Lambda} (\mathbb{I}-\mathsf{B}^\top)^{-1}\right). \] We know that the conditional independence structure of a GGM is determined by the zero-structure of the precision matrix. Here, the precision matrix is \((\mathbb{I}-\mathsf{B}^\top)\mathsf{\Lambda}^{-1}(\mathbb{I}-\mathsf{B})\), but since \(\mathsf{\Lambda}\) is just a diagonal rescaling, the zero structure of the precision matrix will be the same as the zero structure of \((\mathbb{I}-\mathsf{B})^\top(\mathbb{I}-\mathsf{B})\). Now, typically, this will have fewer zeros than \(\mathbb{I}-\mathsf{B}\), and so the model will have a less sparse conditional independence structure than the zero structure of \(\mathsf{B}\) might suggest. But regardless, the zero structure of \((\mathbb{I}-\mathsf{B})^\top(\mathbb{I}-\mathsf{B})\) determines the conditional independence structure of the implied GGM.

15.2.1 Example: 1d local neighbourhood

Suppose that we are modelling on a regular 1d lattice, and propose the local neighbourhood model \[ Z_t = \beta(Z_{t-1}+Z_{t+1}) + \varepsilon_t,\quad \varepsilon_t \sim \mathcal{N}(0,\sigma^2). \] For this model, assuming \(n\) values, and making some reasonable (unimportant) choices for the end points, we have \[ \mathsf{B} = \begin{pmatrix} 0 & \beta & 0 & \cdots & 0 &0 \\ \beta & 0 & \beta & \ddots &0 & 0 \\ 0 & \beta & 0 & \ddots & 0 & 0\\ \vdots & \ddots & \ddots & \ddots &\ddots & \ddots \\ 0 & 0 & 0 & \ddots & 0 &\beta \\ 0 & 0 & 0 & \ddots & \beta & 0 \end{pmatrix} \] which is tridiagonal, as is \(\mathbb{I}-\mathsf{B}\). That is, they have a bandwidth of one. However, \[ (\mathbb{I}-\mathsf{B})^\top(\mathbb{I}-\mathsf{B}) = \begin{pmatrix} 1+\beta^2 & -2\beta & \beta^2 & \cdots & 0 &0 \\ -2\beta & 1+2\beta^2 & -2\beta & \ddots &0 & 0 \\ \beta^2 & -2\beta & 1+2\beta^2 & \ddots & \ddots & 0\\ \vdots & \ddots & \ddots & \ddots &\ddots & \ddots \\ 0 & 0 & \ddots & \ddots & 1+2\beta^2 &-2\beta \\ 0 & 0 & 0 & \ddots & -2\beta & 1+\beta^2 \end{pmatrix} \] has a bandwidth of two. It therefore cannot have the conditional independence structure of an AR(1) model, despite the superficial similarity between the model specification and the local characteristics of the AR(1). However, since the precision matrix has a bandwidth of 2 and we know that the AR(2) has a PACF which truncates after lag 2, it does have the same conditional independence structure as the AR(2). In fact, re-writing the model specification as \[ Z_{t+1} = \frac{1}{\beta}Z_t - Z_{t-1} - \frac{1}{\beta}\varepsilon_t \] is strongly suggestive of an AR(2) model, but there are still questions about what the noise process is independent of (if anything).
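
To see this concretely, we can build the matrices for a small lattice. The snippet below (with illustrative values \(n=6\), \(\beta=0.4\)) exhibits the bandwidth-two structure directly.

n = 6; beta = 0.4
B = matrix(0, n, n)
B[abs(row(B) - col(B)) == 1] = beta    # tridiagonal neighbourhood matrix
M = diag(n) - B
round(t(M) %*% M, 2)    # bandwidth two, like an AR(2) precision matrix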

We can better understand this process by directly computing its spectrum, using the DTFT, as we did in Chapter 5. Applying the DTFT to the model definition we get \[ (1-\beta[e^{-2\pi\mathrm{i}\nu}+e^{2\pi\mathrm{i}\nu}])\hat{Z}(\nu) = \sigma\eta(\nu), \] from which we deduce the spectrum \[ S(\nu) = \frac{\sigma^2}{(1-2\beta\cos 2\pi\nu)^2}. \] This spectrum is consistent with that of an AR(2) process, and so there is an AR(2) model entirely equivalent to this model. Identifying the precise correspondence is left as an exercise. Regardless, we see that the spectrum will remain finite provided that \(|\beta|< 1/2\), and so this is the condition required for stationarity of the process.
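
For an illustrative value such as \(\beta=0.4\) (inside the stationarity region), we can plot this 1d spectrum directly.

beta = 0.4; sigma = 1
nu = seq(0, 0.5, length.out=500)
plot(nu, sigma^2/(1 - 2*beta*cos(2*pi*nu))^2, type="l",
    xlab="frequency", ylab="spectral density")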

15.2.2 Spectrum of a 2d stationary SAR model

Suppose that we are interested in defining a SAR model on an infinite 2d regular square lattice that is stationary. We can start with the obvious 2d generalisation of the 1d model just considered: \[ Z_{j,k} = \alpha(Z_{j+1,k}+Z_{j-1,k}) + \beta(Z_{j,k+1}+Z_{j,k-1}) + \varepsilon_{j,k},\quad \varepsilon_{j,k}\sim\mathcal{N}(0,\sigma^2). \] So, the neighbourhood of each location consists of the 4 nearest (first-order) neighbours. This specification is translation-invariant (\(\alpha\) and \(\beta\) do not depend on \(j\) or \(k\)), so there is some hope that this could potentially determine a stationary field. We will often choose \(\alpha=\beta\) but stationarity doesn’t require this, and it will sometimes be useful to allow the horizontal and vertical variation to differ. In spatial applications we will invariably be interested in the case \(\alpha,\beta>0\), so we restrict attention to this case. It will be very helpful to know the spectrum associated with this model, and we can deduce this by applying the 2d lattice FT to the model.

In order to apply the FT to the model, we need to know how the transform changes under a spatial shift. If we know that the field \(f(j,k)\) has FT \(\hat{f}(\nu_x, \nu_y)\), and we define \(g(j,k) = f(j+1, k)\) (recalling our previously discussed convention, this means shifting \(f\) to the left), then a simple computation gives \[ \hat{g}(\nu_x, \nu_y) = e^{2\pi\mathrm{i}\nu_x}\hat{f}(\nu_x,\nu_y), \] and the obvious corresponding results hold for the other three shifts.

We can now apply the FT to our model definition to get \[\begin{align*} [1-\alpha(e^{2\pi\mathrm{i}\nu_x}+e^{-2\pi\mathrm{i}\nu_x}) - \beta(e^{2\pi\mathrm{i}\nu_y}+e^{-2\pi\mathrm{i}\nu_y})]\hat{Z}(\nu_x,\nu_y) &= \hat{\varepsilon}(\nu_x,\nu_y)\\ \Rightarrow (1-2\alpha\cos 2\pi\nu_x - 2\beta\cos 2\pi\nu_y)\hat{Z}(\nu_x,\nu_y) &= \sigma\eta(\nu_x,\nu_y), \end{align*}\] where \(\hat{Z}(\cdot,\cdot)\) is the FT of \(Z_{j,k}\) and \(\eta(\cdot,\cdot)\) is a white noise process. From this we conclude that the spectrum of the process is \[ S(\nu_x,\nu_y) = \frac{\sigma^2}{(1-2\alpha\cos 2\pi\nu_x - 2\beta\cos 2\pi\nu_y)^2}. \] We can also see that the denominator will remain positive provided that \(\alpha+\beta < 1/2\) (ie. the total weight assigned to the four neighbours is less than one), and so that is our criterion for stationarity.

We can plot this spectrum in the case \(\alpha=\beta=1/5\), which clearly satisfies our stationarity condition.

n = 201
x = seq(-0.5, 0.5, length.out=n)
coords = expand.grid(x, x)

# spectral density for alpha = beta = 1/5 and sigma = 1
S = function(xy)
  1/(1 - 2*cos(2*pi*xy[,1])/5 - 2*cos(2*pi*xy[,2])/5)^2

z_vec = S(coords)
z_mat = matrix(z_vec, ncol=n)
image(x, x, z_mat, col=topo.colors(100))
contour(x, x, z_mat, add=TRUE)

We can see that for low frequencies (close to the centre of the plot), the contours are very circular, representing isotropic behaviour. However, as we move to higher frequencies the contours become distorted as the effects of the discrete lattice become more apparent.

15.2.3 Simulation of stationary SAR models on a toroidal lattice

In practice we cannot work with infinite lattices. The closest we can get is to work on a finite square lattice with periodic boundary conditions, which is equivalent to working on a toroidal lattice. In this case, the DFT of the model has a simple form allowing exact simulation, even for large lattices. We have the same model as just considered, \[ Z_{j,k} = \alpha(Z_{j+1,k}+Z_{j-1,k}) + \beta(Z_{j,k+1}+Z_{j,k-1}) + \varepsilon_{j,k},\quad \varepsilon_{j,k}\sim\mathcal{N}(0,\sigma^2), \] but now \(j,k=0,1,\ldots,n-1\) and \(Z_{-j, k}=Z_{n-j, k}\), etc. In the case of periodic boundary conditions, the DFT of a shifted field is again simply related to the DFT of the unshifted field, so applying the 2d DFT to this model gives \[ [1-2\alpha\cos(2\pi l/n) - 2\beta\cos(2\pi m/n)]\hat{Z}_{l,m} = \hat{\varepsilon}_{l,m},\quad\text{where}\quad \hat{\varepsilon}_{l,m}\sim\mathcal{CN}(0,n^2\sigma^2). \] In other words, \[ \hat{Z}_{l,m} = \frac{n\sigma}{1-2\alpha\cos(2\pi l/n) - 2\beta\cos(2\pi m/n)}\varepsilon'_{l,m},\quad \text{where}\quad \varepsilon'_{l,m}\sim\mathcal{CN}(0,1). \] So we can just simulate this field in frequency space and invert. Again, the tricky bit is to ensure that all required conjugate symmetries are respected in order to get a real field. This is implemented below.

ifft = function(x) fft(x, inverse=TRUE) / length(x)

# standard complex normal random deviates
rcnorm = function(n, mu=0, sd=1)
    sd*complex(n, real=rnorm(n), imag=rnorm(n))/sqrt(2) + mu

rSAR2d = function(n, alpha, beta, sigma=1) {
    ss = matrix(0, n, n)
    # root spectrum (amplitude) at each pair of Fourier frequencies
    ss = sigma * n / (
        1 - 2*alpha*cos(2*pi*(col(ss)-1)/n) 
        - 2*beta*cos(2*pi*(row(ss)-1)/n))
    y = ss * matrix(rcnorm(n*n), n, n)
    # impose the conjugate symmetries linking the four interior blocks
    y[2:(n/2), (n/2 + 2):n] = y[2:(n/2), (n/2):2]
    y[(n/2 + 2):n, 2:(n/2)] = Conj(y[(n/2):2, 2:(n/2)])
    y[1:(n/2), 1:(n/2)] = ss[1:(n/2), 1:(n/2)] *
        matrix(rcnorm(n*n/4), n/2, n/2)
    y[(n/2 + 2):n, (n/2 + 2):n] = Conj(y[(n/2):2, (n/2):2])
    # and along the zero and Nyquist frequency rows and columns
    y[1, (n/2+2):n] = Conj(y[1, (n/2):2])
    y[n/2 + 1, (n/2+2):n] = Conj(y[n/2 + 1, (n/2):2])
    y[(n/2+2):n, 1] = Conj(y[(n/2):2, 1])
    y[(n/2+2):n, n/2 + 1] = Conj(y[(n/2):2, n/2 + 1])
    # the four self-conjugate frequencies must be real
    y[1, 1] = Re(y[1, 1])
    y[n/2 + 1, 1] = Re(y[n/2 + 1, 1])
    y[1, n/2 + 1] = Re(y[1, n/2 + 1])
    y[n/2 + 1, n/2 + 1] = Re(y[n/2 + 1, n/2 + 1])
    Re(ifft(y))
}

We can try this out on a \(20\times 20\) lattice with \(\alpha=\beta=1/5\).

set.seed(24)
sar = rSAR2d(20, 0.2, 0.2)
image(sar, col=topo.colors(100))

mean(sar)
[1] 0.02412274

Since the FFT is extremely efficient, there’s no problem generating simulations on large lattices with \(\alpha\) and \(\beta\) close to the critical values.

image(rSAR2d(500, 0.249, 0.249), col=topo.colors(100))

To be clear, there is no problem having (say) \(\beta>1/4\) so long as \(\alpha+\beta<1/2\).

image(rSAR2d(500, 0.209, 0.289))

15.3 Conditional auto-regressive (CAR) models

As an alternative to using a simultaneous specification, we may prefer to specify a spatial model conditionally. Specifically, we may want to specify a set of local characteristics (or full conditional distributions) in order to fix a joint distribution over a random field. This is arguably more natural and intuitive than making a simultaneous specification. On the other hand, we know that an arbitrary set of local characteristics may not be compatible with any joint distribution, and so care is required to ensure consistency with a proper model. To pursue the conditional specification approach, it will be extremely useful to have a simple way of determining the joint distribution associated with a particular set of local characteristics. The solution to this problem is typically referred to as Brook’s lemma.

15.3.1 Brook’s lemma (AKA Besag’s factorisation theorem)

Aside: this result is named after David Brook and/or Julian Besag, both of whom had strong connections to the North East of England. David worked for many years as a lecturer at Newcastle University. Julian was the Professor of Statistics at Durham University immediately before the appointment of Michael Goldstein. Shortly after leaving Durham, Julian was also briefly Professor of Statistics at Newcastle.

If \(\pi(\mathbf{x})\) is a density with \(\pi(\mathbf{x})>0\, \forall \mathbf{x}\in\mathcal{D}\subset\mathbb{R}^n\), then \(\forall \mathbf{x}, \mathbf{x}^\star \in \mathcal{D}\) we have \[\begin{align*} \frac{\pi(\mathbf{x})}{\pi(\mathbf{x}^\star)} &= \prod_{i=1}^n \frac{\pi(x_i|x_1,\ldots,x_{i-1},x_{i+1}^\star,\ldots,x_n^\star)}{\pi(x_i^\star|x_1,\ldots,x_{i-1},x_{i+1}^\star,\ldots,x_n^\star)} \\ &= \prod_{i=1}^n \frac{\pi(x_i|x_1^\star,\ldots,x_{i-1}^\star,x_{i+1},\ldots,x_n)}{\pi(x_i^\star|x_1^\star,\ldots,x_{i-1}^\star,x_{i+1},\ldots,x_n)} . \end{align*}\]

Proof. We consider the case \(n=3\), from which generalisations should be clear. We start with \(\pi(\mathbf{x})\) and expand and simplify each variable in turn. Starting with \(x_1\) gives one equality and starting with \(x_n\) gives the other. We start with \(x_1\). \[\begin{align*} \pi(\mathbf{x}) &= \pi(x_1,x_2,x_3) \\ &= \pi(x_1|x_2,x_3)\pi(x_2,x_3) \\ &= \pi(x_1|x_2,x_3)\frac{\pi(x^\star_1|x_2,x_3)}{\pi(x^\star_1|x_2,x_3)}\pi(x_2,x_3) \\ &= \frac{\pi(x_1|x_2,x_3)}{\pi(x^\star_1|x_2,x_3)}\pi(x^\star_1,x_2,x_3) \\ &= \frac{\pi(x_1|x_2,x_3)}{\pi(x^\star_1|x_2,x_3)}\pi(x_2|x^\star_1,x_3)\pi(x^\star_1,x_3) \\ &= \frac{\pi(x_1|x_2,x_3)}{\pi(x^\star_1|x_2,x_3)}\frac{\pi(x_2|x^\star_1,x_3)}{\pi(x^\star_2|x^\star_1,x_3)}\pi(x^\star_1,x^\star_2,x_3) \\ &= \frac{\pi(x_1|x_2,x_3)}{\pi(x^\star_1|x_2,x_3)}\frac{\pi(x_2|x^\star_1,x_3)}{\pi(x^\star_2|x^\star_1,x_3)}\pi(x_3|x^\star_1,x^\star_2)\pi(x^\star_1,x^\star_2) \\ &= \frac{\pi(x_1|x_2,x_3)}{\pi(x^\star_1|x_2,x_3)}\frac{\pi(x_2|x^\star_1,x_3)}{\pi(x^\star_2|x^\star_1,x_3)}\frac{\pi(x_3|x^\star_1,x^\star_2)}{\pi(x^\star_3|x^\star_1,x^\star_2)}\pi(\mathbf{x}^\star) \end{align*}\]

This result allows us to pick a convenient reference state, \(\mathbf{x}^\star\) (often \(\mathbf{x}^\star=\mathbf{0}\) is used), and then write the joint distribution for \(\mathbf{x}\), up to a normalising constant, as a product of ratios of full-conditionals.

15.3.1.1 Example: AR(1)

Returning to our example of a pair of values from an AR(1), we can use Brook’s lemma to expand the joint distribution in two different ways (using reference state \(\mathbf{x}^\star=\mathbf{0}\), and dropping terms not involving \(x\) or \(y\)), \[ f_{X,Y}(x,y) \propto f_{X|Y}(x|0)\frac{f_{Y|X}(y|x)}{f_{Y|X}(0|x)}, \] and \[ f_{X,Y}(x,y) \propto \frac{f_{X|Y}(x|y)}{f_{X|Y}(0|y)}f_{Y|X}(y|0). \] Substituting in our two conditional distributions and simplifying leads to the same joint distribution (bivariate normal with variances \(\sigma^2/(1-\alpha^2)\) and correlation \(\alpha\)).
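
We can verify this numerically. The sketch below (with illustrative values \(\alpha=0.7\), \(\sigma=1\)) evaluates both unnormalised log joint densities at random points; the difference is exactly zero, since both expansions drop the same constant, \(f_{X|Y}(0|0)=f_{Y|X}(0|0)\).

alpha = 0.7; sigma = 1
# first expansion (up to an additive constant)
lj1 = function(x, y)
    dnorm(x, 0, sigma, log=TRUE) +
        dnorm(y, alpha*x, sigma, log=TRUE) - dnorm(0, alpha*x, sigma, log=TRUE)
# second expansion (dropping the same constant)
lj2 = function(x, y)
    dnorm(x, alpha*y, sigma, log=TRUE) - dnorm(0, alpha*y, sigma, log=TRUE) +
        dnorm(y, 0, sigma, log=TRUE)
x = rnorm(5); y = rnorm(5)
lj1(x, y) - lj2(x, y)   # identically zero, up to rounding error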

15.3.2 CAR model specification

A (Gaussian) CAR model for the joint distribution over \(n\) spatial locations is specified via a normal local characteristic (full-conditional distribution) at each site. \[ (Z_i|\mathbf{Z}_{-i}=\mathbf{z}_{-i}) \sim \mathcal{N}\left( \sum_{j=1}^n a_{ij}z_j, \kappa_i \right),\quad i=1,2,\ldots,n. \] It is assumed that \(\kappa_i>0\) and \(a_{ii}=0,\ \forall i\). The coefficients \(a_{ij}\) can be stored in the \(n\times n\) matrix \(\mathsf{A}\) with zero diagonal. The non-zero elements of \(\mathsf{A}\) represent some kind of neighbourhood structure (to be made precise soon). We expect that most \(a_{ij}\) will be 0, so \(\mathsf{A}\) will be a sparse matrix. It will also turn out to be convenient to define \(\mathsf{K}=\operatorname{diag}\{\kappa_i\}\). Some care is needed to ensure that the local characteristics are actually compatible with a proper joint distribution. We want to identify the joint distribution and any necessary constraints on the elements of \(\mathsf{A}\) and \(\mathsf{K}\).

15.3.3 Joint distribution

We could just use Brook’s lemma (with reference state \(\mathbf{x}^\star=\mathbf{0}\)) to directly compute the joint density (up to a normalising constant), recognising the multivariate normal distribution that arises. Doing this is left as an exercise. Instead, we will guess that the joint distribution will be of the form \[ \mathbf{Z} \sim \mathcal{N}(\mathbf{0},\mathsf{Q}^{-1}), \] for some (presumably sparse) precision matrix, \(\mathsf{Q}\). By comparing the full-conditionals of this multivariate normal with our CAR local characteristics, we can identify the relationship between \(\mathsf{A}\), \(\mathsf{K}\) and \(\mathsf{Q}\).

Using our results for multivariate normal conditioning based on a partitioned precision matrix, from Section 14.2.4, we find that the full-conditionals are \[ (Z_i|\mathbf{Z}_{-i}=\mathbf{z}_{-i}) \sim \mathcal{N}\left(-\frac{1}{q_{ii}}\sum_{j\not= i} q_{ij}z_j, \frac{1}{q_{ii}} \right). \] Comparing against our CAR local characteristic specifications, we see that we must have \[ q_{ii} = \frac{1}{\kappa_i},\quad \forall i, \] and \[ -\frac{q_{ij}}{q_{ii}} = a_{ij} \quad \Rightarrow \quad q_{ij} = -\frac{a_{ij}}{\kappa_i},\quad \forall i\not= j. \] This fully specifies \(\mathsf{Q}\). In matrix form, \[ \mathsf{Q} = \mathsf{K}^{-1}(\mathbb{I}-\mathsf{A}), \] since pre-multiplication by a diagonal matrix corresponds to rescaling the rows. This reveals some constraints that our specification must satisfy for a joint distribution to exist. First, since the precision matrix is symmetric, we must have \(q_{ij}=q_{ji}\), and therefore \[ \frac{a_{ij}}{\kappa_i} = \frac{a_{ji}}{\kappa_j}, \quad \forall i,j. \] Further, since the precision matrix is positive definite, \(\mathsf{K}^{-1}(\mathbb{I}-\mathsf{A})\) must be positive definite. So, our constraints can be summarised as: \[ \mathsf{K}^{-1}(\mathbb{I}-\mathsf{A}) \] must be symmetric and positive definite. Then, \[ \mathbf{Z} \sim \mathcal{N}(\mathbf{0},(\mathbb{I}-\mathsf{A})^{-1}\mathsf{K}). \] We will need to ensure that any CAR model specification respects these constraints. Provided that we have a consistent model, its properties are quite intuitive. Since \(\mathsf{K}^{-1}\) simply rescales the rows, the sparsity structure of \(\mathsf{Q}\) is the same as that of \(\mathbb{I}-\mathsf{A}\), and so the off-diagonal sparsity of \(\mathsf{Q}\) matches that of \(\mathsf{A}\). That is, the (off-diagonal) zeros in \(\mathsf{A}\) correspond to zeros in \(\mathsf{Q}\), and therefore to pairwise conditional independence properties. In other words, the zero structure of \(\mathsf{A}\) determines the conditional independence structure of the implied GGM.
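
As a concrete check of these constraints, the sketch below builds \(\mathsf{Q}\) for a small hypothetical CAR model: first-order neighbours on a 1d chain of five sites, with \(a_{ij}=0.4\) for neighbouring sites and constant \(\kappa_i=1\).

n = 5; a = 0.4; kappa = rep(1, n)
A = matrix(0, n, n)
A[abs(row(A) - col(A)) == 1] = a       # first-order neighbours on a chain
Q = diag(1/kappa) %*% (diag(n) - A)    # Q = K^{-1}(I - A)
isSymmetric(Q)                         # TRUE: a_ij/kappa_i = a_ji/kappa_j here
min(eigen(Q)$values) > 0               # TRUE: positive definite, so valid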

15.3.4 Spectrum of a CAR model on a regular lattice

If we call the variance matrix \(\mathsf{\Sigma}\), the previous section has demonstrated that \[ (\mathbb{I}-\mathsf{A})\mathsf{\Sigma} = \mathsf{K}, \] which we can regard as a set of \(n^2\) linear equations for the elements of \(\mathsf{\Sigma}\). Explicitly, if we let \(c_{ij}\) be the \((i,j)\)th element of \(\mathsf{\Sigma}\), then the \((i,j)\)th equation is \[ c_{ij} - \sum_k a_{ik}c_{kj} = \kappa_i\delta_{ij}. \] These equational relations are more convenient in the case of infinitely many sites.

We can now consider a stationary CAR model specification on an infinite regular square lattice. The local characteristic at position \(\mathbf{s}=(j,k)\) is \[ Z_{j,k}|\mathbf{Z}_{-(j,k)} \sim \mathcal{N}(\alpha[Z_{j-1,k}+Z_{j+1,k}]+\beta[Z_{j,k-1}+Z_{j,k+1}],\kappa). \] Our infinite set of equations for the covariogram \(C(\mathbf{h})=C(j,k)=\mathbb{C}\mathrm{ov}[Z_{l,m},Z_{l+j,m+k}]\) is of the form \[ C(j,k) -\alpha[C(j-1,k)+C(j+1,k)]-\beta[C(j,k-1)+C(j,k+1)]=\kappa\delta_j\delta_k. \] These are the equivalent of the Yule-Walker equations for a CAR model on a square lattice. We can Fourier transform these covariance relations in order to get an equation for the spectrum. \[\begin{align*} S(\pmb{\nu}) - \alpha[e^{-2\pi\mathrm{i}\nu_x}+e^{2\pi\mathrm{i}\nu_x}]S(\pmb{\nu}) - \beta[e^{-2\pi\mathrm{i}\nu_y} + e^{2\pi\mathrm{i}\nu_y}]S(\pmb{\nu}) &= \kappa \\ \Rightarrow [1-2\alpha\cos(2\pi\nu_x)-2\beta\cos(2\pi\nu_y)]S(\pmb{\nu}) &=\kappa, \end{align*}\] and so \[ S(\pmb{\nu}) = \frac{\kappa}{1-2\alpha\cos(2\pi\nu_x)-2\beta\cos(2\pi\nu_y)}. \] This is similar to, but importantly different from, the corresponding spectrum of the nearest neighbour SAR model. Again, we want \(\alpha+\beta < 1/2\) for stationarity (assuming \(\alpha,\beta>0\)). Comparing back with the spectra we computed for AR(\(p\)) models, this CAR spectrum is more “AR(1)-like”, whereas the SAR spectrum is more “AR(2)-like”.
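
The distinction is easy to visualise by plotting 1d slices of the two spectra along \(\nu_y=0\). The sketch below uses \(\alpha=\beta=1/5\) and unit \(\kappa\) and \(\sigma^2\), purely for illustration.

alpha = beta = 0.2
nu = seq(-0.5, 0.5, length.out=500)
d = 1 - 2*alpha*cos(2*pi*nu) - 2*beta    # common denominator at nu_y = 0
plot(nu, 1/d, type="l", xlab="frequency", ylab="spectral density")   # CAR
lines(nu, 1/d^2, lty=2)                                              # SAR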

15.3.5 Simulation of CAR models

Since the CAR model specification corresponds to a multivariate normal distribution, in principle we can generate realisations of the CAR process using standard methods for simulating from a multivariate normal. However, if the number of sites is large and/or the lattice is regular, then there may be better approaches.

15.3.5.1 Spectral synthesis on a toroidal lattice

If we are working on a finite square lattice and are happy to assume periodic boundary conditions (ie. assume a toroidal lattice), then we can simulate very efficiently using the FFT. We can apply the DFT to our covariance relations, noting that the DFT of \(C(j,k)\) is actually \(S_{l,m}/n^2\) (annoying factors of \(n\)), where \(S_{l,m}\) is the spectral density at frequencies \(l\) and \(m\), to get \[\begin{align*} \frac{S_{l,m}}{n^2} - \alpha[e^{-2\pi\mathrm{i}l/n} + e^{2\pi\mathrm{i}l/n} ]\frac{S_{l,m}}{n^2} - \beta[e^{-2\pi\mathrm{i}m/n}+e^{2\pi\mathrm{i}m/n}]\frac{S_{l,m}}{n^2} &= \kappa \\ \Rightarrow [1-2\alpha\cos(2\pi l/n)-2\beta\cos(2\pi m/n)]S_{l,m} &= n^2\kappa, \end{align*}\] and so \[ S_{l,m} = \frac{n^2\kappa}{1-2\alpha\cos(2\pi l/n)-2\beta\cos(2\pi m/n)}. \] Then, we can use our spectral synthesis approach to generate realisations using the FFT. As usual, the fiddly bit is respecting the required conjugate symmetries.

rCAR2d = function(n, alpha, beta, sigma=1) {
    ss = matrix(0, n, n)
    # root spectrum: note the square root, since the CAR spectrum has
    # a first power (not a square) in the denominator
    ss = sigma*n / sqrt(
        1 - 2*alpha*cos(2*pi*(col(ss)-1)/n) 
        - 2*beta*cos(2*pi*(row(ss)-1)/n))
    y = ss * matrix(rcnorm(n*n), n, n)
    y[2:(n/2), (n/2 + 2):n] = y[2:(n/2), (n/2):2]
    y[(n/2 + 2):n, 2:(n/2)] = Conj(y[(n/2):2, 2:(n/2)])
    y[1:(n/2), 1:(n/2)] = ss[1:(n/2), 1:(n/2)] *
        matrix(rcnorm(n*n/4), n/2, n/2)
    y[(n/2 + 2):n, (n/2 + 2):n] = Conj(y[(n/2):2, (n/2):2])
    y[1, (n/2+2):n] = Conj(y[1, (n/2):2])
    y[n/2 + 1, (n/2+2):n] = Conj(y[n/2 + 1, (n/2):2])
    y[(n/2+2):n, 1] = Conj(y[(n/2):2, 1])
    y[(n/2+2):n, n/2 + 1] = Conj(y[(n/2):2, n/2 + 1])
    y[1, 1] = Re(y[1, 1])
    y[n/2 + 1, 1] = Re(y[n/2 + 1, 1])
    y[1, n/2 + 1] = Re(y[1, n/2 + 1])
    y[n/2 + 1, n/2 + 1] = Re(y[n/2 + 1, n/2 + 1])
    Re(ifft(y))
}

We can use this function to generate realisations in the typical way.

car = rCAR2d(200, 0.249, 0.249)
image(car, col=topo.colors(100))

mean(car)
[1] -0.04621952

15.3.5.2 Exact simulation

Away from the very special case of a stationary field on a toroidal lattice, things are not so convenient. We know that \[ \mathbf{Z} \sim \mathcal{N}(\mathbf{0}, \mathsf{Q}^{-1}), \quad \text{where} \quad \mathsf{Q} = \mathsf{K}^{-1}(\mathbb{I}-\mathsf{A}), \] and \(\mathsf{Q}\) will typically be a very sparse matrix. However, \(\mathsf{\Sigma}=\mathsf{Q}^{-1}\) typically will not be very sparse, so we would like to be able to simulate realisations of the field using \(\mathsf{Q}\) directly and without constructing \(\mathsf{\Sigma}\). It is actually quite straightforward to do this. First, form the Cholesky decomposition of \(\mathsf{Q}\) as \[ \mathsf{Q} = \mathsf{L}\mathsf{L}^\top, \] where \(\mathsf{L}\) is a lower triangular matrix. If \(\mathsf{Q}\) is sparse, then \(\mathsf{L}\) will typically also be sparse, though typically not as sparse as \(\mathsf{Q}\). The additional non-zero entries in the lower triangle of \(\mathsf{L}\) are known as fill-in. The level of fill-in depends heavily on the ordering of the rows and columns of \(\mathsf{Q}\), and so fill-reducing permutations can be applied to \(\mathsf{Q}\) prior to computing the Cholesky decomposition in order to maintain as much sparsity as possible. Once we have \(\mathsf{L}\), we can generate \(\varepsilon_i\sim\mathcal{N}(0,1)\) and solve the upper triangular linear system \[ \mathsf{L}^\top\mathbf{Z} = \pmb{\varepsilon} \] for \(\mathbf{Z}\) using backward substitution. Backward substitution can be implemented very efficiently for sparse systems. The resulting \(\mathbf{Z}\) vector has exactly the distribution we want, since \[ \mathbb{V}\mathrm{ar}[\mathbf{Z}] = \mathbb{V}\mathrm{ar}[(\mathsf{L}^\top)^{-1}\pmb{\varepsilon}] = (\mathsf{L}^\top)^{-1}\mathbb{V}\mathrm{ar}[\pmb{\varepsilon}]\mathsf{L}^{-1}= (\mathsf{L}^\top)^{-1}\mathsf{L}^{-1}= \mathsf{Q}^{-1}=\mathsf{\Sigma}, \] as required.

For large systems, we will want to use specialised sparse matrix algorithms (such as provided by the R Matrix package), but for smaller systems we can follow the above recipe using standard dense algorithms. This will still be more numerically stable than attempting to simulate via inversion of \(\mathsf{Q}\).
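
Below is a minimal sketch of this recipe in base R, for a hypothetical first-order CAR model on a 1d chain with constant \(\kappa\), using dense algorithms and no fill-reducing permutation. Note that base R’s chol() returns the upper triangular factor \(\mathsf{R}=\mathsf{L}^\top\) with \(\mathsf{Q}=\mathsf{R}^\top\mathsf{R}\), so backsolve() performs exactly the backward substitution described above.

n = 100; a = 0.45; kappa = 1
A = matrix(0, n, n)
A[abs(row(A) - col(A)) == 1] = a    # first-order neighbours on a 1d chain
Q = (diag(n) - A)/kappa             # Q = K^{-1}(I - A), symmetric here
R = chol(Q)                         # Q = R^T R, with R upper triangular
z = backsolve(R, rnorm(n))          # backward substitution solve of R z = eps
plot(z, type="l")                   # one exact draw from the CAR model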

TODO: do North Carolina example?

15.3.5.3 Gibbs sampling


15.4 Comparing SAR and CAR

15.5 Auto-models

15.5.1 Ising model


15.5.2 Potts model


15.5.3 Quartic scalar field ??!
