12 Spectral theory for GPs
12.1 Introduction
In Chapter 5 we saw how we could use Fourier transforms in order to decompose time series models and data into frequency components and how that could give us additional insight into the nature of the process. We would now like to do the same for random fields, and GPs in particular. As usual, we will start with the 1d case and then turn to 2d fields. In Chapter 5 we used three different flavours of Fourier transform: Fourier series, for a continuous function on a finite interval; the discrete time Fourier transform, for a discrete time series model; and the discrete Fourier transform, for a finite time series. To represent a continuous random field, we will need another Fourier transform, the continuous Fourier transform, usually referred to simply as the Fourier transform, since it really subsumes all other flavours of Fourier transform as special cases.
For a function \(f:\mathbb{R}\rightarrow\mathbb{C}\), the Fourier transform of \(f\), often denoted \(\hat{f}:\mathbb{R}\rightarrow\mathbb{C}\), is defined by \[ \hat{f}(\nu) = \int_{-\infty}^\infty f(x)e^{-2\pi\mathrm{i}\nu x}\,\mathrm{d}x. \] It can be inverted with the inverse Fourier transform, \[ f(x) = \int_{-\infty}^\infty \hat{f}(\nu)e^{2\pi\mathrm{i}\nu x}\,\mathrm{d}\nu. \] The inverse transform represents the function \(f\) as a continuous mixture of oscillations, with the weight and phase of the oscillation at frequency \(\nu\) being determined by the Fourier transform at that frequency, \(\hat{f}(\nu)\). The inverse transform is almost identical to the inverse transform of the DTFT that we met in Section 5.1.2. However, there we didn’t need any oscillations with frequencies higher than \(1/2\), due to only needing to know the function on a grid with a grid spacing of 1. Here, we want to know \(f\) everywhere, so we need all frequencies, and hence we must integrate over the full real line.
That the forward and inverse transforms correctly invert one another follows directly from the key orthogonality property \[ \int_{-\infty}^\infty e^{2\pi\mathrm{i}\nu x}\,\mathrm{d}\nu = \delta(x), \] where, as always, \(\delta(x)\) is Dirac’s delta function. This property can be interpreted as saying that the (inverse) Fourier transform of the constant function, 1, is the delta function. Note that, conversely, both the forward and inverse Fourier transforms of the delta function are the constant function, 1.
Using this orthogonality property we can check inversion by considering the inverse transform of the forward transform as \[\begin{align*} \int_\mathbb{R}\mathrm{d}\nu\, \hat{f}(\nu)e^{2\pi\mathrm{i}\nu x} &= \int_\mathbb{R}\mathrm{d}\nu\, \int_\mathbb{R}\mathrm{d}y\,f(y)e^{-2\pi\mathrm{i}\nu y} e^{2\pi\mathrm{i}\nu x} \\ &= \int_\mathbb{R}\mathrm{d}y\, f(y) \int_\mathbb{R}\mathrm{d}\nu\,e^{2\pi\mathrm{i}\nu (x-y)} \\ &= \int_\mathbb{R}\mathrm{d}y\, f(y) \delta(x-y) = f(x). \end{align*}\]
Since these Fourier transforms are now integrals over the whole real line, there are potentially issues of integrability and convergence. The transforms will clearly exist for integrable functions, but the class of functions for which transforms exist can be generalised beyond this (for example, to a constant function, as above). We will largely ignore such technicalities.
It is possible to analytically compute the Fourier transform of many functions, but this often requires some knowledge of complex analysis. Fortunately, the transforms for many functions can be found in Fourier transform tables. Using these, together with the fact that the Fourier transform is linear, allows the computation of Fourier transforms for many functions of interest.
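Table entries are also easy to sanity-check numerically. The following is a minimal sketch using `scipy.integrate.quad` (the function name `ft` and the parameter choices are mine): for a real even function the transform reduces to a cosine integral, and the triangle function on \([-1,1]\) is a standard table entry whose transform is \(\mathrm{sinc}^2\).

```python
import numpy as np
from scipy.integrate import quad

def ft(f, nu, a, b):
    """Numerical Fourier transform of a real even function f supported
    on [a, b], evaluated at frequency nu (a cosine integral suffices)."""
    val, _ = quad(lambda x: f(x) * np.cos(2 * np.pi * nu * x), a, b)
    return val

# table entry: the triangle function 1 - |x| on [-1, 1] transforms to
# sinc^2(nu), where sinc(nu) = sin(pi*nu)/(pi*nu) (numpy's convention)
tri = lambda x: 1 - abs(x)
for nu in [0.0, 0.5, 1.7]:
    assert abs(ft(tri, nu, -1, 1) - np.sinc(nu)**2) < 1e-8
```

Linearity then extends such checks to sums and scalings of tabulated functions.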
12.2 Spectral representation
We would like to use Fourier transforms to understand random fields. If a random field, \(Z(s)\), has a Fourier transform, \(\hat{Z}(\nu)\), then this is also a random field, and vice versa. So, as in Chapter 5, we will consider the random field that is induced by a Fourier transform that is a weighted white noise process, \(\hat{Z}(\nu)=\sigma(\nu)\eta(\nu)\), where \(\sigma(\cdot)\) is a deterministic weighting function that we are free to specify, and \(\eta(\cdot)\) is a white noise process with covariogram equal to Dirac’s delta. We therefore represent \(Z(s)\) as \[ Z(s) = \int_\mathbb{R} \sigma(\nu)e^{2\pi\mathrm{i}\nu s}\eta(\nu)\,\mathrm{d}\nu, \] so that \(Z(s)\) is a random continuous mixture of frequency components. Again, it may be preferable to write this as an integral wrt a Wiener process as \[ Z(s) = \int_\mathbb{R} \sigma(\nu)e^{2\pi\mathrm{i}\nu s}\mathrm{d}W(\nu), \] which arguably makes it easier to see that \(Z(s)\) will be a GP. Taking expectations gives \(\mathbb{E}[Z(s)]=0,\forall s\in\mathbb{R}\), so \(Z(s)\) is a mean zero GP. We can now compute its covariance function \[\begin{align*} C(s+h, s) &= \mathbb{C}\mathrm{ov}[Z(s+h),Z(s)] \\ &= \mathbb{E}\left[Z(s+h)\overline{Z(s)}\right] \\ &= \int_\mathbb{R} |\sigma(\nu)|^2e^{2\pi\mathrm{i}h\nu}\,\mathrm{d}\nu, \end{align*}\] following exactly the same derivation as in Section 5.2. Since this is independent of \(s\), we conclude that the process is stationary, and defining the spectral density, \(S(\nu) = |\sigma(\nu)|^2\) (so clearly real and non-negative), we can write the covariogram as \[ C(h) = \int_\mathbb{R} S(\nu)e^{2\pi\mathrm{i}h\nu}\,\mathrm{d}\nu. \] Consequently, \(C(\cdot)\) is the inverse Fourier transform of \(S(\cdot)\), and so the spectral density is the Fourier transform of the covariogram, \[ S(\nu) = \int_\mathbb{R}C(h)e^{-2\pi\mathrm{i}h\nu}\,\mathrm{d}h. \] The covariogram and spectral density form a Fourier transform pair.
We know that for a (real) stationary GP, the covariogram is real and even, so (if we prefer) we can write the spectral density as \[ S(\nu) = 2\int_0^\infty C(h)\cos(2\pi h\nu)\,\mathrm{d}h, \] and so \(S(\cdot)\) is also even, in which case we can also write \[ C(h) = 2\int_0^\infty S(\nu)\cos(2\pi h\nu)\,\mathrm{d}\nu. \] The stationary variance of the induced process is \[ C(0) = \int_\mathbb{R} S(\nu)\,\mathrm{d}\nu, \] and so to obtain a weakly stationary GP (with finite variance), \(S(\cdot)\) must be integrable. Since any non-negative even integrable function \(S(\cdot)\) induces a weakly stationary GP, the induced covariogram must be valid (positive definite). This gives us a recipe for constructing valid covariograms: pick any non-negative even integrable function and compute its inverse Fourier transform to get a valid covariogram. As statisticians, we know lots of non-negative integrable functions: probability density functions (PDFs). So, just start with any (possibly scaled) even PDF and compute its inverse Fourier transform. Conversely, we also have a mechanism for checking whether a given covariogram is valid: compute its Fourier transform and check that it is real, non-negative, even, and integrable.
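This validity check can be carried out numerically: sample a candidate covariogram on a fine grid, approximate its Fourier transform with the FFT, and inspect the sign of the result. A minimal sketch (the grid length and size are arbitrary choices of mine): \(e^{-|h|}\) passes, while \(e^{-h^4}\) fails, consistent with the standard fact that \(e^{-|h|^p}\) is positive definite only for \(0 < p \leq 2\).

```python
import numpy as np

def spectrum(C, L=40.0, n=2**12):
    """Approximate the Fourier transform of an even covariogram C,
    sampled on a grid over [-L, L), via the FFT. Returns frequencies
    and the (real) spectrum."""
    h = np.linspace(-L, L, n, endpoint=False)
    dh = h[1] - h[0]
    # discrete approximation to \int C(h) e^{-2 pi i nu h} dh; ifftshift
    # puts h = 0 first, as the FFT expects
    S = np.fft.fftshift(np.fft.fft(np.fft.ifftshift(C(h)))) * dh
    nu = np.fft.fftshift(np.fft.fftfreq(n, d=dh))
    return nu, S.real

# exp(-|h|) is a valid covariogram: its spectrum stays non-negative
_, S1 = spectrum(lambda h: np.exp(-np.abs(h)))
assert S1.min() >= -1e-6

# exp(-h^4) is not: its transform goes negative
_, S2 = spectrum(lambda h: np.exp(-h**4))
assert S2.min() < -1e-6
```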
As usual, we have glossed over many technicalities and conditions. Precise results formalising our analysis include the Wiener–Khinchin theorem and Bochner’s theorem; indeed, the fact that the spectral density and covariogram form a Fourier transform pair is often referred to by statisticians simply as Bochner’s theorem. The Karhunen–Loève theorem is also somewhat relevant.
12.3 Examples
12.3.1 Squared-exponential covariogram
Let us start by considering the function \(f(x) = e^{-\alpha x^2}\). It is not too difficult to compute the Fourier transform of this function directly, but we can also look it up in tables to find \[ \hat{f}(\nu) = \sqrt{\frac{\pi}{\alpha}} \exp\{-\pi^2\nu^2/\alpha\}. \] So the transform of a squared-exponential function is squared exponential. Furthermore, using the Gaussian integral, or otherwise, we see that the transform is a correctly normalised Gaussian PDF (that is, it integrates to 1). However, the length scales are reciprocals. As \(\alpha\rightarrow 0\), \(f(x)\) tends to the constant function, 1, whereas \(\hat{f}(\nu)\) tends to the delta function, \(\delta(\nu)\). This is a good way to understand the key orthogonality property we needed to establish invertibility.
If we now think about a squared-exponential covariogram parameterised the usual way \[ C(h) = \sigma^2e^{-(h/\phi)^2}, \] we can Fourier transform it to get \[ S(\nu) = \sigma^2\phi\sqrt{\pi}\,e^{-(\pi\phi\nu)^2}. \] When the length scale, \(\phi\), is large, the covariogram is broad, corresponding to variations over large distances. In this case, the spectrum is narrow and sharply peaked around zero, since the process is dominated by low frequency oscillations. Conversely, when \(\phi\) is small, the covariogram is concentrated near zero, due to correlations persisting over only short length scales. But the spectrum is broad, reflecting the fact that the process is composed of oscillations over a wide range of frequencies.
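This transform pair is easy to spot-check numerically, by recovering the covariogram from the claimed spectral density via the cosine form of the inverse transform. A sketch using `scipy` (the parameter values are arbitrary):

```python
import numpy as np
from scipy.integrate import quad

sigma2, phi = 2.0, 1.5  # arbitrary variance and length scale

C = lambda h: sigma2 * np.exp(-(h / phi)**2)
S = lambda nu: sigma2 * phi * np.sqrt(np.pi) * np.exp(-(np.pi * phi * nu)**2)

# recover the covariogram: C(h) = 2 \int_0^inf S(nu) cos(2 pi h nu) dnu
for h in [0.0, 0.5, 2.0]:
    val, _ = quad(lambda nu: 2 * S(nu) * np.cos(2 * np.pi * h * nu),
                  0, np.inf)
    assert abs(val - C(h)) < 1e-7
```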
We have already noted that the delta function transforms to the constant function. Consequently, white noise has a completely flat spectrum, corresponding to the fact that the process is made up from oscillations at all frequencies equally.
12.3.2 Exponential covariogram (OU process)
We start by considering the function \(f(x) = e^{-a|x|}\). From tables, we find that \[ \hat{f}(\nu) = \frac{2a}{a^2+4\pi^2\nu^2}. \] So, the spectrum is in the form of a Cauchy distribution, and careful inspection reveals that this is, again, correctly normalised.
We have seen two different commonly encountered ways of parameterising the exponential covariogram. In the context of spatial statistics, it is most common to write it as \[ C(h) = \sigma^2e^{-|h|/\phi}. \] We can Fourier transform this to obtain \[ S(\nu) = \frac{2\sigma^2\phi}{1+(2\pi\phi\nu)^2}. \] The spectrum has long tails relative to a squared-exponential, reflecting the fact that there is always a mixture across a broad frequency range, irrespective of the length scale. Nevertheless, the reciprocal nature of the length scales is the same as for the squared-exponential covariogram. When the covariogram length scale is large, the spectrum is concentrated close to zero (low frequencies), and vice versa.
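Again, this pair, and the relation \(C(0)=\int S\), can be checked numerically. A sketch (arbitrary parameter values); since this spectrum has slowly decaying \(1/\nu^2\) tails, the oscillatory semi-infinite integral is best handled with `quad`'s `weight='cos'` option:

```python
import numpy as np
from scipy.integrate import quad

sigma2, phi = 1.0, 2.0  # arbitrary variance and length scale
C = lambda h: sigma2 * np.exp(-abs(h) / phi)
S = lambda nu: 2 * sigma2 * phi / (1 + (2 * np.pi * phi * nu)**2)

# the spectral density integrates to the stationary variance C(0)
var, _ = quad(S, -np.inf, np.inf)
assert abs(var - sigma2) < 1e-6

# inverse-transforming S recovers the covariogram; weight='cos' tells
# quad the integrand is f(nu) * cos(wvar * nu)
for h in [0.5, 3.0]:
    val, _ = quad(lambda nu: 2 * S(nu), 0, np.inf,
                  weight='cos', wvar=2 * np.pi * h)
    assert abs(val - C(h)) < 1e-5
```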
12.4 Spectral synthesis and the FFT
TODO
12.4.1 Synthesising intrinsically stationary processes
TODO - include this or not?
12.5 2d Fourier transforms
To understand the spectral characteristics of 2d random fields, we must first understand 2d Fourier transforms. For a function \(f:\mathbb{R}^2\rightarrow\mathbb{C}\), which we might write as \(f(\mathbf{s})\) or \(f(x, y)\), where \(\mathbf{s}=(x,y)^\top\), we compute the Fourier transform by first (1d) Fourier transforming the first (\(x\)) coordinate holding \(y\) fixed to get a function of \(\nu_x\) and \(y\). We then transform the \(y\) coordinate holding \(\nu_x\) fixed to get a function of \(\nu_x\) and \(\nu_y\), \(\hat{f}(\nu_x, \nu_y)\). In fact, as will become clear, the order in which we transform the coordinates doesn’t actually matter. The result is the same.
Starting from \(f(x,y)\), first transform \(x\): \[ \tilde{f}(\nu_x, y) = \int_\mathbb{R} \mathrm{d}x\, f(x, y)e^{-2\pi\mathrm{i}\nu_x x}. \] Now take this function and transform the \(y\) coordinate: \[\begin{align*} \hat{f}(\nu_x,\nu_y) &= \int_\mathbb{R} \mathrm{d}y\, \tilde{f}(\nu_x, y) e^{-2\pi\mathrm{i}\nu_y y} \\ &= \int_\mathbb{R} \mathrm{d}y\, \int_\mathbb{R} \mathrm{d}x\, f(x, y)e^{-2\pi\mathrm{i}\nu_x x} e^{-2\pi\mathrm{i}\nu_y y} \\ &= \int_\mathbb{R} \mathrm{d}y\, \int_\mathbb{R} \mathrm{d}x\, f(x, y) e^{-2\pi\mathrm{i}(\nu_x x + \nu_y y)}. \end{align*}\] It should now be clear that since we can switch the order of integration, the order in which we transform the coordinates doesn’t matter. Writing \(\hat{f}(\pmb{\nu})=\hat{f}(\nu_x,\nu_y)\), we can write the transform more neatly as \[ \hat{f}(\pmb{\nu}) = \int\!\!\!\int_{\mathbb{R}^2} f(\mathbf{s})e^{-2\pi\mathrm{i}\pmb{\nu}^\top\mathbf{s}}\,\mathrm{d}\mathbf{s}, \] which also has obvious higher-dimensional generalisations. It inverts with \[ f(\mathbf{s}) = \int\!\!\!\int_{\mathbb{R}^2} \hat{f}(\pmb{\nu})e^{2\pi\mathrm{i}\pmb{\nu}^\top\mathbf{s}}\,\mathrm{d}\pmb{\nu}. \]
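The order-independence has an exact discrete analogue: a 2d FFT is just a 1d FFT applied along each axis, in either order. A small numpy sketch (the grid of values is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
f = rng.normal(size=(64, 64))   # samples of some f(x, y) on a grid

# transform x then y, y then x, or both at once: all three agree
xy = np.fft.fft(np.fft.fft(f, axis=0), axis=1)
yx = np.fft.fft(np.fft.fft(f, axis=1), axis=0)
both = np.fft.fft2(f)
assert np.allclose(xy, yx) and np.allclose(xy, both)
```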
12.5.1 Example
TODO
12.5.2 Spectral representation and Bochner’s theorem
We can now think about spectral representation of 2d random fields. If a 2d random field, \(Z(\mathbf{s})\), has a Fourier transform, \(\hat{Z}(\pmb{\nu})\), then this will also be a random field, and vice versa. So again we consider the random field induced by a weighted (spatial) white noise process. That is, we put \(\hat{Z}(\pmb{\nu}) = \sigma(\pmb{\nu})\eta(\pmb{\nu})\), where \(\sigma(\cdot)\) is a deterministic field that we are free to choose, and \(\eta(\cdot)\) is a spatial white noise process with covariogram \(\delta(\pmb{\nu}) = \delta(\nu_x)\delta(\nu_y)\). Similar computations to the 1d case show that the induced field is mean zero and stationary with covariogram \[ C(\mathbf{h}) = \int\!\!\!\int_{\mathbb{R}^2} S(\pmb{\nu})e^{2\pi\mathrm{i}\pmb{\nu}^\top\mathbf{h}}\,\mathrm{d}\pmb{\nu} = \int\!\!\!\int_{\mathbb{R}^2} S(\pmb{\nu})\cos(2\pi\pmb{\nu}^\top\mathbf{h})\,\mathrm{d}\pmb{\nu}, \] where \(S(\pmb{\nu})=|\sigma(\pmb{\nu})|^2\). This is a 2d inverse Fourier transform and so the corresponding forward transform is \[ S(\pmb{\nu}) = \int\!\!\!\int_{\mathbb{R}^2} C(\mathbf{h})e^{-2\pi\mathrm{i}\pmb{\nu}^\top\mathbf{h}}\,\mathrm{d}\mathbf{h} = \int\!\!\!\int_{\mathbb{R}^2} C(\mathbf{h})\cos(2\pi\pmb{\nu}^\top\mathbf{h})\,\mathrm{d}\mathbf{h}. \]
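As in 1d, a claimed 2d transform pair can be spot-checked numerically. A sketch for a 2d squared-exponential covariogram \(C(\mathbf{h}) = \sigma^2 e^{-\|\mathbf{h}\|^2/\phi^2}\): since it factorises over coordinates, its spectral density is the product of two 1d squared-exponential transforms, \(S(\pmb{\nu}) = \sigma^2\pi\phi^2 e^{-(\pi\phi)^2\|\pmb{\nu}\|^2}\) (this particular example is mine; parameter values arbitrary).

```python
import numpy as np
from scipy.integrate import dblquad

sigma2, phi = 1.0, 1.0
C = lambda hx, hy: sigma2 * np.exp(-(hx**2 + hy**2) / phi**2)
# claimed 2d spectral density: product of two 1d transforms
S = lambda nx, ny: (sigma2 * np.pi * phi**2
                    * np.exp(-(np.pi * phi)**2 * (nx**2 + ny**2)))

# forward transform of C (cosine form, as C is even); tails beyond
# [-8, 8]^2 are negligible. dblquad integrates func(y, x).
for nx, ny in [(0.0, 0.0), (0.3, 0.2)]:
    val, _ = dblquad(
        lambda hy, hx: C(hx, hy) * np.cos(2 * np.pi * (nx * hx + ny * hy)),
        -8, 8, -8, 8)
    assert abs(val - S(nx, ny)) < 1e-6
```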
12.5.3 2d spectral synthesis
TODO
12.6 Spectral analysis of spatial data
TODO
12.6.1 1d data on a regular grid
TODO
12.6.2 2d data on a regular grid
TODO