In the earlier section on Stochastic processes, we introduced Brownian motion and Wiener processes \((W_t)_{t \in [0, +\infty)}\), where informally \(W_{t + dt} = W_t + dW_t\) and \(dW_t \equiv N(0, dt)\) is an infinitesimal Gaussian increment with variance \(dt\). We can generalize this to a wider class of stochastic processes.
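To make this informal picture concrete, here is a minimal numerical sketch (our own illustration, not part of the formal development): a Wiener path is approximated on a time grid by accumulating independent \(N(0, \Delta t)\) increments. The function name and default parameters are arbitrary choices.

```python
import numpy as np

def sample_wiener_path(t_max=1.0, n_steps=1000, rng=None):
    """Approximate a Wiener path on [0, t_max] by accumulating N(0, dt) increments."""
    rng = np.random.default_rng() if rng is None else rng
    dt = t_max / n_steps
    # Each increment dW_t is an independent Gaussian with mean 0 and variance dt.
    increments = rng.normal(loc=0.0, scale=np.sqrt(dt), size=n_steps)
    # W_0 = 0 and W_{t+dt} = W_t + dW_t, so the path is the running sum.
    return np.concatenate([[0.0], np.cumsum(increments)])

path = sample_wiener_path()  # n_steps + 1 values: W_0, W_dt, ..., W_{t_max}
```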

Case study: the heat equation

To introduce some of the tools required later on, let's derive the heat equation, which describes the time-evolution of the density of a Wiener process \((W_t)_{t \in [0, +\infty)}\). Some of this material is based on [13] (Terence Tao, "Brownian motion and Dyson Brownian motion", https://terrytao.wordpress.com/2010/01/18/254a-notes-3b-brownian-motion-and-dyson-brownian-motion). The heat equation is analogous to Theorem 3.3 (the Continuity Equation), with the deterministic trajectories replaced by stochastic processes.

Let \((W_t)_{t \in [0, +\infty)}\) be a Wiener process, and let \(F : \R \rightarrow \R\) be a smooth function with all derivatives bounded. (This is a slightly stronger restriction than for the test functions from \(C_c(\R)\) we have been using: we also require the higher derivatives to be bounded.)

For each time \(t\), the random variable \(F(W_t)\) is bounded and thus has expectation \(\E F(W_t)\). From the almost sure continuity of \(W_t\) (i.e., the map \(t \mapsto W_t\) is continuous with probability \(1\)) and the dominated convergence theorem, we see that the map \(t \mapsto \E F(W_t)\) is continuous. In fact it is differentiable, and obeys the following differential equation:

Lemma 7.1: Equation of motion

For all times \(t \ge 0\), we have

\[\begin{equation} \label{eq:brownian_eq_of_motion} \frac{d}{dt} \E F(W_t) = \frac{1}{2} \E F_{xx} (W_t) \end{equation}\]

where \(F_{xx}\) is the second derivative of \(F\). In particular, \(t \mapsto \E F(W_t)\) is continuously differentiable (because the right-hand side is continuous).

Proof

We work from first principles. It suffices to show

\[\E F(W_{t + dt}) = \E F(W_t) + \frac{1}{2} dt \, \E F_{xx} (W_t) + o(dt)\]

as \(dt \rightarrow 0\). We shall show this for \(dt > 0\); the case \(dt < 0\) is similar.

Write \(dW_t = W_{t + dt} - W_t\). By Taylor expansion, we have

\[\begin{equation} \label{eq:taylor_expansion_f_on_brownian} F(W_{t + dt}) = F(W_t) + dW_t F_x(W_t) + \frac{1}{2} dW_t^2 F_{xx} (W_t) + O(|dW_t|^3). \end{equation}\]

Since \(dW_t \equiv N(0, dt)\) is independent of \(W_t\), taking expectations gives \(\E [dW_t F_x(W_t)] = 0\) and \(\E [dW_t^2 F_{xx}(W_t)] = dt \, \E F_{xx}(W_t)\), while the error term contributes \(\E |dW_t|^3 = O(dt^{3/2}) = o(dt)\). Hence

\[\E F(W_{t + dt}) = \E F(W_t) + \frac{1}{2} dt \, \E F_{xx} (W_t) + o(dt).\]

The claim follows.
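As a quick numerical sanity check of Lemma 7.1 (our own illustration, with \(F(x) = \cos x\) as an arbitrary choice): since \(F_{xx} = -F\), the lemma predicts \(\frac{d}{dt} \E \cos(W_t) = -\frac{1}{2} \E \cos(W_t)\), i.e. \(\E \cos(W_t) = e^{-t/2}\). A Monte Carlo estimate agrees:

```python
import numpy as np

# Check of Lemma 7.1 with F(x) = cos(x): F_xx = -F, so the lemma predicts
# E cos(W_t) = exp(-t/2).  Compare a Monte Carlo estimate, using W_t ~ N(0, t),
# against the closed form.
rng = np.random.default_rng(0)
for t in [0.5, 1.0, 2.0]:
    w_t = rng.normal(0.0, np.sqrt(t), size=200_000)
    print(f"t={t}: estimate={np.cos(w_t).mean():.4f}, exact={np.exp(-t / 2):.4f}")
```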

Let \(\rho(x, t)\) be the marginal density of \(W_t\). By the definition of expectation, \(\E F(W_t) = \int \rho(x, t) F(x) dx\). Applying \eqref{eq:brownian_eq_of_motion} and integration by parts, we have

\[\begin{equation} \label{eq:heat_equation} \partial_t \rho = \frac{1}{2} \partial_{xx} \rho. \end{equation}\]
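In more detail: for every admissible \(F\), Lemma 7.1 gives \(\frac{d}{dt} \int \rho(x, t) F(x) \, dx = \frac{1}{2} \int \rho(x, t) F_{xx}(x) \, dx\), and integrating by parts twice (the boundary terms vanish, assuming \(\rho\) decays at infinity, as the Gaussian marginals here do) moves both derivatives onto \(\rho\):

\[\frac{1}{2} \int \rho(x, t) F_{xx}(x) \, dx = \frac{1}{2} \int \partial_{xx} \rho(x, t) \, F(x) \, dx.\]

Since \(\int \partial_t \rho \, F \, dx = \frac{1}{2} \int \partial_{xx} \rho \, F \, dx\) for all such \(F\), the equation \eqref{eq:heat_equation} follows, at least in the distributional sense.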

A solution to this is

\[\begin{equation} \label{eq:heat_kernel} \rho(x, t) = \frac{1}{\sqrt{2\pi t}} e^{-x^2 / 2t}, \end{equation}\]

which is, unsurprisingly, the density of a Gaussian with mean \(0\) and variance \(t\); this is the solution corresponding to the initial condition \(W_0 = 0\).
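One can verify \eqref{eq:heat_kernel} directly: differentiating the Gaussian gives

\[\partial_t \rho = \left( \frac{x^2}{2t^2} - \frac{1}{2t} \right) \rho \qquad \text{and} \qquad \partial_{xx} \rho = \left( \frac{x^2}{t^2} - \frac{1}{t} \right) \rho,\]

so indeed \(\partial_t \rho = \frac{1}{2} \partial_{xx} \rho\).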

Exercise 7.1: Diffusion processes and the heat equation

Our construction of Brownian motion \((B_t)_{t \ge 0}\) so far has a deterministic initial position \(B_0 = 0\). However, we can easily construct Wiener processes where the initial position \(X_0\) is itself a random variable. Indeed, set

\[X_t := X_0 + B_t\]

where \(B_t\) is a Wiener process independent of \(X_0\). This obeys properties 2, 3, and 4 of Definition 2.1.

Show that again, unsurprisingly, the density of \(X_t\), which is the convolution of the density of \(X_0\) with a Gaussian, obeys the heat equation \eqref{eq:heat_equation}.

So far, we have worked with 1-dimensional Wiener processes, but there is no difficulty defining a similar process in higher dimensions. In a vector space \(\R^n\), define a (continuous) Wiener process \((W_t)_{t \in [0, +\infty)}\) in \(\R^n\) to be a process whose components \((W_{t,i})_{t \in [0, +\infty)}\) for \(i=1, \ldots, n\) are independent Wiener processes. It is easy to see that such processes exist and satisfy the natural analogues of the \(1\)-dimensional axioms, with the \(1\)-dimensional Gaussian \(N(\mu, \sigma^2)_\R\) replaced by the \(n\)-dimensional Gaussian \(N(\mu, \sigma^2 I)_{\R^n}\).

Exercise 7.2: Heat equation in \(n\) dimensions

If \((W_t)_{t \in [0,+\infty)}\) is an \(n\)-dimensional continuous Wiener process, show that

\[\frac{d}{dt} \E F(W_t) = \frac{1}{2} \E (\Delta F)(W_t)\]

whenever \(F: \R^n \rightarrow \R\) is smooth with all derivatives bounded, where

\[\Delta F := \sum_{i=1}^n \frac{\partial^2}{\partial x_i^2} F\]

is the Laplacian of \(F\). Conclude in particular that the density function \(\rho(x, t)\) of \(W_t\) obeys the (distributional) heat equation

\[\partial_t \rho = \frac{1}{2} \Delta \rho.\]

Stochastic calculus

This subsection is an informal introduction to the notation used in stochastic calculus.

We saw the beginnings of stochastic calculus in the first-principles derivation of Lemma 7.1, and we now build on it. In computing the change in \(F\) from \(t\) to \(t+dt\) in \eqref{eq:taylor_expansion_f_on_brownian}, there were two terms. The first was proportional to \(dW_t\), which can be thought of as an infinitesimal change in a Brownian process, or equivalently (and informally) as a sample from \(N(0, dt)\):

\[dW_t \equiv N(0, dt) \qquad \text{(informally)}\]

The second term was proportional to \(dW_t^2\). Its expectation is \(dt\), and by subdividing the interval \([t, t+dt]\) further and appealing to the law of large numbers, the fluctuations average out, so we may treat \(dW_t^2\) as equal to \(dt\). Thus, in the language of stochastic calculus, we can write \eqref{eq:taylor_expansion_f_on_brownian} as

\[\begin{equation} \label{eq:sde_for_brownian} dF(W_t) = F_x (W_t) \, dW_t + \frac{1}{2} F_{xx} (W_t) \, dt, \end{equation}\]

where \(dF(W_t) = F(W_{t+dt}) - F(W_t)\). Thus \(F(W_t)\) is itself a stochastic process, and it obeys the stochastic differential equation \eqref{eq:sde_for_brownian}.
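To see \eqref{eq:sde_for_brownian} in action, here is a minimal numerical sketch (our own illustration, with \(F(x) = \sin x\) as an arbitrary choice): we accumulate the right-hand-side increments along one discretized Wiener path and compare the total against the direct change \(F(W_T) - F(W_0)\); the two agree up to discretization error.

```python
import numpy as np

# Pathwise check of the SDE for F(W_t), with F(x) = sin(x), so F_x = cos and
# F_xx = -sin: accumulate dF = F_x(W_t) dW_t + 1/2 F_xx(W_t) dt along one
# discretized path and compare with the direct change F(W_T) - F(W_0).
rng = np.random.default_rng(1)
n_steps, dt = 10_000, 1e-4  # so T = n_steps * dt = 1
w, total_dF = 0.0, 0.0
for _ in range(n_steps):
    dw = rng.normal(0.0, np.sqrt(dt))
    total_dF += np.cos(w) * dw - 0.5 * np.sin(w) * dt
    w += dw
print(total_dF, np.sin(w) - np.sin(0.0))  # agree up to discretization error
```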

A more general stochastic differential equation is of the form

\[\begin{equation} \label{eq:general_sde} dX_t = v(X_t, t) \, dt + \kappa(X_t, t) \, dW_t, \end{equation}\]

where \(v(x, t)\) is the drift (or velocity) coefficient, and the new ingredient \(\kappa(x, t)\) is the diffusion coefficient. When \(\kappa(x, t) = 0\), \eqref{eq:general_sde} is just an ordinary differential equation, and integrating it yields deterministic trajectories. Brownian motion corresponds to \(v = 0\) and \(\kappa = 1\).
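A standard way to simulate \eqref{eq:general_sde} numerically is the Euler-Maruyama scheme, which steps \(X_{t + \Delta t} \approx X_t + v(X_t, t) \, \Delta t + \kappa(X_t, t) \, \Delta W_t\) with \(\Delta W_t \sim N(0, \Delta t)\). A minimal sketch (function name and defaults are our own choices):

```python
import numpy as np

def euler_maruyama(v, kappa, x0, t_max=1.0, n_steps=1000, rng=None):
    """Integrate dX_t = v(X_t, t) dt + kappa(X_t, t) dW_t, returning the path."""
    rng = np.random.default_rng() if rng is None else rng
    dt = t_max / n_steps
    xs, t = [x0], 0.0
    for _ in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt))  # sample dW_t ~ N(0, dt)
        xs.append(xs[-1] + v(xs[-1], t) * dt + kappa(xs[-1], t) * dw)
        t += dt
    return np.array(xs)

# Brownian motion corresponds to v = 0 and kappa = 1.
path = euler_maruyama(lambda x, t: 0.0, lambda x, t: 1.0, x0=0.0)
```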

Exercise 7.3: SDEs for marginal distributions

By working from first principles and treating \(dt\) as very small and \(dW_t \equiv N(0, dt)\), or otherwise, determine SDEs (i.e., drift and diffusion coefficients) for the following stochastic processes. (A simulation harness for sanity-checking candidate answers is sketched after the list.)

  1. Starting at \(X_0=0\), and with marginal distribution \(X_t \equiv N(0, t^2)\).

  2. Starting at \(X_0=0\), and with marginal distribution \(X_t \equiv N(0, \sigma(t)^2)\) for some monotonically increasing function \(\sigma(t)\), with no drift coefficient, i.e., \(v(x, t) = 0\).

  3. Starting at \(X_0 \equiv N(0, 1)\), and with marginal distribution \(X_t \equiv N(0, 1-t)\), with no diffusion coefficient, i.e., \(\kappa(x, t)=0\).

  4. Starting at \(X_0 = Z\) where \(Z\) is an arbitrary random variable, and with marginal distribution \(X_t \equiv (1-t)Z + N(0, t)\).
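Candidate answers can be sanity-checked by simulation. The sketch below (our own illustration; names and parameters are arbitrary) runs many Euler-Maruyama paths for given coefficients \(v\) and \(\kappa\) and reports the sample standard deviation of \(X_{t_{\max}}\), to be compared against the target marginal. The usage example checks plain Brownian motion rather than any of the four parts, so as not to give the answers away.

```python
import numpy as np

def marginal_std(v, kappa, x0_sampler, t_max=1.0, n_steps=1000,
                 n_paths=50_000, seed=0):
    """Euler-Maruyama over many paths; return the sample std of X_{t_max}."""
    rng = np.random.default_rng(seed)
    dt = t_max / n_steps
    x = x0_sampler(rng, n_paths)  # draw the (possibly random) initial positions
    t = 0.0
    for _ in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        x = x + v(x, t) * dt + kappa(x, t) * dw
        t += dt
    return x.std()

# Usage on plain Brownian motion (v = 0, kappa = 1, X_0 = 0), where X_1 ~ N(0, 1):
print(marginal_std(lambda x, t: 0.0, lambda x, t: 1.0,
                   lambda rng, n: np.zeros(n)))  # should print roughly 1.0
```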

All of this again generalizes to higher dimensions. In a vector space \(\R^n\), equation \eqref{eq:general_sde} generalizes to a vector-valued drift coefficient \(v(x, t) \in \R^n\) and a matrix-valued diffusion coefficient \(\kappa(x, t) \in \R^{n \times n}\).

Exercise 7.4: Cumulative variance of Wiener process

In \(n\) dimensions, given a diffusion coefficient \(\kappa(t) \in \R^{n \times n}\), show that the marginal distribution of the stochastic process defined by \(dX_t = \kappa(t) dW_t\) and \(X_0 = 0\) is a Gaussian with mean zero and covariance \(\int_0^t \kappa(s) \kappa(s)^T ds\), where \(\kappa(s)^T\) is the transpose of \(\kappa(s)\).
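The stated covariance formula can also be checked numerically. Here is a sketch (our own illustration) in \(n = 2\) dimensions with a hypothetical \(\kappa(t) = \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}\), for which \(\int_0^1 \kappa(s) \kappa(s)^T \, ds = \begin{pmatrix} 4/3 & 1/2 \\ 1/2 & 1 \end{pmatrix}\):

```python
import numpy as np

# Check the covariance formula in n = 2 dimensions for a hypothetical
# kappa(t) = [[1, t], [0, 1]]; the predicted covariance of X_1 is the
# integral of kappa(s) kappa(s)^T over [0, 1], i.e. [[4/3, 1/2], [1/2, 1]].
rng = np.random.default_rng(2)
n_paths, n_steps = 100_000, 1000
dt = 1.0 / n_steps
x, t = np.zeros((n_paths, 2)), 0.0
for _ in range(n_steps):
    kappa = np.array([[1.0, t], [0.0, 1.0]])
    dw = rng.normal(0.0, np.sqrt(dt), size=(n_paths, 2))  # dW_t ~ N(0, dt I)
    x += dw @ kappa.T  # each row is kappa @ dW for one path
    t += dt
print(np.cov(x.T))                                # empirical covariance
print(np.array([[4 / 3, 1 / 2], [1 / 2, 1.0]]))   # predicted covariance
```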

It is important to note that, unlike ODEs, SDEs are not trivially reversible. For an ODE, if you start at \(X_0\), integrate forward to time \(t = T\) to obtain \(X_T\), and then integrate back to \(t = 0\), you end up at \(X_0\) again. For an SDE, however, adding more Gaussian noise while integrating in the reverse direction does not undo the Gaussian noise added on the way forward.
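A toy numerical illustration of this point (our own, using plain Brownian motion, i.e. \(v = 0\), \(\kappa = 1\)): running the noise forward for unit time and then naively "reversing" by adding fresh noise for another unit of time leaves the paths spread out with standard deviation about \(\sqrt{2}\), rather than returning them to their starting point as an ODE round trip would.

```python
import numpy as np

# Forward-then-"backward" integration of dX_t = dW_t: both legs just add fresh
# N(0, dt) noise, so after going out to t = 1 and "back", the paths end up
# spread with std about sqrt(2) instead of returning to their start at 0.
rng = np.random.default_rng(3)
n_paths, n_steps = 10_000, 1000
dt = 1.0 / n_steps
x = np.zeros(n_paths)
for _ in range(2 * n_steps):  # forward leg followed by the "reversed" leg
    x += rng.normal(0.0, np.sqrt(dt), size=n_paths)
print(x.std())  # roughly 1.414, not 0
```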