6. Reparameterizations of time and space
So far we’ve been looking at adding Gaussian noise with variance
We showed that the function
The setup
(Denoising diffusion implicit models. Song, Jiaming and Meng, Chenlin and Ermon, Stefano. arXiv preprint arXiv:2010.02502, 2020)
(Elucidating the design space of diffusion-based generative models. Karras, Tero and Aittala, Miika and Aila, Timo and Laine, Samuli. Advances in Neural Information Processing Systems, 2022)
formulation, where the trajectories given by
(Score-based generative modeling through stochastic differential equations. Song, Yang and Sohl-Dickstein, Jascha and Kingma, Diederik P and Kumar, Abhishek and Ermon, Stefano and Poole, Ben. arXiv preprint arXiv:2011.13456, 2020)
variance exploding refers to a specific formulation.)

The linear noise schedule that we have been using so far yields approximately linear trajectories at high noise levels, which in turn yields low integration errors when using a first-order Euler integrator, as observed in EDM.
Sometimes other formulations are useful.
When doing numerical computations, as well as feeding quantities into neural networks, it is useful for the vector components to have typical magnitude around
For notational convenience, let
In the EDM / DDIM formulation
where
Show that
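As a rough numerical illustration of why a rescaling of space helps, the sketch below assumes scalar data with unit variance and additive Gaussian noise of standard deviation `sigma`. Both are assumptions made for illustration, and the helper names `noisy_sample`, `rescale` and `sample_std` are hypothetical. Under these assumptions, dividing by the square root of `1 + sigma**2` keeps the typical magnitude near 1 however large the noise level is.

```python
import math
import random


def noisy_sample(rng, sigma):
    """Draw x0 ~ N(0, 1) (assumed unit-variance data) and add N(0, sigma^2) noise."""
    x0 = rng.gauss(0.0, 1.0)
    return x0 + sigma * rng.gauss(0.0, 1.0)


def rescale(x, sigma):
    """Rescale so the noisy sample has unit variance under the assumptions above."""
    return x / math.sqrt(1.0 + sigma * sigma)


def sample_std(values):
    """Root-mean-square value; the samples have mean zero by construction."""
    return math.sqrt(sum(v * v for v in values) / len(values))


rng = random.Random(0)
sigma = 80.0  # a large noise level, chosen only for illustration
raw = [noisy_sample(rng, sigma) for _ in range(20000)]
scaled = [rescale(x, sigma) for x in raw]

raw_std = sample_std(raw)        # close to sqrt(1 + sigma^2)
scaled_std = sample_std(scaled)  # close to 1
print(raw_std, scaled_std)
```

The unscaled samples grow with the noise level, while the rescaled ones stay at order 1, which is the range that is comfortable for neural network inputs.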
It can also be useful to reparameterize time.
For reasons of analysis and numerics, it is also sometimes useful for the range of time to be
We can achieve this via a monotonic reparameterization of time
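One possible monotonic map is sketched below. It is a hypothetical choice, not one taken from the text: it sends a noise level in [0, ∞) to a time in [0, 1), and the function names `sigma_to_time` and `time_to_sigma` are made up for this example.

```python
def sigma_to_time(sigma):
    """Map a noise level in [0, inf) to a time in [0, 1). Strictly increasing."""
    return sigma / (1.0 + sigma)


def time_to_sigma(t):
    """Inverse map from [0, 1) back to [0, inf)."""
    return t / (1.0 - t)


sigmas = [0.0, 0.5, 1.0, 10.0, 1000.0]
times = [sigma_to_time(s) for s in sigmas]
round_trip = [time_to_sigma(t) for t in times]
print(times)
print(round_trip)
```

Any strictly increasing, invertible map would serve the same purpose; this one is used only because both directions have a simple closed form.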
Different space and time scalings are in some sense all equivalent (the underlying spaces are just transformations of each other); in practical implementations, the inputs to the neural network will be scaled to have norm 1, whether this happens “within the diffusion process”
- Adjusting the scaling of the coordinates or time can affect the curvature of trajectories, which matters when using a first-order Euler method to integrate. There are ways of counteracting this, but it requires consideration; naïvely using a space with non-trivial curvature can significantly degrade performance when using the wrong numerical integration approach.
- The numerics (due to limited floating point precision) are affected, and this can be a desired or undesired effect.
The variance preserving formulation can be easier to debug when implementing it, because vectors with the incorrect norm are more obvious (and less upscaling and downscaling of vectors is required), but it does require more complexity elsewhere, and the trajectory curvature has to be considered when using first-order methods.
How do the drift terms change under reparameterizations of time and space?
In the following,
- Let
  be a reparameterization of time with inverse , so . Show that
  Use this, and the result of Exercise 4.2, to deduce that a drift for a Gaussian with standard deviation
  at time is
- Let
  be rescaled coordinates for some time-dependent rescaling function . Show that
  What is the time-indexed probability density for
  in terms of that for ?
- Let
  be a reparameterization of time with inverse , and a rescaling of coordinates by . Show that
  What is the time-indexed probability density for
  in terms of that for ?
  Confirm also that this formula yields
  for the drift of a Gaussian with standard deviation at time , independent of .
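A reparameterization of time can be sanity-checked numerically. The sketch below assumes a zero-mean Gaussian whose standard deviation follows a schedule `sigma(t)`, and uses the linear drift `x * sigma'(t) / sigma(t)`, which transports such a Gaussian by pure scaling. The two schedules and the helper names `euler_integrate` and `make_gaussian_drift` are hypothetical choices for illustration. Integrating between the same pair of noise levels under either schedule should scale `x` by the ratio of the final to the initial standard deviation.

```python
def euler_integrate(drift, x, t_start, t_end, steps):
    """Integrate dx/dt = drift(x, t) from t_start to t_end with uniform Euler steps."""
    dt = (t_end - t_start) / steps
    t = t_start
    for _ in range(steps):
        x = x + dt * drift(x, t)
        t = t + dt
    return x


def make_gaussian_drift(sigma, dsigma):
    """Drift x * sigma'(t) / sigma(t) for a zero-mean Gaussian with std sigma(t)."""
    return lambda x, t: x * dsigma(t) / sigma(t)


# Schedule A: sigma(t) = t. Schedule B: sigma(s) = s**2, i.e. t = s**2.
drift_a = make_gaussian_drift(lambda t: t, lambda t: 1.0)
drift_b = make_gaussian_drift(lambda s: s * s, lambda s: 2.0 * s)

x_start = 3.0
# Both runs go from noise level 4 down to noise level 1.
x_a = euler_integrate(drift_a, x_start, 4.0, 1.0, 20000)
x_b = euler_integrate(drift_b, x_start, 2.0, 1.0, 20000)
expected = x_start * 1.0 / 4.0  # x scales by sigma_end / sigma_start
print(x_a, x_b, expected)
```

With many small steps both runs agree with the expected value, so the reparameterization changes the drift by a chain-rule factor but not the endpoints of the trajectory.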
The quantity
Recall Exercise 5.5 on the numerics of integration and floating point precision. Is it possible to find a schedule
Let
Euler integration across timesteps
- What is the curvature (as per Exercise 3.2) of the trajectories
  for fixed ?
- Show that no matter what the time steps
  , the final value of is zero.
Now consider the time reparameterization where
- What is
  ? What do the trajectories look like?
- If we integrate across time steps
  , what is the value of in terms of ?
- What is the final integration error
  ?
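The effect of trajectory curvature on Euler integration can be seen in a small experiment. The sketch below assumes the drift `x / t`, which corresponds to a zero-mean Gaussian with standard deviation `t`, and compares it with a hypothetical reparameterization in which the standard deviation is `s**2`. That reparameterization is an assumption for illustration and not necessarily the one intended in the exercise. The exact flow reaches 0 at time 0 in both cases.

```python
def euler_on_grid(drift, x, times):
    """Euler-integrate dx/dt = drift(x, t) over an arbitrary decreasing time grid."""
    for t_now, t_next in zip(times[:-1], times[1:]):
        x = x + (t_next - t_now) * drift(x, t_now)
    return x


def linear_drift(x, t):
    """Drift for sigma(t) = t: straight-line trajectories through the origin."""
    return x / t


def quadratic_drift(x, s):
    """Drift for the hypothetical reparameterization sigma(s) = s**2."""
    return 2.0 * x / s


uneven_grid = [8.0, 5.0, 4.5, 1.0, 0.25, 0.0]  # deliberately irregular steps
x_linear = euler_on_grid(linear_drift, 2.0, uneven_grid)
x_quadratic = euler_on_grid(quadratic_drift, 2.0, uneven_grid)
print(x_linear, x_quadratic)
```

On straight-line trajectories each Euler step multiplies `x` by the ratio of consecutive times, so the product telescopes and the result is zero whatever the grid. On the curved trajectories the same grid leaves a visible error.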
Rectified flow
We can extend some of our results from calculating the drift of a convolved distribution, in particular Gaussians, to more general time-indexed families of distributions. One such example is rectified flow [19] (Flow straight and fast: Learning to generate and transfer data with rectified flow. Liu, Xingchao and Gong, Chengyue and Liu, Qiang. arXiv preprint arXiv:2209.03003, 2022), which interpolates between two distributions.
Let
Then the minimizer
Theorem 6.1 can be derived from Theorem 4.2 through an appropriate rescaling of coordinates. In some sense, it is “only” a rescaling of the diffusion setup; however, it is a canonical scaling that enjoys certain properties, and in particular it has low curvature, which keeps the error of Euler integration small.
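The regression behind rectified flow can be illustrated on a toy problem. The sketch below assumes the standard linear interpolation `x_t = (1 - t) * x0 + t * x1` with regression target `x1 - x0`, and takes both endpoint distributions to be independent zero-mean scalar Gaussians. These are assumptions made so that the minimizer is linear in `x_t` and has a closed form. The function names and parameter values are hypothetical.

```python
import random


def fit_velocity_slope(rng, t, data_std, n_samples):
    """Least-squares slope a minimizing the mean of (a * x_t - (x1 - x0))**2."""
    num = 0.0
    den = 0.0
    for _ in range(n_samples):
        x0 = rng.gauss(0.0, 1.0)       # sample from the source (noise) distribution
        x1 = rng.gauss(0.0, data_std)  # independent sample from the target distribution
        x_t = (1.0 - t) * x0 + t * x1  # linear interpolation at time t
        target = x1 - x0               # time derivative of the interpolation
        num += x_t * target
        den += x_t * x_t
    return num / den


def closed_form_slope(t, data_std):
    """E[x1 - x0 | x_t = x] / x for the jointly Gaussian toy case."""
    var_t = (1.0 - t) ** 2 + (t * data_std) ** 2
    return (t * data_std ** 2 - (1.0 - t)) / var_t


rng = random.Random(0)
t, data_std = 0.3, 2.0
estimated = fit_velocity_slope(rng, t, data_std, 50000)
exact = closed_form_slope(t, data_std)
print(estimated, exact)
```

In this toy case the closed-form slope equals the derivative of the standard deviation of `x_t` divided by that standard deviation, which is the same form as the drift of a Gaussian with a time-dependent standard deviation.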
Let
i.e., rectified flow. (In general,
There are even more general forms of Theorem 6.1, for example flow matching [20] (Flow matching for generative modeling. Lipman, Yaron and Chen, Ricky TQ and Ben-Hamu, Heli and Nickel, Maximilian and Le, Matt. arXiv preprint arXiv:2210.02747, 2022), which in turn has connections to things like optimal transport (not covered here).