DDPMs for MRI

Diffusion-Driven Generation of Minimally Preprocessed Brain MRI

Background and Methods

Training a deep generative model for 3D medical images, such as magnetic resonance (MR) images, is challenging. Denoising Diffusion Probabilistic Models (DDPMs) progressively transform noise into a sample from the training distribution and have become a leading approach for high-fidelity image generation. As MR images are frequently acquired in 3D (or stacked 2D slices), a useful generative model needs 3D consistency, which usually means a 3D network. 3D DDPMs, however, are large, data-hungry, and slow.
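To make the "noise to sample" idea concrete, here is a minimal NumPy sketch of the standard DDPM forward (noising) process that a denoising network learns to reverse. The linear beta schedule, the 1000-step count, and the small random array standing in for a 3D volume are illustrative assumptions, not the settings used in this work.

```python
import numpy as np

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=2e-2):
    # Assumed schedule for illustration; real models may use a different one.
    return np.linspace(beta_start, beta_end, num_steps)

def forward_noise(x0, t, alpha_bar, rng):
    # Sample x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I).
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

rng = np.random.default_rng(0)
betas = linear_beta_schedule(1000)
alpha_bar = np.cumprod(1.0 - betas)      # cumulative signal retention per step

x0 = rng.standard_normal((8, 8, 8))      # hypothetical stand-in for a 3D MR volume
x_early, _ = forward_noise(x0, 0, alpha_bar, rng)    # almost unchanged
x_late, _ = forward_noise(x0, 999, alpha_bar, rng)   # almost pure noise
```

At the first step the volume is barely perturbed, while at the last step almost no signal remains. Generation runs this process in reverse, starting from pure noise.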

To train one of these large models, we needed a very large training set of varied but still high-quality data. We collected 38 publicly available datasets and curated them for high-resolution, high-quality T$_1$-weighted brain images (see our blog post for more details).

We want our models to be useful to as many users as possible, so we avoided common steps used to reduce data complexity, including registration, bias correction, downsampling, and latent compression.

Because 3D DDPMs are large, we compared prediction methods to determine which were stable and converged relatively quickly. DDPMs use a U-Net to predict how to denoise an image, and that prediction can be parameterized in several ways. Most commonly, the network predicts either the noise ($\hat{\epsilon}$) or the clean sample ($\hat{x}_0$). Other groups have proposed predicting a weighted or unweighted combination of the noise and the clean image, called velocity ($\hat{\nu}$) prediction and flow ($\hat{\mu}$) matching, respectively. We trained a suite of models, one per prediction method, and compared their ability to generate realistic 3D brain MRI.
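The four prediction targets can be sketched as follows. This uses the definitions common in the literature, which we assume here rather than take from this work: a variance-preserving path $x_t = \alpha_t x_0 + \sigma_t \epsilon$ with $\alpha_t^2 + \sigma_t^2 = 1$ for noise, sample, and velocity prediction, and a linear path $x_t = (1 - t)\,x_0 + t\,\epsilon$ for flow matching. The function names are hypothetical.

```python
import numpy as np

def prediction_targets(x0, eps, alpha_t, sigma_t):
    # Regression target for each parameterization (assumed standard definitions).
    return {
        "noise": eps,                              # epsilon prediction
        "sample": x0,                              # x_0 prediction
        "velocity": alpha_t * eps - sigma_t * x0,  # weighted combination
        "flow": eps - x0,                          # unweighted combination
    }

def sample_from_velocity(x_t, v, alpha_t, sigma_t):
    # Invert the velocity target on the variance-preserving path.
    return alpha_t * x_t - sigma_t * v

def sample_from_flow(x_t, u, t):
    # Invert the flow target on the linear path x_t = (1 - t) * x0 + t * eps.
    return x_t - t * u

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 4, 4))    # hypothetical clean volume
eps = rng.standard_normal(x0.shape)    # Gaussian noise
alpha_t, sigma_t = 0.6, 0.8            # satisfies alpha^2 + sigma^2 = 1
targets = prediction_targets(x0, eps, alpha_t, sigma_t)

x_t_vp = alpha_t * x0 + sigma_t * eps  # noisy input on the variance-preserving path
x0_from_v = sample_from_velocity(x_t_vp, targets["velocity"], alpha_t, sigma_t)

t = 0.3
x_t_lin = (1.0 - t) * x0 + t * eps     # noisy input on the linear path
x0_from_u = sample_from_flow(x_t_lin, targets["flow"], t)
```

Under these assumptions, each target carries enough information to recover the clean image from the noisy input, so the methods differ mainly in how the training loss is weighted across noise levels. That difference may explain why some converge more readily than others.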

Results

In our training, noise prediction failed to converge, while the sample, velocity, and flow models all produced brain-like samples.

Uncurated samples generated by each of our models (sagittal, coronal, and axial slices).
Prediction methods: top left - sample, top right - velocity, bottom - flow.

We also showed that our synthetic images (except for noise prediction outputs) can be segmented by methods like SynthSeg. The segmented volumes, while not exactly aligned with the testing distribution, were quite close, with velocity and flow prediction aligning better than sample prediction.

Segmentation results from 1000 synthetic images created by sample, velocity and flow prediction methods. Real values are taken from a withheld testing set of subjects from our large-scale dataset.
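A comparison like this depends on turning each segmentation into per-structure volumes. Below is a minimal sketch, assuming the segmentation is available as an integer label array with 0 as background and a known voxel size. The function name and toy label map are hypothetical, and no SynthSeg API is called.

```python
import numpy as np

def label_volumes_ml(label_map, voxel_size_mm):
    # Volume of one voxel in millilitres (1 ml = 1000 mm^3).
    voxel_ml = float(np.prod(voxel_size_mm)) / 1000.0
    labels, counts = np.unique(label_map, return_counts=True)
    # Skip label 0, assumed here to be background.
    return {int(l): int(c) * voxel_ml for l, c in zip(labels, counts) if l != 0}

# Toy label map: a 2x2x2 block of label 1 inside a 4x4x4 background.
label_map = np.zeros((4, 4, 4), dtype=np.int32)
label_map[:2, :2, :2] = 1
volumes = label_volumes_ml(label_map, voxel_size_mm=(1.0, 1.0, 1.0))
```

Computing these volumes for every synthetic and real image gives one distribution per structure, and those distributions can then be compared across prediction methods.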

Paper and Models

This work is currently under review at Scientific Reports. A preprint is available on arXiv.

Code and pre-trained weights for these models are available on GitHub.