Synthesize Realistic 3D Medical Images at Scale to Ship Pre‑Trained Models


High‑quality 3D medical imaging data is the foundation of modern radiology AI, but access to it is often constrained by data scarcity, privacy restrictions, and the high cost of expert annotation. As a result, training reliable 3D medical imaging models is frequently bottlenecked by small, narrow, and hard‑to‑share datasets, limiting model robustness and generalization.

To help teams overcome these challenges, NVIDIA introduced Medical AI for Synthetic Imaging (MAISI) in 2024—a state‑of‑the‑art generative model that synthesizes high‑resolution 3D CT volumes with pixel‑level anatomical segmentation for privacy‑preserving data augmentation and research.

NV-Generate-CTMR, built on the MAISI architecture family, including MAISI‑v2 with Latent Rectified Flow, delivers an open source, end-to-end framework for synthetic CT and MRI generation. It enables researchers and developers to generate realistic 3D volumes and paired segmentations at scale, integrate them directly into training pipelines, and accelerate downstream medical imaging AI development.

This blog post introduces NV-Generate-MR-Brain, a new model for the synthetic generation of human brain anatomy and structure segmentation. It is built on the MAISI architecture and extends it toward scalable, open workflows for synthetic 3D medical image generation.

Figure 1. MR images generated using the NV-Generate-CTMR rflow-mr model. Left: a generated T2w prostate MRI. Right: a generated T1w brain MRI.

Breaking the 3D medical imaging data bottleneck

NV-Generate-MR-Brain was trained on the newly released multimodal MR-RATE dataset from the University of Zurich, Medipol University Hospital, Forithmus, and NVIDIA. The MR-RATE dataset builds on the highly successful CT-RATE dataset and multimodal foundation models.

MR-RATE, the world’s largest open source multimodal MRI dataset, comprises 100,000 brain MRI studies from more than 83,000 patients (about 700,000 volumes in total), each paired with de‑identified radiology reports plus clinical and scanner Digital Imaging and Communications in Medicine (DICOM) metadata. The dataset was created to establish an open, large‑scale foundation for developing research and commercial AI systems that understand both imaging and clinical context.

MR‑RATE captures the diversity of real‑world neuroimaging practice, spanning different scanner types, imaging protocols, and neurological pathologies. The dataset is being released under an open CC-BY-NC license for research institutions, with commercial licenses available through Forithmus.

Figure 2. MR-RATE is a novel dataset of brain and spine MRI volumes with a corresponding radiology report

Open source by design

The repository includes end‑to‑end inference code, pretrained weights, and training configurations, enabling teams to get started immediately without rebuilding complex pipelines from scratch. Users can generate synthetic images out of the box or fine‑tune the models on their own datasets to adapt to new anatomies, scanners, or modalities—significantly lowering both technical and compute barriers.

For this project, all of the ingredients, including code, data, and models, are released under open source licenses, with most models released under the NVIDIA Open Model License. The models can be run royalty-free on NVIDIA RTX GPUs to generate images, and they can be fine-tuned on new data or for new use cases.

Why image generation is essential for medical AI

Medical image synthesis has rapidly become a core capability for medical AI development. Teams use synthetic data to augment limited training sets, translate between imaging modalities such as CT and MRI, simulate rare pathologies, and enable privacy‑preserving data sharing without exposing real patient information.

By generating realistic, anatomically consistent 3D volumes—often paired with segmentation labels—synthetic data helps models generalize better when labeled examples are scarce and enables consistent benchmarking across sites, scanners, and protocols.

As clinical imaging becomes increasingly personalized, heterogeneous, and multimodal, scalable and controllable generation frameworks are no longer optional—they are essential for building robust medical AI systems.

Limitations of existing medical image synthesis approaches

Over the years, medical image synthesis methods have largely fallen into three categories: direct regression models, generative adversarial network (GAN)‑based approaches, and, more recently, diffusion models that generate images through iterative denoising.

Among these, diffusion models have emerged as the state of the art, offering improved stability and the ability to model complex anatomical distributions. However, applying diffusion models in real clinical workflows remains challenging.

First, real‑world medical images vary widely across scanners, acquisition protocols, and voxel spacings, making it difficult for models trained on narrow datasets to generalize.

Second, CT and MRI are inherently 3D modalities, yet full 3D diffusion models are computationally expensive in both time and GPU memory.

Third, even when conditioning signals—such as masks or anatomy hints—are provided, generated outputs may not faithfully follow those inputs, limiting their usefulness for controlled or task‑specific generation.

Together, these challenges—limited generalization, high computational cost, and weak condition alignment—make many existing approaches difficult to deploy at scale, motivating the need for faster and more controllable 3D synthesis frameworks.
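To make the memory cost of full 3D generation concrete, the short calculation below estimates the raw size of a single 512 x 512 x 768 volume (the largest CT size listed for the NV-Generate models later in this post) stored as 32-bit floats, and compares it with a compressed latent representation. The 4x per-axis downsampling factor and the 4 latent channels are illustrative assumptions, not values stated in this post, and the numbers cover only one tensor, not the activations a network needs on top of it.

```python
from math import prod

def tensor_mib(shape, channels=1, bytes_per_value=4):
    """Size in MiB of a dense tensor with the given spatial shape."""
    return prod(shape) * channels * bytes_per_value / 2**20

full_volume = (512, 512, 768)   # voxels, largest CT size listed in this post

# Hypothetical compression settings, chosen only for illustration.
DOWNSAMPLE = 4                  # assumed per-axis downsampling factor
LATENT_CHANNELS = 4             # assumed number of latent channels

latent = tuple(s // DOWNSAMPLE for s in full_volume)

print(f"image space : {tensor_mib(full_volume):.0f} MiB")                       # 768 MiB
print(f"latent space: {tensor_mib(latent, channels=LATENT_CHANNELS):.0f} MiB")  # 48 MiB
```

Under these assumptions, one latent tensor is 16 times smaller than the image it represents, which is the basic reason latent generative models are attractive for large 3D volumes.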

Fast, open source 3D medical image synthesis

NV‑Generate‑CTMR is an open‑source framework from NVIDIA designed to make high‑quality 3D medical image synthesis practical for everyday research and development. Rather than treating generative modeling as a narrow, task‑specific solution, it provides a reproducible, ready‑to‑use platform for creating realistic CT and MRI volumes across a wide range of clinical scenarios.

NV‑Generate‑CTMR is the first open‑source medical image generation framework to support flexible voxel sizes, variable volume dimensions, and whole‑body coverage within a single model (see Figure 3).

This flexibility allows researchers to synthesize data that matches real clinical protocols—from small, cropped regions to full‑resolution, large‑field‑of‑view scans—without retraining separate models for each setting. In this sense, NV‑Generate‑CTMR behaves as a foundation model for medical imaging, adaptable to many downstream tasks and anatomies rather than being limited to a single organ or configuration.

Figure 3. NV-Generate-CTMR image generation results from the rflow-ct model for variable voxel and volume sizes in three different anatomical regions

Efficient, sustainable AI development

By sharing models and training details openly, NV‑Generate‑CTMR follows the same philosophy as other open‑source foundation models: reuse instead of retrain.

Fine‑tuning an existing model is faster and far more energy‑efficient than training from scratch, reducing development time, lowering electricity consumption, and shrinking environmental impact.

Under the hood

NV‑Generate‑CTMR contains two model architectures:

MAISI‑v1, based on Latent Denoising Diffusion Probabilistic Models (DDPM), for stochastic image generation with better diversity
MAISI‑v2, based on Latent Rectified Flow, for 33x acceleration in inference speed and image generation with better quality
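As intuition for why a rectified-flow model can sample in far fewer steps than a DDPM, the toy sketch below integrates a straight-line velocity field with plain Euler steps. It is not the MAISI‑v2 implementation: the closed-form `ideal_velocity` function stands in for a trained network, the four-element lists stand in for latent tensors, and the convention that noise sits at t = 0 and data at t = 1 is an assumption made for this example.

```python
import random

def ideal_velocity(x, t, target):
    """Velocity of a straight-line path that reaches `target` at t = 1.

    In a real model this would be predicted by a neural network.
    """
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]

def euler_sample(x0, target, num_steps):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = ideal_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(4)]  # stand-in for latent noise
target = [1.0, -2.0, 0.5, 3.0]                      # stand-in for a data latent

for steps in (1, 30):
    sample = euler_sample(noise, target, steps)
    error = max(abs(s - g) for s, g in zip(sample, target))
    print(f"{steps:>2} steps: max error = {error:.2e}")
```

When the path from noise to data is exactly straight, Euler integration lands on the target regardless of the step count. A trained model only approximates straight paths, so in practice a modest number of steps is still needed.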

Details were published in two technical papers: MAISI-v2: Accelerated 3D High-Resolution Medical Image Synthesis with Rectified Flow and Region-Specific Contrastive Loss at the AAAI Conference on Artificial Intelligence in 2026; and MAISI: Medical AI for Synthetic Imaging at the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) in 2025.

Fast inference at scale

The MAISI‑v2 model in NV‑Generate‑CTMR achieves state‑of‑the‑art image quality with much faster inference compared to prior medical image generation methods, while delivering inference speeds comparable to leading video generation models. Table 1, below, breaks down the family of NV-Generate models.

Model Name	ddpm-ct	rflow-ct	rflow-mr	NV-Generate-MR-Brain
Modality	CT	CT	MR	MR
Release date	August 2024	March 2025	October 2025	March 2026
Body Region	Whole body	Whole body	Brain, prostate, abdomen, breast	Whole brain, skull-stripped brain (user can specify)
Architecture	MAISI-v1	MAISI-v2	MAISI-v2	MAISI-v2
Inference steps	1000	30	30	30
Max volume (voxels)	512x512x768	512x512x768	512x512x128	512x512x256
Use case	Image-only generation; image/mask pair generation	Image-only generation; image/mask pair generation	Image-only generation	Image-only generation; cross-contrast generation
Advantages	Better image diversity, whole body coverage	Fast inference speed, better image quality, whole body coverage	Fast inference speed, multiple body region coverage	Fast inference speed, better image quality for brain region
License	Open source, Commercial friendly	Open source, Commercial friendly	Open source, Research Only	Open source, Commercial friendly

Table 1. NV-Generate family of models
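The inference step counts in Table 1 line up with the 33x acceleration quoted for MAISI‑v2. The sketch below simply divides the step counts, under the assumption that each sampling step costs about the same for both architectures. Measured wall-clock speedups depend on hardware, volume size, and decoding time, so this is a rough consistency check rather than a benchmark.

```python
# Sampling steps per model, copied from Table 1.
inference_steps = {
    "ddpm-ct": 1000,
    "rflow-ct": 30,
    "rflow-mr": 30,
    "NV-Generate-MR-Brain": 30,
}

baseline = inference_steps["ddpm-ct"]

# Step-count reduction relative to the DDPM baseline.
speedup = {name: baseline / steps for name, steps in inference_steps.items()}

for name, ratio in speedup.items():
    print(f"{name:<22} {ratio:5.1f}x fewer steps than ddpm-ct")
```

For the three rectified-flow models, the ratio is 1000 / 30, or about 33.3.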

Multi-contrast generation model for brain MRI

Brain MRI is one of the most widely used applications of magnetic resonance imaging. To support this domain, we released NV-Generate-MR-Brain, a generative model built on the MAISI‑v2 architecture and trained on the newly released MR‑RATE dataset.

The model is designed for high-fidelity brain MRI synthesis and includes a foundation brain MRI model capable of generating either whole-brain or skull‑stripped images based on user specifications. It supports several widely used sequences and contrasts, including T1‑weighted (T1w), T2‑weighted (T2w), FLAIR, and SWI, enabling realistic and flexible image generation for research and clinical training applications.

The model supports high-resolution volumetric synthesis up to 512 × 512 × 256, approaching the upper range of spatial resolution used in clinical and research brain MRI, enabling realistic full-volume data generation for medical imaging workflows.

In addition, NV-Generate-MR-Brain provides a ControlNet module for generation of specified anatomical structures or cross-sequence synthesis, enabling users to predict one MRI sequence based on another.

Real‑world applications and research adoption

Image–mask pairs with tumors generated by NV‑Generate‑CTMR have been used as augmented training data for NV-Segment. Beyond NVIDIA, the models have been used or fine‑tuned by external researchers across a wide range of applications, including:

Zero-shot anomaly detection
Lung CT cancer classification
Prostate MRI lesion classification
MR-to-CT synthesis
Text prompted CT and MRI tumor segmentation
Brain diffusion MRI tractography
Brain tumor MRI synthesis
Text-to-CT generation
Text-to-brain MRI generation

“Synthetic, anatomically realistic neuro MR data from NV-Generate, combined with automated segmentation from NV-Segment and clinical reasoning capabilities from NV-Reason, help us design and validate AI solutions more efficiently,” said Ioannis Panagiotelis, PhD, Business Leader MR at Philips. “This enables radiologists to benefit from smarter, more explainable workflows without compromising patient privacy.”

Try it yourself: Synthesize 3D medical images

The fastest way to experience NV-Generate-CTMR is to run it yourself.

Online demo: No GPU is required. You can explore an interactive browser demo hosted by NVIDIA.

Command Line Interface (CLI): The online demo showcases core capabilities, but the full experience is available from the GitHub repository, which includes pretrained weights and ready-to-use inference scripts for generating complete 3D CT or MRI volumes locally. After cloning the repo and installing dependencies, you can launch inference with a single command:

    git clone https://github.com/NVIDIA-Medtech/NV-Generate-CTMR.git
    cd NV-Generate-CTMR
    export MONAI_DATA_DIRECTORY="./temp_work_dir"
    network="rflow"
    generate_version="rflow-ct"
    python -m scripts.inference \
        -t ./configs/config_network_${network}.json \
        -i ./configs/config_infer.json \
        -e ./configs/environment_${generate_version}.json \
        --random-seed 0 \
        --version ${generate_version}

This command loads the pretrained rectified flow model and synthesizes full 3D medical volumes directly to your local workspace. You can then visualize the outputs, inspect paired segmentation masks, or plug the generated data into your own training and evaluation pipelines. An example result of the code block above is shown in Figure 4, below.

Example results

Figure 4. A typical CT image generated from a mask condition; the image and mask are well aligned
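For dataset-scale generation, the inference command shown earlier can be scripted. The sketch below assembles the same argument list for several random seeds and prints each command instead of running it. The helper name `build_inference_command` is hypothetical, and the idea that changing `--random-seed` is the intended way to obtain different volumes is an assumption; the flags and config paths are taken from the command in this post.

```python
import shlex

def build_inference_command(network, generate_version, seed):
    """Return the inference command from this post as an argument list."""
    return [
        "python", "-m", "scripts.inference",
        "-t", f"./configs/config_network_{network}.json",
        "-i", "./configs/config_infer.json",
        "-e", f"./configs/environment_{generate_version}.json",
        "--random-seed", str(seed),
        "--version", generate_version,
    ]

# Print one command per seed; nothing is executed here.
for seed in range(3):
    cmd = build_inference_command("rflow", "rflow-ct", seed)
    print(shlex.join(cmd))
```

To execute a command, pass the list to `subprocess.run(cmd, check=True)` from the repository root, with `MONAI_DATA_DIRECTORY` set in the environment as in the shell example.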

Whether you’re testing ideas, augmenting datasets, or benchmarking models, NV-Generate-CTMR makes it easy to start generating realistic medical images right away.

Video 1. An example of generated CT and MR images