Building Custom Atomistic Simulation Workflows for Chemistry and Materials Science with NVIDIA ALCHEMI Toolkit

For decades, computational chemistry has faced a tug-of-war between accuracy and speed. Ab initio methods like density functional theory (DFT) provide high fidelity but are computationally expensive, limiting researchers to systems of a few hundred atoms. Conversely, classical force fields are fast but often lack the chemical accuracy required for complex bond-breaking or transition-state analysis.

Machine learning interatomic potentials (MLIPs) have emerged as the bridge, offering quantum accuracy at classical speeds. However, the software ecosystem is a new bottleneck. While the MLIP models themselves run on GPUs, the surrounding simulation infrastructure often relies on legacy CPU-centric code.

NVIDIA ALCHEMI (AI Lab for Chemistry and Materials Innovation) helps to address these challenges by accelerating chemicals and materials discovery with AI. We have previously announced two components of the ALCHEMI portfolio:

- ALCHEMI NIM microservices: Scalable, cloud-ready microservices for AI-accelerated batched atomistic simulations in chemistry and materials science
- ALCHEMI Toolkit-Ops: A set of foundational GPU kernels designed to accelerate the calculations behind simulations, such as neighbor lists, dispersion corrections, and electrostatics

Today, we are introducing the NVIDIA ALCHEMI Toolkit, a collection of GPU-accelerated simulation building blocks that incorporates and expands on ALCHEMI Toolkit-Ops. ALCHEMI Toolkit is designed to manage the data flow between accelerated chemistry and materials domain-specific kernels and deep learning models. ALCHEMI Toolkit extends beyond individual models and kernels to provide a modular, PyTorch-native structure for researchers and developers to compose custom simulation workflows.

Figure 1 shows the ALCHEMI architectural stack and product features supported in this initial release of ALCHEMI Toolkit, including expanded functionality in Toolkit-Ops. This release includes capabilities for geometry relaxation and molecular dynamics, and the supporting pipeline infrastructure for combining multiple simulation workflows.

Graphic of ALCHEMI architectural stack (left) with list of product features supported in the initial release of ALCHEMI Toolkit (right).
Figure 1. NVIDIA ALCHEMI Toolkit is a collection of GPU-accelerated simulation building blocks to enable large-scale, batched simulations with AI

ALCHEMI Toolkit is not just a collection of scripts. It’s designed to enable researchers and developers to build custom, performant atomistic simulation workflows with ease.

Expanding ALCHEMI Toolkit-Ops

ALCHEMI Toolkit leverages the capabilities of Toolkit-Ops to handle the underlying calculations of the simulations. The previous release included several key operations:

- Neighbor list construction
- DFT-D3 dispersion corrections
- Long-range electrostatic interactions

This release broadens the scope of common operations addressed to include: 

- Batched dynamics kernels
- JAX support (for v0.2.0 release features)
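To make concrete what one of these operations computes, the following is a minimal, framework-free sketch of a cutoff neighbor list. The function name is illustrative, not toolkit API; production GPU kernels produce the same kind of pair list far faster, using cell lists and periodic boundary handling.

```python
import math
from itertools import combinations

def brute_force_neighbor_list(positions, cutoff):
    """Return a COO-style (i, j) pair list of atoms within `cutoff`.

    O(N^2) reference version; GPU kernels compute the same pair list
    with cell lists and periodic-boundary handling.
    """
    pairs = []
    for i, j in combinations(range(len(positions)), 2):
        if math.dist(positions[i], positions[j]) < cutoff:
            pairs.extend([(i, j), (j, i)])  # both directions, as in a COO edge list
    return sorted(pairs)

# three collinear atoms 1.0 apart: only adjacent pairs fall inside a 1.5 cutoff
pos = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(brute_force_neighbor_list(pos, cutoff=1.5))
# [(0, 1), (1, 0), (1, 2), (2, 1)]
```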

Integration with the atomistic simulation ecosystem

ALCHEMI Toolkit is designed to integrate seamlessly with the broader atomistic simulation ecosystem. We’re excited to announce the following integrations with leading platforms in the chemistry and materials science community.

Orbital

Orbital develops advanced AI foundation models used to accelerate the discovery of novel cooling systems for data centers and sustainable materials. Orbital has integrated ALCHEMI Toolkit into their new OrbMolv2 model to drastically reduce the time required for inference. The new model will leverage ALCHEMI Toolkit components such as PME electrostatics for periodic Coulomb interactions and the MTK integrator for batched constant-pressure molecular dynamics. The existing Orb models already leverage Toolkit-Ops for GPU-accelerated graph construction, providing a ~1.7x acceleration for large systems and ~33x for batched smaller systems with TorchSim support.

Materials Graph Library (MatGL)

MatGL is an open source framework for state-of-the-art graph-based MLIPs. ALCHEMI Toolkit is integrating with the MatGL TensorNet model to significantly accelerate materials simulations and property predictions workflows. By leveraging ALCHEMI Toolkit GPU-native kernels and batching infrastructure, MatGL users can achieve higher computational efficiency and lower memory consumption for simulations at scale.

Matlantis

Matlantis enables rapid materials discovery by combining universal MLIPs with high-performance cloud computing. Matlantis is actively exploring the ALCHEMI Toolkit and identifying where its composable dynamics can deliver the greatest value for industrial materials simulation customers. This builds on its proven integration of ALCHEMI Toolkit-Ops—including Warp-optimized neighbor list construction and DFT-D3 dispersion corrections—which significantly reduces computational overhead of atomistic interactions with speedups of up to 10x. 

Furthermore, by evaluating specific components within ALCHEMI Toolkit, this collaboration has the potential to enable Matlantis to move beyond single-structure optimization to high-throughput, parallel relaxation of millions of molecular configurations. Ultimately, this integration aims to further power small-scale research and industrial-scale materials design, accelerating chemical evaluation with unparalleled GPU efficiency.

How to get started with ALCHEMI Toolkit

This section walks you through how to get started with ALCHEMI Toolkit.

System and package requirements

- Python ≥3.11, <3.14
- PyTorch ≥2.8
- CUDA Toolkit 12+, NVIDIA driver 470.57.02+
- Operating system: Linux (primary), macOS
- NVIDIA GPU (RTX 20xx or newer), CUDA Compute Capability ≥7.0
- Minimum 4 GB RAM (16 GB recommended for large systems)
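If you want to sanity-check your environment before installing, a small script like the following can verify the interpreter and GPU requirements. The helper names are hypothetical (not part of the toolkit); the `torch.cuda` calls are standard PyTorch.

```python
import sys

def check_python_version(version_info=sys.version_info) -> bool:
    """True if the interpreter satisfies the Python >=3.11, <3.14 requirement."""
    return (3, 11) <= (version_info[0], version_info[1]) < (3, 14)

def describe_gpu() -> str:
    """Probe PyTorch and the GPU lazily; hypothetical helper for illustration."""
    try:
        import torch
    except ImportError:
        return "PyTorch not installed"
    if not torch.cuda.is_available():
        return "no CUDA device visible"
    major, minor = torch.cuda.get_device_capability(0)
    return f"compute capability {major}.{minor} (need >= 7.0)"

print(check_python_version((3, 12, 1)))  # True
print(check_python_version((3, 10, 9)))  # False
```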

Installation

Use the following code to install ALCHEMI Toolkit:

```shell
# Install Atomic Simulation Environment (ASE, used in the examples below)
uv pip install ase

# Using pip
pip install nvalchemi-toolkit

# Using uv
uv venv --seed --python 3.12
uv pip install nvalchemi-toolkit

# Install from source
git clone https://github.com/NVIDIA/nvalchemi-toolkit.git
cd nvalchemi-toolkit
uv sync --all-extras

# Add nvalchemi as a project dependency
uv add nvalchemi-toolkit
```

For more information, reference the NVIDIA/nvalchemi-toolkit GitHub repo and the ALCHEMI Toolkit documentation.

This section dives into four core ALCHEMI Toolkit features: customizable batched simulation workflows, build-your-own dynamics classes, model wrappers, and advanced data management. These features provide researchers and developers with the tools and flexibility needed to create bespoke end-to-end workflows that maximize efficiency and performance on NVIDIA GPUs.

Customizable batched simulation workflows

The distinctive feature of the NVIDIA ALCHEMI Toolkit is the GPU-native batched dynamics engine. No single MLIP model is perfect for every chemical environment, especially when dealing with nonlocal, long-range interactions. 

ALCHEMI Toolkit enables researchers to combine modular chemistry and materials science domain-specific kernels and models into customized simulation workflows. This architecture supports the development of specialized compute workflows and running virtual laboratories with millions of concurrent atomic interactions without the latency of traditional software stacks.

Capabilities

- Composable calculators combining MLIPs with physics-based corrections
- High-performance wrappers (MACE, TensorNet, AIMNet2)

API example

The following example constructs the data, sets up the MLIP, and configures a FIRE2 geometry optimization that is then used as a starting point for velocity Verlet (microcanonical) dynamics:

```python
from ase import Atoms

from nvalchemi.data import AtomicData, AtomicBatch
from nvalchemi.dynamics import ConvergenceHook
from nvalchemi.dynamics.optimizers import FIRE2
from nvalchemi.dynamics.integrator import VelocityVerlet

# set up some batch of atomic structures
atomic_data = [AtomicData.from_atoms(Atoms(...), device="cuda") for _ in range(16)]
batch = AtomicBatch.from_data_list(atomic_data)

# set up your MLIP and dynamics classes
mlip = ...

# optimizer convergence depends on the force norm and max values
conv_criteria = ConvergenceHook(
    criteria=[
        {"key": "forces", "threshold": 0.05, "reduce_op": "norm"},
        {"key": "forces", "threshold": 0.1, "reduce_op": "max"},
    ]
)
optimizer = FIRE2(
    mlip,
    convergence_hook=conv_criteria,
    n_steps=200,
)
velverlet = VelocityVerlet(mlip, n_steps=1000)
```

You can run and scale the simulation pipelines in one of two ways: on a single GPU, or across multiple CPUs and GPUs.

Run and scale the pipeline on a single GPU: The FusedStage class is formed by “adding” two or more dynamics objects together. This enables wrapping the end-to-end workflow in torch.compile and sharing CUDA stream contexts.

```python
fused = optimizer + velverlet

# context manager handles compilation and CUDA stream
with fused:
    # runs 200 steps of optimization and 1000 steps of MD
    fused.run(batch)
```

With this approach, you can easily build simulation workflows in which samples move on to the next sequential step as soon as they converge within the batch, making optimal use of your GPU.
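To illustrate the per-sample convergence idea without any toolkit machinery, here is a toy sketch in plain Python. The names and the decaying "force norm" model are purely illustrative assumptions, not toolkit API: each sample in the batch drops out of the active set the moment it meets the threshold, so converged samples stop consuming work.

```python
def relax_batch(values, threshold=0.05, decay=0.5, max_steps=100):
    """Toy batched relaxation: each sample's 'force norm' decays per step,
    and a sample leaves the active set as soon as it converges."""
    active = [v > threshold for v in values]
    steps_taken = [0] * len(values)
    for _ in range(max_steps):
        if not any(active):
            break  # whole batch converged
        for k, is_active in enumerate(active):
            if is_active:
                values[k] *= decay          # stand-in for one optimizer step
                steps_taken[k] += 1
                if values[k] <= threshold:
                    active[k] = False       # converged sample stops doing work
    return steps_taken

# samples that start closer to convergence finish in fewer steps
print(relax_batch([1.0, 0.2, 0.01]))  # [5, 2, 0]
```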

Run and scale the pipeline across multiple CPUs and GPUs: The second approach distributes the pipeline across multiple CPUs and GPUs. Applying the pipe operator to two dynamics classes distributes the FIRE2 optimization onto one GPU and the velocity Verlet integration onto another.

```python
pipeline = optimizer | velverlet

# equivalent to manual allocation with explicit producer/consumer
# optimizer.next_rank = 1, velverlet.prior_rank = 0
# DistributedPipeline({0: optimizer, 1: velverlet})
with pipeline:
    pipeline.run(batch)
```

While this example is deliberately simplified for illustrative purposes, the abstraction allows users to scale their pipeline up to multiple GPUs on a node, and out to multiple nodes, arbitrarily large datasets, and any number of ranks.

The following example configures eight GPUs to run geometry optimization and pipelines the results to Langevin dynamics running on another eight GPUs:

```python
from torch import distributed as dist
from torch.utils.data.distributed import DistributedSampler

from nvalchemi.data.datapipes import Dataset, DataLoader

# set up distributed; torchrun --nproc-per-node 8 --nnodes 2 ...
dist.init_process_group()

# set up data and distributed sampler
dataset = Dataset(...)
data_sampler = DistributedSampler(
    dataset,
    num_replicas=dist.get_world_size(),
    rank=dist.get_rank(),
)
loader = DataLoader(
    dataset,
    batch_size=128,
    sampler=data_sampler,
    use_stream=True,
)

# configure your pipeline; 8 ranks do optimization, 8 do Langevin dynamics
optimizers = [FIRE2(mlip, ..., next_rank=index + 8) for index in range(8)]
dynamics = [Langevin(mlip, ..., prior_rank=index) for index in range(8)]
pipeline = DistributedPipeline(
    {index: stage for index, stage in enumerate(optimizers + dynamics)}
)

with pipeline:
    for batch in loader:
        pipeline.run(batch)
```

Build-your-own dynamics classes

ALCHEMI Toolkit offers a modular architecture to build and customize dynamics classes from the ground up. This approach enables the community to integrate new sampling methods or thermodynamic ensembles into the ALCHEMI environment while maintaining direct access to underlying kernels. This transforms dynamics into a fully customizable environment where users can construct specialized dynamics classes from scratch.

Capabilities

- Specialized GPU-first trajectory analysis tools
- Integrated and customizable dynamics kernels (velocity Verlet, NPT, Langevin thermostats)
- FIRE and FIRE2 optimizers

API example

```python
from enum import Enum

import torch

from nvalchemi.data import Batch
from nvalchemi.dynamics import ConvergenceHook
from nvalchemi.dynamics.base import BaseDynamics, DynamicsStage
from nvalchemi.hooks import Hook, HookContext
from nvalchemi.models.base import BaseModelMixin


class MySimulatedAnnealer(Hook):
    def __init__(
        self,
        t_start: float,
        t_end: float,
        cooldown_steps: int,
        frequency: int,
        stage: DynamicsStage,
    ) -> None:
        # this hook will fire off every `frequency` MD steps,
        # bringing the temperature from `t_start` to `t_end`
        self.frequency = frequency
        self.t_start = t_start
        self.t_end = t_end
        self.cooldown_steps = cooldown_steps
        self.stage = stage
        self.decay = (t_end / t_start) ** (1.0 / cooldown_steps)

    def __call__(self, ctx: HookContext, stage: Enum) -> None:
        # access the calling dynamics class through `HookContext`
        dynamics = ctx.workflow
        dynamics.target_temperature = max(
            dynamics.target_temperature * self.decay, self.t_end
        )


class VelocityVerlet(BaseDynamics):
    __needs_keys__ = {"energies", "forces", "masses", "velocities"}
    __provides_keys__ = {"positions"}

    def __init__(
        self,
        model: BaseModelMixin,
        n_steps: int,
        dt: float = 1.0,  # timestep
        target_temperature: float = 300.0,  # initial temperature
        tau: float = 10.0,  # coupling constant
        hooks: list[Hook] | None = None,
        convergence_hook: ConvergenceHook | dict | None = None,
        **kwargs,
    ):
        super().__init__(
            model=model,
            n_steps=n_steps,
            hooks=hooks,
            convergence_hook=convergence_hook,
        )
        self.dt = dt
        self.target_temperature = target_temperature
        self.tau = tau
        self._prev_accelerations = None

    def pre_update(self, batch: Batch) -> None:
        # perform the first half of velocity Verlet
        with torch.no_grad():
            accelerations = batch.forces / batch.masses
            self._prev_accelerations = accelerations.clone()
            batch.positions.add_(
                batch.velocities * self.dt + 0.5 * accelerations * self.dt**2.0
            )

    def post_update(self, batch: Batch) -> None:
        # perform second half of velocity Verlet, with thermostat
        # temperature update
        with torch.no_grad():
            new_accelerations = batch.forces / batch.masses
            batch.velocities.add_(
                0.5 * (self._prev_accelerations + new_accelerations) * self.dt
            )
            ke_per_atom = 0.5 * batch.masses * (batch.velocities**2).sum(
                dim=-1, keepdim=True
            )
            # get the total kinetic energy per system
            total_ke = scatter_add_(...)
            current_temp = 2.0 * total_ke / (batch.num_atoms * 3.0)
            ratio = self.target_temperature / current_temp
            lam = torch.sqrt(
                torch.tensor(1.0 + (self.dt / self.tau) * (ratio - 1.0))
            ).clamp(min=0.8, max=1.2)  # clamp for stability
            batch.velocities.mul_(lam)


# configure the new dynamics class
my_velverlet = VelocityVerlet(
    ...,
    hooks=[
        MySimulatedAnnealer(
            t_start=900.0,
            t_end=300.0,
            cooldown_steps=10,
            frequency=100,
            stage=DynamicsStage.BEFORE_STEP,
        )
    ],
)
```
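The two half-updates that `pre_update` and `post_update` implement are the standard velocity Verlet scheme. As a standalone sanity check, independent of the toolkit classes, here is the same integrator on a 1D harmonic oscillator in plain Python (the function name is illustrative). Velocity Verlet is symplectic, so total energy stays close to its initial value over long trajectories:

```python
def velocity_verlet_harmonic(x, v, dt=0.01, n_steps=1000, k=1.0, m=1.0):
    """Integrate a 1D harmonic oscillator (F = -k x) with velocity Verlet:
    positions advance with the old acceleration (the pre-update half),
    velocities with the average of old and new accelerations (the post-update half)."""
    a = -k * x / m
    for _ in range(n_steps):
        x += v * dt + 0.5 * a * dt**2  # first half: position update
        a_new = -k * x / m             # re-evaluate forces at new positions
        v += 0.5 * (a + a_new) * dt    # second half: velocity update
        a = a_new
    return x, v

x, v = velocity_verlet_harmonic(1.0, 0.0)
energy = 0.5 * v**2 + 0.5 * x**2  # total energy; trajectory started at 0.5
print(abs(energy - 0.5) < 1e-4)   # True: energy is conserved to high accuracy
```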

Model wrappers

With ALCHEMI Toolkit, you can use your own pretrained models with accelerated physics components. It provides the essential infrastructure for importing your own models into the pipeline, ensuring that proprietary or domain-specific architectures can leverage GPU-native orchestration. This abstracts the complexity of different model types, providing a standardized path to move from a standalone model to a production-ready, high-throughput simulation.

Capabilities

- MLIP support (MACE, TensorNet, AIMNet2)
- Composable calculators
- Standardized model configuration

API example

```python
from typing import Any

from beartype import beartype
from torch import nn

from super_mlip import BestMLIPModel

from nvalchemi._typing import ModelOutputs
from nvalchemi.data import Batch
from nvalchemi.models.base import BaseModelMixin, ModelConfig, NeighborConfig


class BestMLIPWrapper(nn.Module, BaseModelMixin):
    def __init__(self, model: BestMLIPModel, **kwargs):
        super().__init__(**kwargs)
        # ModelConfig declares model capabilities (which are frozen)
        # and runtime control (mutable) for the rest of the framework
        self.model_config = ModelConfig(
            outputs=frozenset({"energy", "forces", "hessians"}),
            # this is actually the default value
            required_inputs=frozenset({"positions", "atomic_numbers"}),
            autograd_outputs=frozenset({"forces"}),
            neighbor_config=NeighborConfig(cutoff=5.0, format="coo"),
        )

    def adapt_input(self, data: Batch, **kwargs) -> dict[str, Any]:
        # adapts the nvalchemi data structure to what is
        # expected by the model
        model_inputs = super().adapt_input(data, **kwargs)
        # dict structure expected by BestMLIPModel
        model_inputs["atom_numbers"] = data.atomic_numbers
        model_inputs["coords"] = data.positions
        return model_inputs

    def adapt_output(self, model_output: Any, data: Batch) -> ModelOutputs:
        # adapt the model outputs from the model's forward pass to the
        # format expected by nvalchemi
        output = super().adapt_output(model_output, data)
        output["energies"] = model_output["energies"]
        # check model config for expected outputs
        if "forces" in self.model_config.active_outputs:
            output["forces"] = model_output["forces"]
        return output

    # beartype decorator is optional, but will runtime type check arguments
    @beartype
    def forward(self, data: Batch, **kwargs) -> ModelOutputs:
        model_inputs = self.adapt_input(data, **kwargs)
        # calls BestMLIPModel's forward definition based on MRO
        model_outputs = super().forward(**model_inputs)
        return self.adapt_output(model_outputs, data)
```

Advanced data management

Traditionally, the “memory tax” of moving data between the CPU and GPU is a significant bottleneck in AI-driven discovery. ALCHEMI Toolkit acts as the specialized orchestrator for scientific data, providing the infrastructure required to build custom ingestion pipelines to move information from standard research files into optimized GPU tensors. 

This allows discovery to scale, making industrial-scale simulations accessible through familiar interfaces. By standardizing how atomic information is represented and loaded, ALCHEMI Toolkit keeps data resident on the device: the entire simulation stays on the GPU, enabling batched simulations that maximize GPU utilization and eliminate host-device communication overhead. 

Capabilities

- High-performance data loaders
- ASE and pymatgen interfaces
- AtomicData and Batch objects

API example

```python
from ase.build import fcc111

from nvalchemi import AtomicData, Batch
from nvalchemi import data

atoms = fcc111(...)

# Create AtomicData object from ase.Atoms object
atomic_data = AtomicData.from_atoms(atoms, device="cuda")
atomic_data.node_properties
atomic_data.system_properties

# Create a Batch object from a list of AtomicData
batch = Batch.from_data_list([atomic_data, atomic_data, atomic_data])
batch.num_graphs
batch.get_data(0)
# get the first two samples
batch[:2]
# index with a boolean mask, or access properties by key
batch[mask]
batch["energies"]
# Create a Batch directly from a list of ase.Atoms
Batch.from_atoms([...])

# Create a dataset
writer = data.AtomicDataZarrWriter("atom_dataset.zarr")
# writer will amortize overhead by writing batches of data;
# this is equivalent to writing individual samples but efficient
writer.write(batch)

# Read the data from zarr
reader = data.AtomicDataZarrReader("atom_dataset.zarr")
# Dataset treats device natively; individual samples
# are placed on GPU and it accelerates preprocessing transforms;
# num_workers sets the number of threads used for async prefetching
dataset = data.Dataset(reader, device="cuda", num_workers=4)
dataloader = data.DataLoader(dataset, batch_size=16)

for batch in dataloader:
    ...  # do something with batch
```
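The core idea behind batch objects like these is concatenating per-atom arrays from variable-size systems into one flat array with offsets, so a whole batch can be processed in a single GPU operation yet each sample remains recoverable. Here is a pure-Python sketch of that concatenation pattern (the function names are illustrative; the real objects hold GPU tensors and many more properties):

```python
def batch_systems(systems):
    """Concatenate per-atom position lists from variable-size systems into
    one flat list, keeping `ptr` offsets so each sample can be sliced back out."""
    positions, ptr = [], [0]
    for atoms in systems:
        positions.extend(atoms)
        ptr.append(len(positions))  # cumulative atom count marks sample boundaries
    return positions, ptr

def get_sample(positions, ptr, index):
    """Recover one sample's atoms from the flat batch using the offsets."""
    return positions[ptr[index]:ptr[index + 1]]

water = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]  # 3 atoms
dimer = water + [(3.0, 0.0, 0.0)]                                # 4 atoms
positions, ptr = batch_systems([water, dimer])
print(ptr)                                      # [0, 3, 7]
print(get_sample(positions, ptr, 1) == dimer)   # True
```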

Get started building molecular workflows with ALCHEMI Toolkit

ALCHEMI Toolkit provides researchers and developers with the low-level primitives and high-level abstractions needed to build end-to-end, GPU-native molecular workflows. Moving critical bottlenecks—such as neighbor list construction, structural relaxation, and integration steps—into the PyTorch ecosystem eliminates the host-to-device memory transfer overhead that has traditionally throttled MLIP-driven simulations.

Whether you’re composing hybrid ML or physics potentials or scaling batched molecular dynamics, ALCHEMI Toolkit exposes the necessary API hooks to manage complex tensorized states without sacrificing performance. 

To accelerate your chemistry and materials science simulations and explore building your own custom workflows, visit the NVIDIA/nvalchemi-toolkit GitHub repo and ALCHEMI Toolkit documentation. As we continue to expand the library of supported operations and architectures, we encourage you to clone the repository, explore the provided Jupyter notebooks, and begin integrating these GPU-accelerated workflows into your own discovery pipelines.

Acknowledgments

We’d like to thank James Gin, Tim Duignan, Vaidas Šimkus of Orbital; Professor Shyue Ping Ong of MatGL; Susumu Ohno, Ryuhei Okuno, Jethro Tan of Matlantis for working with us to adopt NVIDIA ALCHEMI Toolkit into their platforms. We would also like to thank Nikita Fedik, Roman Zubatyuk, Atul Thakur, and Logan Ward for their contributions to this post.

