Neural network techniques are increasingly used in computer graphics to boost image quality, improve performance, and streamline content creation. Approaches like super resolution, denoising, and neural rendering help real-time engines work more efficiently, offering new creative possibilities while keeping performance in mind.
Unreal Engine 5 (UE5) has taken several steps in this direction with the introduction of the Neural Network Engine (NNE), which serves as an abstraction layer that unifies inference workloads across multiple backends. Developers can run inference through various runtimes on a GPU, or fall back to a CPU depending on the available hardware, enabling seamless integration of neural network features into real-time graphics workflows.
This blog post covers the new plugin that adds NVIDIA TensorRT for RTX as an NNE runtime option (NNERuntimeTRT) for efficient inferencing on NVIDIA RTX GPUs. To show its benefits, I’ll use a simplified UE project that runs a post-process AI model to highlight gains over other GPU runtimes, like DirectML.
First, let’s briefly discuss the different components involved in the project.
TensorRT for RTX overview
TensorRT for RTX enables users to deploy AI models on RTX GPUs more efficiently. It uses a Just-In-Time (JIT) optimizer within the runtime to generate inference engines tailored to the user’s GPU. This compilation occurs once on the user’s machine and optimizes the model for their specific hardware.
As a result, TensorRT for RTX can offer higher throughput than default execution providers. Throughput comparisons across various models, measured on an NVIDIA GeForce RTX 5090 GPU, show improvements when using TensorRT for RTX versus DirectML.
TensorRT for RTX is only compatible with NVIDIA RTX GPUs, from the Turing generation (compute capability 7.5) up to the NVIDIA Blackwell generation (compute capability 10.0).
Unreal Engine neural network engine overview
NNE supports multiple runtimes for dispatching inference tasks to either the CPU or the GPU. Since TensorRT for RTX targets GPUs, this overview focuses on the NNE GPU runtimes.
NNE can run inference on the GPU, either:

- Synchronously from the CPU, requiring memory synchronization.
- Asynchronously through the Render Dependency Graph (RDG), aligning with frame rendering.

The synchronous method works well for editors and event-based inference tasks like LLMs, where copying data between host and device is not a concern. In contrast, RDG ties model evaluation to rendering resources, making it ideal for AI post-processing, upscaling, or denoising.
The NNE TensorRT for RTX plugin supports both GPU and RDG methods, offering flexibility for various AI applications such as rendering, animation, language, and speech while maintaining strong performance on consumer-grade devices.
The style transfer post-processing sample project
I built a basic UE5 project to test the NNE TensorRT for RTX plugin, which applies style transfer models during post-processing. For testing, I set up a simple level using a few basic primitives and a fixed camera to keep the visuals consistent while switching between DirectML and TensorRT, making it easier to compare both results and performance.
Prerequisites
While the project is nearly ready to use out of the box, experience with UE5, post-process materials, and engine source compilation is recommended.
To run style transfer inference and manage rendering resources, you need an NNE implementation. UE5 offers this through the Neural Post-Processing plugin, which performs inference on the GPU through RDG. For this project, we’ll utilize the NNERuntimeTRT RDG method.
All that’s required is to train or download an appropriate style model. We’ll use a pre-trained one from the ONNX zoo.
The project already includes an imported model (candy-9-720.uasset).
Project setup
Although the NNE TensorRT for RTX plugin is compatible with the 5.7 binary engine release available from the launcher, the Neural Post-Processing plugin contains a hard-coded list of runtimes within its neural profile asset. Its code must be updated to include NNERuntimeTRT in the list of runtimes available to the neural profile asset.
Get started:
1. Get the engine source from GitHub. If this is your first time accessing the engine code base, you need to link your GitHub account with your Epic account; review this document, which explains the process for accessing the engine code base.
2. After the initial engine setup, you should have a Visual Studio solution. Open the solution and locate the NeuralProfile.h/.cpp files under Engine\Source\Runtime\Engine\Classes\Engine.
3. In NeuralProfile.h, add NNERuntimeTRT as a new entry in the ENeuralProfileRuntimeType enum class (line 59).
4. In NeuralProfile.cpp, find GetNeuralProfileRuntimeName and add a matching NNERuntimeTRT entry to the kRuntimeNames array (line 28). The array must stay in the same order as the enum.
5. Build the engine, following these detailed instructions for compiling it. The first compilation takes time, so consider taking a break or grabbing a coffee. It's also possible to place the plugin in the project, but for our case, it's simpler to keep it under the engine plugins.
6. Clone the sample project repository, load the project with the compiled engine, and play the test level (LVL_PPStyleTest).
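Taken together, the two source edits might look like the sketch below. The enum entries other than NNERuntimeTRT, and the Unreal type aliases at the top, are assumptions added only so the fragment compiles standalone; keep whatever your engine version already declares and simply append the new entry in both places, in matching order.

```cpp
#include <cstdint>

// Aliases standing in for Unreal's types so this sketch compiles on its own.
using uint8 = std::uint8_t;
using TCHAR = wchar_t;
#define TEXT(x) L##x

// NeuralProfile.h -- append NNERuntimeTRT to the runtime enum.
// Entries other than NNERuntimeTRT are illustrative assumptions.
enum class ENeuralProfileRuntimeType : uint8
{
	NNERuntimeORTDml = 0,
	NNERuntimeTRT,       // new: TensorRT for RTX runtime
	MAX,
};

// NeuralProfile.cpp -- kRuntimeNames must stay in the same order as the
// enum, since GetNeuralProfileRuntimeName indexes it by enum value.
static const TCHAR* const kRuntimeNames[] =
{
	TEXT("NNERuntimeORTDml"),
	TEXT("NNERuntimeTRT"),  // new entry
};
```

The key invariant is that the name array and the enum stay index-aligned; a mismatch would make the neural profile asset report the wrong runtime name.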
Figure 1. The sample project showing the stylized neural post-processing effect

Performance profiling
While playing the test level, activate the engine stats and alternate between TensorRT for RTX (TRT) and DirectML (DML) to assess performance improvements. For more comprehensive profiling, use Unreal Insights to capture frame information and breakdowns for both DML and TRT.
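The in-game stats can be toggled from the console with the standard engine commands, for example:

```
stat fps
stat unit
stat gpu
```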
On a system with NVIDIA GeForce RTX 5090 GPU at 1080p, in Unreal Insights, DML required 5.7 ms, whereas TRT completed in 3.8 ms—a 1.5x performance improvement.
Figure 2. Unreal Insights showing the duration of the DirectML enqueue task (5.7 ms) within the neural post-processing stage
Figure 3. Unreal Insights showing the duration of the TensorRT for RTX enqueue task (3.8 ms) within the neural post-processing stage

Using other style transfer models
You can use any style transfer models from the ONNX zoo or train your own. Note that ONNX zoo models have fixed dimensions of 1x3x224x224 for both input and output tensors. The neural post-processing plugin can tile a small model across a larger frame buffer. While this gives visually acceptable results, it's not recommended: it spawns multiple inference tasks per frame, causing frequent context switches between NVIDIA CUDA and graphics workloads. To avoid that overhead, change the model dimensions to 1x3x720x720 so inference runs without tiling while maintaining good visual quality.
In the sample project repo, I’ve included a Python script that resizes the input and output tensor dimensions for style transfer ONNX models.
For more information on building your own AI applications with NNE, consult the official documentation.