HDR-NSFF: High Dynamic Range Neural Scene Flow Fields

ICLR 2026
KAIST · POSTECH

HDR-NSFF reconstructs a dynamic HDR radiance field from alternating-exposure monocular videos.

Alternating-exposure Monocular Video → HDR Scene

Given an alternating-exposure monocular video captured with exposure bracketing, HDR-NSFF reconstructs a dynamic HDR radiance field enabling temporally consistent HDR slow-motion rendering.

Bear

Input

Alternating-exposure video (DSLR Bracketing)

Output

HDR slow-motion rendering

Leaf

Input

Alternating-exposure video (DSLR Bracketing)

Output

HDR slow-motion rendering

Robin

Input

Alternating-exposure video (DSLR Bracketing)

Output

HDR slow-motion rendering

Abstract

Radiance of real-world scenes typically spans a much wider dynamic range than what standard cameras can capture. While conventional HDR methods merge alternating-exposure frames, these approaches are inherently constrained to 2D pixel-level alignment, often leading to ghosting artifacts and temporal inconsistency in dynamic scenes. To address these limitations, we present HDR-NSFF, a paradigm shift from 2D-based merging to 4D spatio-temporal modeling. Our framework reconstructs dynamic HDR radiance fields from alternating-exposure monocular videos by representing the scene as a continuous function of space and time, and is compatible with both neural radiance field and 4D Gaussian Splatting (4DGS) based dynamic representations. This unified end-to-end pipeline explicitly models HDR radiance, 3D scene flow, geometry, and tone-mapping, ensuring physical plausibility and global coherence. We further enhance robustness by (i) extending semantic-based optical flow with DINO features to achieve exposure-invariant motion estimation, and (ii) incorporating a generative prior as a regularizer to compensate for limited observation in monocular captures and saturation-induced information loss. To evaluate HDR space-time view synthesis, we present the first real-world HDR-GoPro dataset specifically designed for dynamic HDR scenes. Experiments demonstrate that HDR-NSFF recovers fine radiance details and coherent dynamics even under challenging exposure variations, thereby achieving state-of-the-art performance in novel space-time view synthesis.

Why 4D instead of 2D?

Conventional HDR video methods align and fuse alternating-exposure frames using 2D optical flow, causing color drift and geometric flickering under large or complex motions. HDR-NSFF reconstructs a unified 4D spatio-temporal HDR radiance field, ensuring global consistency across the entire video.

                 HDR Video (2D)   Ours (4D)
Motion modeling  Pixel-level      3D scene flow
Geometry         None             Explicit depth
Temporal scope   3–7 frames       Entire video
Output           One HDR frame    HDR novel view & time synthesis
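To make the 2D baseline's failure mode concrete, the classic exposure-weighted merge (Debevec-style; this is a generic sketch, not any specific baseline's code) fuses pre-aligned frames pixel by pixel, so any residual misalignment between frames is baked directly into the HDR result as ghosting:

```python
import numpy as np

def merge_2d(ldr_frames, exposure_times):
    """Classic per-pixel weighted HDR merge over already-aligned LDR frames.

    Assumes a linear camera response for simplicity; with real CRFs the
    frames are linearized first. Any alignment error shows up as ghosting.
    """
    acc = np.zeros_like(ldr_frames[0], dtype=np.float64)
    wsum = np.zeros_like(acc)
    for ldr, t in zip(ldr_frames, exposure_times):
        w = 1.0 - np.abs(2.0 * ldr - 1.0)   # hat weight: trust mid-tones,
        acc += w * (ldr / t)                # distrust near-black/near-white
        wsum += w
    return acc / (wsum + 1e-8)
```

For a static pixel the merge recovers the scene radiance exactly; for a moving pixel the two exposures sample different surface points and the weighted average blends them, which is the ghosting artifact discussed above.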

HDR Video Reconstruction

Given an alternating-exposure video, HDR video baselines (LAN-HDR, HDRFlow) fail to produce temporally consistent results due to 2D pixel-level alignment. Our model ensures temporal coherence and recovers valid information in saturated regions.

HDR video reconstruction comparison

Comparison of HDR video reconstruction on training views. Given alternating-exposure video, HDR video reconstruction baselines (LAN-HDR, HDRFlow) fail to produce consistent results, while our model ensures temporal coherence and recovers valid information in saturated regions.

Method

HDR-NSFF reconstructs dynamic HDR radiance fields by jointly optimizing HDR radiance, 3D scene flow, geometry, and tone-mapping from alternating-exposure monocular videos. Our framework introduces three core components:

Overall pipeline of HDR-NSFF

Overall pipeline. HDR-NSFF takes an alternating-exposure monocular video as input and estimates 3D scene flow for sampled points along each ray. Neighboring frames are warped to render the HDR radiance at the target frame, which is tone-mapped to LDR via a learnable white-balance and camera-response function (CRF) module. All components are jointly optimized end-to-end.
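The rendering step in the pipeline above is standard NeRF-style volume rendering, applied to HDR radiance before tone-mapping. A minimal per-ray sketch (function and variable names are ours, not the paper's):

```python
import numpy as np

def render_hdr_ray(sigmas, hdr_radiance, deltas):
    """Volume-render HDR radiance along one ray (NeRF quadrature).

    sigmas: (N,) densities at the sampled points
    hdr_radiance: (N, 3) HDR radiance at those points
    deltas: (N,) distances between consecutive samples
    """
    alpha = 1.0 - np.exp(-sigmas * deltas)                     # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))  # transmittance
    weights = trans * alpha
    return (weights[:, None] * hdr_radiance).sum(axis=0)        # HDR ray color
```

In the full method, radiance from neighboring frames is warped to the target time via the estimated 3D scene flow before this accumulation; the output stays in HDR and only the final supervision passes through the tone-mapping module.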

🎨 Tone-Mapping Module

A learnable piecewise CRF with per-channel white balance maps rendered HDR radiance to LDR observations. Smoothness regularization ensures physically plausible CRF curves even under extreme exposure variations.
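One way such a module could be parameterized is a piecewise-linear CRF over learnable knot values, with a per-channel white-balance gain and a second-difference smoothness penalty. This is a hedged sketch of that idea, not the paper's exact parameterization:

```python
import numpy as np

def tone_map(hdr, wb, crf_knots):
    """Map HDR radiance to LDR: per-channel white balance, then a
    piecewise-linear CRF interpolated between K learnable knot values."""
    x = np.clip(hdr * wb, 0.0, 1.0)              # wb may be scalar or per-channel
    K = len(crf_knots)
    pos = x * (K - 1)                            # fractional knot position
    lo = np.clip(np.floor(pos).astype(int), 0, K - 2)
    frac = pos - lo
    return (1 - frac) * crf_knots[lo] + frac * crf_knots[lo + 1]

def crf_smoothness(crf_knots):
    """Second-difference penalty keeping the learned curve smooth."""
    return np.sum(np.diff(crf_knots, n=2) ** 2)
```

With equally spaced knots the CRF is the identity and the smoothness penalty is zero; during training the knots would bend toward the camera's actual response while the regularizer keeps the curve physically plausible.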

🔍 Semantic Optical Flow

Standard optical flow degrades under alternating exposures. We leverage DINOv2 semantic features, which are invariant to photometric changes, to produce reliable exposure-robust motion estimates via DINO-Tracker.
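The core reason this works is that semantic features change little when only the exposure changes, so matching in feature space survives brightness swings that break photometric flow. A toy stand-in for that matching step (DINO-Tracker refines such matches further; the function here is illustrative, not its API):

```python
import numpy as np

def semantic_flow(feat_a, feat_b):
    """Toy exposure-robust flow: nearest-neighbor matching between two
    semantic feature maps of shape (H, W, C), e.g. DINOv2 patch features."""
    H, W, C = feat_a.shape
    fa = feat_a.reshape(-1, C)
    fb = feat_b.reshape(-1, C)
    fa = fa / (np.linalg.norm(fa, axis=1, keepdims=True) + 1e-8)
    fb = fb / (np.linalg.norm(fb, axis=1, keepdims=True) + 1e-8)
    sim = fa @ fb.T                     # cosine similarity: invariant to gain
    match = sim.argmax(axis=1)          # best match in frame b per pixel of a
    my, mx = np.divmod(match, W)
    gy, gx = np.mgrid[0:H, 0:W]
    return np.stack([mx.reshape(H, W) - gx, my.reshape(H, W) - gy], axis=-1)
```

Because the features are L2-normalized before matching, multiplying one frame's features by a global gain (the analogue of an exposure change) leaves the matches, and hence the flow, unchanged.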

🤖 Generative Prior

Monocular capture and saturated pixels cause information loss. A generative prior periodically synthesizes enhanced novel views as pseudo-labels, bootstrapping the 4D reconstruction into a pseudo-multi-view problem.

Semantic Optical Flow

Optical flow comparison

Standard optical flow (RAFT) fails under alternating exposures — even with gamma correction or fine-tuning on synthetic data. Our semantic-based approach using DINOv2 features achieves accurate motion estimation regardless of exposure variation.

Generative Prior as a Regularizer

Generative prior pipeline

Generative prior pipeline. Unseen novel views are first rendered, then refined via the generative prior to restore details in regions with broken correspondences. These enhanced views serve as pseudo-labels for progressive optimization, mitigating saturation and limited-view issues.
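In loss terms, the pseudo-labels could enter as an extra supervision term alongside the reconstruction loss on observed frames, with saturated pixels down-weighted since they carry no radiance information. A hypothetical sketch of such an objective (weights and masking scheme are our assumptions, not the paper's exact formulation):

```python
import numpy as np

def training_loss(render_obs, ldr_gt, render_novel, pseudo_label,
                  sat_mask, w_pseudo=0.1):
    """Reconstruction loss on observed frames (masking saturated pixels)
    plus a pseudo-label term on refined unseen novel views."""
    valid = 1.0 - sat_mask                         # saturated pixels excluded
    recon = np.sum(valid * (render_obs - ldr_gt) ** 2) / (valid.sum() + 1e-8)
    prior = np.mean((render_novel - pseudo_label) ** 2)
    return recon + w_pseudo * prior
```

Periodically re-rendering novel views, refining them with the generative prior, and feeding them back through the `prior` term is what turns the monocular capture into a pseudo-multi-view problem.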

Results

HDR Rendering Results

HDR-NSFF reconstructs fine radiance details across a range of scenes with complex dynamics.

Big Jump

Jumping Jack

Pointing Walk

Side Walk

Tube Toss

Comparison with HDR-HexPlane

HDR-HexPlane lacks explicit motion modeling, limiting its ability to represent complex dynamics. Our method produces more accurate radiance, geometry, and motion representations across all scenes.

Big Jump

HDR-HexPlane

Ours

Jumping Jack

HDR-HexPlane

Ours

Pointing Walk

HDR-HexPlane

Ours

HDR-GoPro Dataset

We introduce the first real-world HDR benchmark for dynamic scene view synthesis. Nine synchronized GoPro Hero 13 Black cameras are arranged in a nearly parallel configuration, divided into three exposure groups (low, mid, high). An alternating-exposure monocular video is constructed from one camera per timestep; the remaining eight views serve as held-out evaluation references.

  • 9 synchronized cameras with explicit multi-exposure variations
  • 3 exposure levels per scene (low, mid, high)
  • 12 diverse scenes — challenging indoor/outdoor motions (jumping, tumbling, walking, etc.)
  • Supports both novel view and novel time synthesis evaluation

Sample sequences from our HDR-GoPro dataset at three exposure levels (low, mid, high), captured simultaneously by synchronized cameras.
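The construction of the monocular training stream can be sketched as a simple schedule: cycle low → mid → high each timestep and take one camera from the active exposure group. Which camera within each group is chosen is our assumption (here, the center one); the paper only states that one camera is used per timestep and the other eight are held out:

```python
def alternating_schedule(num_frames, group_cams=((0, 1, 2), (3, 4, 5), (6, 7, 8))):
    """Hypothetical capture schedule for the 9-camera rig: per timestep,
    pick the center camera of the active exposure group; the remaining
    eight synchronized views serve as held-out evaluation references."""
    exposures = ("low", "mid", "high")
    schedule = []
    for t in range(num_frames):
        g = t % 3                                  # cycle exposure groups
        schedule.append((group_cams[g][1], exposures[g]))
    return schedule
```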

Quantitative Comparisons

Novel View Synthesis — HDR-GoPro Dataset

Method           Full Scene                  Dynamic Only
                 PSNR↑   SSIM↑    LPIPS↓     PSNR↑   SSIM↑    LPIPS↓
NSFF             18.02   0.6792   0.2061     17.59   0.5473   0.2529
4DGS             20.94   0.7905   0.1541     17.83   0.5524   0.2230
MotionGS         14.61   0.3976   0.3617     12.33   0.2303   0.4696
NeRF-WT          29.70   0.9333   0.0598     19.25   0.6355   0.1770
HDR-HexPlane     20.70   0.6694   0.1917     20.55   0.6629   0.1716
Ours (w/o DT)    29.93   0.9364   0.0621     24.93   0.8068   0.1048
Ours (w/o GP)    32.66   0.9447   0.0557     25.65   0.8205   0.1012
Ours             32.63   0.9444   0.0554     25.50   0.9208   0.0972

Novel View & Time Synthesis — Synthetic Data

Method           Full Scene                  Dynamic Only
                 PSNR↑   SSIM↑    LPIPS↓     PSNR↑   SSIM↑    LPIPS↓
NSFF             15.98   0.6457   0.1388     16.04   0.5697   0.1527
NeRF-WT          31.10   0.9366   0.0342     21.50   0.7490   0.0895
HDR-HexPlane     29.95   0.9055   0.0527     23.87   0.7999   0.1071
Ours             35.07   0.9465   0.0483     27.19   0.8836   0.0576

BibTeX

@inproceedings{dong-yeon2026hdr-nsff,
      title = {HDR-NSFF: High Dynamic Range Neural Scene Flow Fields},
      author = {Shin, Dong-Yeon and Kim, Jun-Seong and Kwon, Byung-Ki and Oh, Tae-Hyun},
      booktitle = {International Conference on Learning Representations (ICLR)},
      year = {2026}
}