Rendering images in a way that allows gradients to be computed with respect to scene parameters

Propagate gradients of image pixels to scene parameters

Use cases

Pair with optimisation algorithms to solve inverse rendering problems
- Reflectance/lighting estimation (Inverse Path Tracing for Joint Material and Lighting Estimation, Azinovic)
- 3D reconstruction from one/multiple input pictures (Large steps in inverse rendering of geometry, Nicolet)
Propagate gradients into a network to learn the scene’s neural representation: learn a map from position/viewpoint to colour
- Novel view synthesis (NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis, Mildenhall)
- Scene relighting (Neural Rendering Framework for Free-Viewpoint Relighting, Chen)
- Scene editing (IntrinsicNeRF: Learning Intrinsic Neural Radiance Fields for Editable Novel View Synthesis, Ye)

Physics-based differentiable rendering

Differentiable rendering techniques corresponding to global illumination algorithms

Main challenge

In order to backpropagate gradients from pixel colour to scene parameters, we need to be able to differentiate the function that gave these pixel colours with respect to the different scene parameters. We want to be able to compute:

[math]\frac{d}{dx} \int_{a(x)}^{b(x)} f(t, x) dt[/math]

[math]x[/math]: scene parameter (vertex position, material parameter, …)
[math]t[/math]: light path (we integrate over the space of all possible light transport paths)
[math]f[/math]: light contribution of a path [math]t[/math] for the current scene configuration defined by [math]x[/math]
[math]a, b[/math]: boundaries of the set of all valid light paths

The complexity resides in the fact that [math]a[/math] and [math]b[/math] depend on [math]x[/math]: as you change the scene, the set of valid light paths also changes (some paths become invalid because blocked, some appear). So the Leibniz Integral Rule gives us:

[math]\frac{d}{dx} \int_{a(x)}^{b(x)} f(t, x) dt = f(b(x), x) \cdot b'(x) – f(a(x), x) \cdot a'(x) + \int_{a(x)}^{b(x)} \frac{\partial f}{\partial x} dt [/math]

Introducing an additional boundary term [math]f(b(x), x) \cdot b'(x) – f(a(x), x) \cdot a'(x)[/math]

This boundary terms correspond to scenarios where tiny changes in scene parameters cause discrete changes in what light paths exist, and these are driven by visibility function. Visibility is binary, and therefore not continuous -> we stumble upon a discontinuous phenomenon in a continuous optimisation problem, and we can’t apply automatic differentiation which assumes continuous behaviour.

Current Solutions

Boundary sampling methods

Explicitly compute the boundary term by importance sampling paths in the boundary sample space

Solution from “Path-space differentiable rendering”, Zhang et al.

Fundamental rendering equation can be expressed in path-space form (see Veach 1997):

[math]I = \int_{\Omega} f(x) d\mu(x) [/math]

[math]I[/math]: radiometric measurement (pixel intensity)
[math]\Omega[/math]: path space (set of all possible light paths)
[math]x=(x_0, …, x_N)[/math]: one such light path
[math]\mu(x)[/math]: area-product measure, representing sampling over surface interactions
[math]f(x)[/math]: path contribution function

Expresses the final pixel colour as an integral over all possible light paths -> foundation of bidirectional path tracing -> differentiate this path integral

Goal

Compute the derivative of the above wrt to an arbitrary scene parameter [math]\pi[/math]:

[math]\frac{\partial I}{\partial \pi} = \frac{\partial}{\partial \pi} \int_{\Omega(\pi)} f(x, \pi) d\mu(x, \pi) [/math]

Main idea

Split the derivative into two components

[math]\frac{\partial I}{\partial \pi} = \int_{\Omega(\pi)}f’ d\mu + \int_{\partial \Omega(pi)} \Delta f V_{\partial \Omega} dl[/math]

[math]f'[/math]: normal scene derivative of the path contribution
[math]\partial \Omega(\pi)[/math]: boundary of path space (paths near shadow edges)
[math]\Delta f[/math]: jump in path contribution across a discontinuity
[math]V_{\partial \Omega}[/math]: normal velocity of the boundary curve (how fast the visibility changes)
[math]dl[/math]: integration over the boundary paths

Interior integral (left): smooth changes within the path space
Boundary integral (right): handles discontinuities

-> You can now derive tailored Monte Carlo estimators for each part: they are still high-dimensional integrals that can’t be computed exactly, we can however sample light paths to we use Monte Carlo integration to estimate them:

[math]\int f(x) dx \approx \frac{1}{N} \sum_{i=1}^{N} \frac{f(x^{(i)})}{p(x^{(i)})}[/math]

Interior Monte Carlo estimator: classic approach, sample paths through path tracing. For each path [math]x[/math], compute [math]f'(x)[/math] which measures how much that path’s contribution changes with [math]\pi[/math]
Boundary estimator: more complex, requires special sampling (ex: tracing toward visibility edges) because random sampling will almost never hit them: deliberately generate paths that land on the boundary
- Visibility edge sampling: explicitly sample configurations where visibility flips from 1 to 0, trace rays that go towards silhouette edges
- Inverse path reparameterisation: reparameterise the path space so that boundary paths lie on a domain with non-zero measures
- Finite difference aided sampling: sample normal light paths, perturb the scene parameter ([math]\pm \epsilon[/math]) and see if the contribution changes abruptly to know if that path is near a boundary. If yes, use finite differences to estimate the jump.

Reparameterisation methods

Reparameterise the integral domain by tracing auxiliary rays such that the geometric discontinuities remain static when scene parameters change

Solution of “Reparameterising discontinuous integrands for differentiable rendering”, Loubet et al.

Look into Warped-Area Reparameterization of Differential Path Integrals, Xu et al. and “Reparameterising discontinuous integrands for differentiable rendering”, Loubet et al.

Neural Radiance Fields (NeRFs)

Synthesizing novel views of complex 3D scenes based on a sparse set of 2D images

Core Idea

Use a neural network to model a continuous volumetric scene function:

[math](x, y, z), (\theta, \phi) \longrightarrow (r, g, b), \sigma[/math]

[math](x, y, z)[/math]: 3D coordinates of a point in the scene
[math](\theta, \phi)[/math]: viewing direction, direction of the camera ray that hits the point
[math](r, g, b)[/math]: colour of the point as seen from the direction
[math]\sigma[/math]: volume density: how much light is absorbed at that point

Use the output for volumetric rendering:

Training

Input Data
- Set of posed RGB images of a scene
- Corresponding camera intrinsics (focal length, …) and extrinsics (position, orientation)
Ray sampling
- For each training iteration
  - Random batch of pixel selected in input image
  - For each pixel
    - Compute corresponding camera ray (originates from the camera and passes through that pixel into the scene)
    - Sample multiple 3D points along the ray between near and far bounds: they are the coordinates [math]x_i[/math] that are fed into the NeRF network
Neural Network Inference
- For each sampled point [math]x_i[/math]
  - Apply positional encoding to [math]x_i[/math] and the ray direction [math]d[/math]
    - Lift input into higher-frequency space using sinusoids
  - Encoded inputs are passed through the MLP and get outputs [math](c_i, \sigma_i)[/math]
Volume Rendering
- Use outputs [math](c_i, \sigma_i)[/math] to numerically integrate along each ray to compute the final color [math]C[/math] of the pixel using volume rendering equation
Loss function
- Compare predicted [math]C[/math] to ground truth pixel colour through a standard MSE
Optimisation
- Backpropagate the loss through the network

Architecture

Fully connected feedforward network (MLP) with positional encoding applied to inputs

Look more into it

Strengths/Limitations

Strengths

High quality photorealistic novel views, outperforms traditional 3D reconstruction
Continuous scene representation
Compact scene representation (just neural network weights)
No explicit 3D model or mesh needed
Can learn view dependent effects (reflection, specular highlights) since it conditions on viewing direction

Limitations

Slow training: hours to days, evaluates for 100ks 3D points for each iteration
Slow inference: minutes, every pixel requires querying and integrating along a ray
Static scenes only: struggles with motion, deforming objects, dynamic lighting
Needs accurate camera poses: precises camera extrinsics
Scene specific model: each model captures one scene

Variants and extensions

Instant-NGP

Goal: make NeRF trainable and renderable in real-time
Main idea
- Replace MLP with a multi-resolution hash grid for scene encoding
- Use tiny neural networks and CUDA optimised code
Improvement: trains in seconds to minutes instead of days

Generalisation across scenes: PixelNeRF / MVSNeRF / Generalizable NeRFs

Goal: train a single model that works on new scenes at test time
Main idea
- Use image features as conditioning inputs
- Learn to predict radiance fields without retraining
Improvement: can synthesise novel views of unseen scenes with just a few images

Large-scale scenes: Mega-NeRF / Block-NeRF / Urban-NeRF

Goal: NeRFs for huge real-world environments
Main idea
- Use scene partitioning
- Compress and blend local NeRFs across large areas
- Handle variability in lighting and weather

Dynamic scenes: D-NeRF / Nerfies / HyperNeRF / NSFF

Goal: model non-static scenes
Main idea
- Add time variable to network input
- Learn deformation field over time
- Use optical flow or scene flow constraints

Trends

NeRFs as graphics engines
Text to NeRF, using language models to generate 3D scenes
NeRF + SLAM for real-time tracking and mapping

3D Gaussian Splatting (3DGS)

TODO

uses a discrete set of 3D Gaussians as base primitives

Below this is draft, things to sort out and to look into more

Key Challenges

Non-differentiability of visibility: changes in geometry cause discrete changes in geometry (occlusion) -> non continuous function -> not differentiable
High computational cost: rendering with gradients is expensive
Ill-posed inverse problems: multiple combinations of scene parameters can produce the same image
Complexity of rendering equation: full differentiability of global illumination is difficult due to recursive nature of light transport
Gradient noise: some differentiable renderers suffer from that

Selected work

State of research

Differentiable renderers

Soft Rasterizer (2019): smooth approximations to handle visibility
Neural mesh renderer (2018): approximate rasterisation gradient
Mitsuba 2 (2020): physical renderer with automatic differentiation
redner (2019): Monte Carlo differentiable renderer
Kaolin/PyTorch 3D: toolkits with differentiable rasterizers

Neural approaches

NeRF and variants: implicit scene representations using volumetric rendering
Volumetric differentiable renderers: user differentiable ray marching over density and color fields
Inverse rendering networks: learn to infer geometry, lighting and materials from single/multiple images

Trends

Combine NeRFs with differentiable rendering (NeRF++)
Increased focus on real-time differentiable rendering (rasterisation with approximated gradients)

To look into / Sources

A Brief Review on Differentiable Rendering: Recent Advances and Challenges, Ruicheng Gao and Yue Qi
Differentiable Rendering: A Survey, Hiroharu Kato, Deniz Beker, Mihai Morariu, Takahiro Ando, Toru Matsuoka, Wadim Kehl and Adrien Gaidon
A Survey on Physics-based Differentiable Rendering, Y. Zeng¹, G. Cai² and S. Zhao²