{"id":1299,"date":"2025-05-26T15:10:54","date_gmt":"2025-05-26T13:10:54","guid":{"rendered":"https:\/\/cammonte.com\/?page_id=1299"},"modified":"2025-12-08T14:53:21","modified_gmt":"2025-12-08T13:53:21","slug":"original-nerf-paper-breakdown","status":"publish","type":"page","link":"https:\/\/cammonte.com\/index.php\/original-nerf-paper-breakdown\/","title":{"rendered":"NeRF Basics"},"content":{"rendered":"\n<h1 class=\"wp-block-heading\">TL;DR<\/h1>\n\n\n\n<p class=\"has-text-align-center wp-block-paragraph\"><strong><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-red-color\">NeRFs represent a 3D scene as a fully connected deep network<\/mark><\/strong><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"630\" height=\"120\" src=\"https:\/\/cammonte.com\/wp-content\/uploads\/2025\/06\/Untitled-Diagram.drawio1.png\" alt=\"\" class=\"wp-image-1687\" style=\"width:420px;height:auto\" srcset=\"https:\/\/cammonte.com\/wp-content\/uploads\/2025\/06\/Untitled-Diagram.drawio1.png 630w, https:\/\/cammonte.com\/wp-content\/uploads\/2025\/06\/Untitled-Diagram.drawio1-300x57.png 300w\" sizes=\"auto, (max-width: 630px) 100vw, 630px\" \/><\/figure>\n\n\n\n<p class=\"has-text-align-center wp-block-paragraph\"><strong>Map any 3D location [math](x, y, z)[\/math] and viewing direction [math](\\theta, \\phi)[\/math] to a volume density [math]\\sigma[\/math] and a colour (emitted radiance) [math]c=(r, g, b)[\/math] at that location; these values are then used to render novel views with classical volume rendering techniques<\/strong><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>How it works<\/strong><\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Sample<\/h3>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-7387b849 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p 
class=\"wp-block-paragraph\">Trace a camera ray [math]r(t)[\/math] through the currently shaded pixel into the scene and sample points [math]s_1, s_2, &#8230;, s_N[\/math] along it, with [math]s_i=(x_i, y_i, z_i)[\/math]<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-image aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"470\" src=\"https:\/\/cammonte.com\/wp-content\/uploads\/2025\/06\/image-16-1024x470.png\" alt=\"\" class=\"wp-image-1670\" style=\"width:430px;height:auto\" srcset=\"https:\/\/cammonte.com\/wp-content\/uploads\/2025\/06\/image-16-1024x470.png 1024w, https:\/\/cammonte.com\/wp-content\/uploads\/2025\/06\/image-16-300x138.png 300w, https:\/\/cammonte.com\/wp-content\/uploads\/2025\/06\/image-16-768x353.png 768w, https:\/\/cammonte.com\/wp-content\/uploads\/2025\/06\/image-16.png 1280w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<\/div>\n<\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Infer<\/h3>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-7387b849 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p class=\"wp-block-paragraph\">For each sample, have the trained network infer its corresponding density [math]\\sigma_i[\/math] and colour [math]c_i[\/math]<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-image aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"640\" height=\"120\" src=\"https:\/\/cammonte.com\/wp-content\/uploads\/2025\/06\/Untitled-Diagram.drawio.png\" alt=\"\" class=\"wp-image-1671\" style=\"width:414px;height:auto\" srcset=\"https:\/\/cammonte.com\/wp-content\/uploads\/2025\/06\/Untitled-Diagram.drawio.png 640w, 
https:\/\/cammonte.com\/wp-content\/uploads\/2025\/06\/Untitled-Diagram.drawio-300x56.png 300w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><\/figure>\n<\/div>\n<\/div>\n\n\n\n<h3 class=\"wp-block-heading has-text-align-left\">Accumulate<\/h3>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-7387b849 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p class=\"wp-block-paragraph\">Use classical volume rendering to accumulate colours and densities into the colour of the shaded pixel<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p class=\"has-text-align-center wp-block-paragraph\">[math] C(r) = \\int_{t_n}^{t_f} [\/math] <mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-cyan-blue-color\">[math] T(t) [\/math]<\/mark> <mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-luminous-vivid-orange-color\">[math] \\sigma(r(t)) [\/math]<\/mark> <mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-green-cyan-color\">[math] c(r(t), d) [\/math]<\/mark> [math] dt [\/math]<\/p>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-7387b849 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p class=\"has-text-align-center wp-block-paragraph\"><strong><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-cyan-blue-color\">Accumulated transmittance from [math]t_n[\/math] to [math]t[\/math]<\/mark><\/strong><\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p class=\"has-text-align-center wp-block-paragraph\"><strong><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-luminous-vivid-orange-color\">Volume density at 
[math]r(t)[\/math]<\/mark><\/strong><\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p class=\"has-text-align-center wp-block-paragraph\"><strong><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-green-cyan-color\">Colour at [math]r(t)[\/math] when looking in the direction of [math]d[\/math]<\/mark><\/strong><\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Backpropagate<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Compare the rendered colour of each pixel with its ground truth colour in the training image, then backpropagate the squared error through the differentiable volume rendering step to update the network weights<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Why it works<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Volume rendering is naturally differentiable (the rendered colour is a smooth function of the predicted densities and colours) <\/strong>\u2192 use gradient descent to train the model by minimising the error between observed images of the scene and the corresponding rendered views<\/li>\n\n\n\n<li><strong>Positional encoding<\/strong>\n<ul class=\"wp-block-list\">\n<li><strong>Issue<\/strong>: a basic implementation of the above does not capture high-frequency details (lots of sharp changes over a small area of the image), because deep networks are biased towards learning low-frequency functions and the raw input is only 5D<\/li>\n\n\n\n<li><strong>Solution<\/strong>: project the 5D input into a higher-dimensional space through sin and cos functions of increasing frequency to represent both coarse and fine details<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Hierarchical volume sampling<\/strong>\n<ul class=\"wp-block-list\">\n<li><strong>Issue<\/strong>: a basic implementation of the above requires many samples per camera ray to capture a scene accurately, since samples are spread evenly along the whole ray: free space and occluded regions, which contribute nothing to the rendered colour, are sampled as densely as visible surfaces<\/li>\n\n\n\n<li><strong>Solution<\/strong>: use two networks. A coarse network is evaluated at samples drawn coarsely along the ray (stratified random sampling); the densities it outputs drive a finer sampling concentrated on the relevant parts of the ray, and these samples serve as input to a fine network, which gives the final 
output densities and colours used for rendering<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Why it&#8217;s cool<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Can <strong>model complex geometry and non-Lambertian surfaces <\/strong>(the colour of a surface changes depending on the viewing direction)<\/li>\n\n\n\n<li>Gives an extremely <strong>compact representation of a 3D scene<\/strong>: the optimised NeRF weights take up less memory than the input JPEG images they were trained on<\/li>\n\n\n\n<li>Gives a <strong>continuous representation of a 3D scene<\/strong> (unlike the discretised voxel grids used by prior view synthesis work)<\/li>\n<\/ul>\n\n\n\n<h1 class=\"wp-block-heading\" id=\"3-related-work\">Related Work<\/h1>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"4-neural-3d-shape-representations\">Implicit 3D shape representations<\/h2>\n\n\n\n<p class=\"has-text-align-center wp-block-paragraph\"><strong><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-red-color\">Represent continuous 3D shapes implicitly through functions mapping any spatial point to some meaningful value<\/mark><\/strong><\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"5-ground-truth-geometry-based-methods\">Using ground truth geometry<\/h3>\n\n\n\n<h4 class=\"wp-block-heading\">How it works<\/h4>\n\n\n\n<p class=\"has-text-align-center wp-block-paragraph\"><strong>Optimise a network to map any spatial point [math](x, y, z)[\/math] to:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Signed distance functions<\/strong>: the distance from that point to the closest surface of the shape, with a sign telling whether the point is inside or outside\n<ul class=\"wp-block-list\">\n<li>Minimise an MSE loss between predicted and ground truth values at sampled 3D coordinates (regression)<\/li>\n\n\n\n<li>SDFs are continuous and (almost everywhere) differentiable, so they can be fitted with gradient-based optimisation<\/li>\n\n\n\n<li>The surface can be extracted via <strong>marching cubes<\/strong> (SDF = 0 for points on the surface)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<ul 
class=\"wp-block-list\">\n<li><strong>Occupancy fields<\/strong>: the probability [math]f(x, y, z) \\in [0, 1][\/math] that the point is occupied, i.e. inside the shape (1) rather than outside (0)\n<ul class=\"wp-block-list\">\n<li>Minimise a binary cross-entropy loss between predicted occupancy and ground truth label (binary classification, often easier than regressing an SDF: labels are easier to define, gradients are more stable and convergence is faster)<\/li>\n\n\n\n<li>The surface can be extracted via <strong>marching cubes<\/strong> (points on the surface have occupancy probability 0.5)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n<div class=\"wp-block-ub-content-toggle wp-block-ub-content-toggle-block\" id=\"ub-content-toggle-block-1a677715-54cd-48a4-8d30-cb478e59e465\" data-mobilecollapse=\"false\" data-desktopcollapse=\"true\" data-preventcollapse=\"false\" data-showonlyone=\"false\">\n<div class=\"wp-block-ub-content-toggle-accordion\" style=\"border-color: #f1f1f1;\" id=\"ub-content-toggle-panel-block-\">\n\t\t\t<div class=\"wp-block-ub-content-toggle-accordion-title-wrap\" style=\"background-color: #f1f1f1;\" aria-controls=\"ub-content-toggle-panel-0-1a677715-54cd-48a4-8d30-cb478e59e465\" tabindex=\"0\">\n\t\t\t<p class=\"wp-block-ub-content-toggle-accordion-title ub-content-toggle-title-1a677715-54cd-48a4-8d30-cb478e59e465\" style=\"color: #000000; \"><strong>Marching cubes algorithm<\/strong><\/p>\n\t\t\t<div class=\"wp-block-ub-content-toggle-accordion-toggle-wrap right\" style=\"color: #000000;\"><span class=\"wp-block-ub-content-toggle-accordion-state-indicator wp-block-ub-chevron-down\"><\/span><\/div>\n\t\t<\/div>\n\t\t\t<div role=\"region\" aria-expanded=\"false\" class=\"wp-block-ub-content-toggle-accordion-content-wrap ub-hide\" id=\"ub-content-toggle-panel-0-1a677715-54cd-48a4-8d30-cb478e59e465\">\n\n<p class=\"has-text-align-center wp-block-paragraph\"><strong>Convert an implicit surface into a polygonal mesh<\/strong><\/p>\n\n\n\n<ul 
class=\"wp-block-list\">\n<li>Sample the 3D space on a regular grid<\/li>\n\n\n\n<li>Evaluate the function representing the implicit surface (SDF, occupancy field, &#8230;) at each grid point<\/li>\n\n\n\n<li>Identify grid cells whose corner values lie on both sides of the threshold (0 for an SDF, 0.5 for an occupancy field): the surface passes through these cells<\/li>\n\n\n\n<li>Approximate the surface within these cells using a precomputed lookup table\n<ul class=\"wp-block-list\">\n<li>Each of the 8 corners of a cell is either above or below the threshold, which gives [math]2^8=256[\/math] possible configurations per cell; for each configuration, a precomputed lookup table tells which triangles to draw within that cell to approximate the surface (their vertices are placed on the cell edges by linear interpolation of the corner values)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n<\/div>\n\t\t<\/div>\n<\/div>\n\n\n<h4 class=\"wp-block-heading\">Limitations<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Requires ground truth 3D geometry for supervision<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"6-differentiable-rendering-functions-based-methods\">Leveraging differentiable rendering functions<\/h3>\n\n\n\n<h4 class=\"wp-block-heading\">How it works<\/h4>\n\n\n\n<p class=\"has-text-align-center wp-block-paragraph\"><strong>Formulate differentiable rendering functions so that the above representations can be optimised from ground truth images rather than ground truth geometry<\/strong><\/p>\n\n\n<div class=\"wp-block-ub-content-toggle wp-block-ub-content-toggle-block\" id=\"ub-content-toggle-block-a6124ec8-2eaa-4375-8fd2-2f9bc90d4fb1\" data-mobilecollapse=\"false\" data-desktopcollapse=\"false\" data-preventcollapse=\"false\" data-showonlyone=\"false\">\n<div class=\"wp-block-ub-content-toggle-accordion\" style=\"border-color: #f1f1f1;\" id=\"ub-content-toggle-panel-block-\">\n\t\t\t<div class=\"wp-block-ub-content-toggle-accordion-title-wrap\" style=\"background-color: #f1f1f1;\" 
aria-controls=\"ub-content-toggle-panel-0-a6124ec8-2eaa-4375-8fd2-2f9bc90d4fb1\" tabindex=\"0\">\n\t\t\t<p class=\"wp-block-ub-content-toggle-accordion-title ub-content-toggle-title-a6124ec8-2eaa-4375-8fd2-2f9bc90d4fb1\" style=\"color: #000000; \"><strong>Numerical method and implicit differentiation example<\/strong><\/p>\n\t\t\t<div class=\"wp-block-ub-content-toggle-accordion-toggle-wrap right\" style=\"color: #000000;\"><span class=\"wp-block-ub-content-toggle-accordion-state-indicator wp-block-ub-chevron-down open\"><\/span><\/div>\n\t\t<\/div>\n\t\t\t<div role=\"region\" aria-expanded=\"true\" class=\"wp-block-ub-content-toggle-accordion-content-wrap\" id=\"ub-content-toggle-panel-0-a6124ec8-2eaa-4375-8fd2-2f9bc90d4fb1\">\n\n<ul class=\"wp-block-list\">\n<li>Cast a camera ray [math]r(t) = o + td[\/math] through the image plane into the 3D scene and numerically search for the value [math]t^*[\/math] at which the ray intersects the implicitly defined surface (e.g. for an occupancy field [math]f[\/math], solve [math]f(r(t^*))=0.5[\/math])<\/li>\n\n\n\n<li>Once [math]t^*[\/math] is found, ask the network to predict the colour at [math]r(t^*)[\/math], compare it to the ground truth pixel colour to obtain a loss [math]L[\/math], and backpropagate this loss to update the network weights [math]\\Theta[\/math]<\/li>\n\n\n\n<li>Backpropagating through the iterations of the numerical search for [math]t^*[\/math] is impractical, so apply <strong>implicit differentiation<\/strong> to the constraint [math]f(r(t^*))=0.5[\/math] to obtain [math]\\frac{\\partial t^*}{\\partial \\Theta}[\/math] and hence [math]\\frac{dL}{d \\Theta}[\/math]<\/li>\n<\/ul>\n\n<\/div>\n\t\t<\/div>\n<\/div>\n\n\n<h4 class=\"wp-block-heading\">Limitations<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Limited to simple shapes with low geometric complexity \u2192 over-smoothed renderings<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">View synthesis and image-based rendering<\/h2>\n\n\n\n<p class=\"has-text-align-center wp-block-paragraph\"><strong><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color 
has-vivid-red-color\">Synthesise high-quality photorealistic novel views of a scene from a set of input RGB images of that scene<\/mark><\/strong><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Light field sample interpolation<\/h3>\n\n\n\n<h4 class=\"wp-block-heading\">How it works<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The light field represents all the light rays in a scene: colour and intensity as a function of pixel position and viewpoint<\/li>\n<\/ul>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-7387b849 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p class=\"has-text-align-center wp-block-paragraph\">[math]L(u, v, s, t)[\/math]<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<ul class=\"wp-block-list\">\n<li>[math](u, v)[\/math]: image coordinates<\/li>\n\n\n\n<li>[math](s, t)[\/math]: camera-plane coordinates, i.e. the viewpoint (together with [math](u, v)[\/math], they fix the ray direction)<\/li>\n<\/ul>\n<\/div>\n<\/div>\n\n\n\n<ul class=\"wp-block-list\">\n<li>To render a novel view, i.e. an unseen [math](s, t)[\/math] pair, interpolate the colour and intensity values from nearby sampled views<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">Limitations<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Requires dense and regularly spaced input views<\/li>\n\n\n\n<li>Assumes the light field is locally smooth (surfaces and colours change gradually across views) \u2192 misses high-frequency details and sharp edges<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Scene Representation Networks<\/h3>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>How it works<\/strong><\/h4>\n\n\n\n<p class=\"has-text-align-center wp-block-paragraph\"><strong>Represent scenes as continuous functions that map world coordinates to a feature representation of local scene properties<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For each 
pixel in the novel view to be rendered:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>A learned <strong>ray marcher<\/strong> (an LSTM in SRN) predicts the depth at which the ray hits a surface: given the ray origin and direction (from the camera parameters) and a <strong>scene code<\/strong> (latent vector representing the current scene), it outputs the ray parameter [math]t[\/math] of the closest intersection\n<ul class=\"wp-block-list\">\n<li>The scene code can be inferred from input images \u2192 <strong>SRN can generalise across different scenes<\/strong><\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>A trained <strong>scene MLP<\/strong> predicts the RGB colour (and additional properties such as occupancy or visibility, &#8230;) from the intersection point in space and the scene code<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Limitations<\/strong><\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Learned ray marching is less stable than the volume rendering used by NeRF<\/li>\n\n\n\n<li>View-independent: ignores the viewing direction when predicting colour -&gt; cannot reproduce specularities or reflections<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Sampled volume representations<\/h3>\n\n\n\n<p class=\"has-text-align-center wp-block-paragraph\"><strong>Learn volumetric scene parameters at sampled points in the scene<\/strong><\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Colouring voxel grids<\/h4>\n\n\n\n<p class=\"has-text-align-center wp-block-paragraph\"><strong>Use observed images to directly colour voxel grids<\/strong><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Define a regular grid of 3D voxels inside a 3D bounding volume that encloses the scene<\/li>\n\n\n\n<li>For each voxel, use the known camera intrinsics and extrinsics to project it into each input image, and set its colour to the average colour of the corresponding pixels across the input images<\/li>\n<\/ol>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Limitations<\/strong>: assumes colour does not depend on 
viewing direction and requires known camera parameters<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Deep networks predicting sampled volumes<\/h4>\n\n\n\n<p class=\"has-text-align-center wp-block-paragraph\"><strong>Train a scene-independent deep network that predicts a sampled volumetric representation from a set of input images, then use alpha compositing or learned compositing to render novel views<\/strong><\/p>\n\n\n<div class=\"wp-block-ub-content-toggle wp-block-ub-content-toggle-block\" id=\"ub-content-toggle-block-8d37686a-e432-4a84-9db6-fa7c98b72ff7\" data-mobilecollapse=\"false\" data-desktopcollapse=\"false\" data-preventcollapse=\"false\" data-showonlyone=\"false\">\n<div class=\"wp-block-ub-content-toggle-accordion\" style=\"border-color: #f1f1f1;\" id=\"ub-content-toggle-panel-block-\">\n\t\t\t<div class=\"wp-block-ub-content-toggle-accordion-title-wrap\" style=\"background-color: #f1f1f1;\" aria-controls=\"ub-content-toggle-panel-0-8d37686a-e432-4a84-9db6-fa7c98b72ff7\" tabindex=\"0\">\n\t\t\t<p class=\"wp-block-ub-content-toggle-accordion-title ub-content-toggle-title-8d37686a-e432-4a84-9db6-fa7c98b72ff7\" style=\"color: #000000; \"><strong>Alpha compositing (classic volume rendering)<\/strong><\/p>\n\t\t\t<div class=\"wp-block-ub-content-toggle-accordion-toggle-wrap right\" style=\"color: #000000;\"><span class=\"wp-block-ub-content-toggle-accordion-state-indicator wp-block-ub-chevron-down open\"><\/span><\/div>\n\t\t<\/div>\n\t\t\t<div role=\"region\" aria-expanded=\"true\" class=\"wp-block-ub-content-toggle-accordion-content-wrap\" id=\"ub-content-toggle-panel-0-8d37686a-e432-4a84-9db6-fa7c98b72ff7\">\n\n<p class=\"has-text-align-center wp-block-paragraph\"><strong>Render volumetric representations (colour and density are known at any given point in space)<\/strong><\/p>\n\n\n\n<p class=\"has-text-align-center wp-block-paragraph\"><strong>Combine predicted colour and density samples through a fixed equation<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">To render an image using 
predicted colour [math]c_i[\/math] and density [math]\\sigma_i[\/math] values at a given point and viewing direction, we do the following for each camera ray:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Sample points along the ray in the 3D space<\/li>\n\n\n\n<li>Predict colour and density at these points<\/li>\n\n\n\n<li>Blend them using the discretised volume rendering equation:\n<ul class=\"wp-block-list\">\n<li>[math]C = \\sum_{i}T_i \\alpha_i c_i[\/math]\n<ul class=\"wp-block-list\">\n<li>Opacity [math]\\alpha_i = 1 - \\text{exp}(-\\sigma_i \\delta_i)[\/math]<\/li>\n\n\n\n<li>Transmittance [math]T_i=\\text{exp}(- \\sum_{j=1}^{i-1} \\sigma_j \\delta_j)[\/math]<\/li>\n\n\n\n<li>Distance between adjacent samples [math]\\delta_i = t_{i+1} - t_i[\/math]<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n<\/div>\n\t\t<\/div>\n<\/div>\n\n<div class=\"wp-block-ub-content-toggle wp-block-ub-content-toggle-block\" id=\"ub-content-toggle-block-6fc2c7fc-8e1a-448d-8b19-3b7f64d4ec16\" data-mobilecollapse=\"false\" data-desktopcollapse=\"false\" data-preventcollapse=\"false\" data-showonlyone=\"false\">\n<div class=\"wp-block-ub-content-toggle-accordion\" style=\"border-color: #f1f1f1;\" id=\"ub-content-toggle-panel-block-\">\n\t\t\t<div class=\"wp-block-ub-content-toggle-accordion-title-wrap\" style=\"background-color: #f1f1f1;\" aria-controls=\"ub-content-toggle-panel-0-6fc2c7fc-8e1a-448d-8b19-3b7f64d4ec16\" tabindex=\"0\">\n\t\t\t<p class=\"wp-block-ub-content-toggle-accordion-title ub-content-toggle-title-6fc2c7fc-8e1a-448d-8b19-3b7f64d4ec16\" style=\"color: #000000; \"><strong>Learned compositing (neural volume rendering)<\/strong><\/p>\n\t\t\t<div class=\"wp-block-ub-content-toggle-accordion-toggle-wrap right\" style=\"color: #000000;\"><span class=\"wp-block-ub-content-toggle-accordion-state-indicator wp-block-ub-chevron-down open\"><\/span><\/div>\n\t\t<\/div>\n\t\t\t<div role=\"region\" aria-expanded=\"true\" class=\"wp-block-ub-content-toggle-accordion-content-wrap\" 
id=\"ub-content-toggle-panel-0-6fc2c7fc-8e1a-448d-8b19-3b7f64d4ec16\">\n\n<p class=\"has-text-align-center wp-block-paragraph\"><strong>Render volumetric or sampled representations (we have values at any given point in space and viewing direction)<\/strong><\/p>\n\n\n\n<p class=\"has-text-align-center wp-block-paragraph\"><strong>Train a network to combine them<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>More expressive than alpha compositing:\n<ul class=\"wp-block-list\">\n<li>Can model non-Lambertian effects (reflections, specular highlights)<\/li>\n\n\n\n<li>Can learn to handle occlusions and depth ambiguity<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-luminous-vivid-orange-color\">Look more into it<\/mark><\/strong><\/li>\n<\/ul>\n\n<\/div>\n\t\t<\/div>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\"><strong><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-luminous-vivid-orange-color\">A relevant example is &#8220;Neural Volumes: Learning Dynamic Renderable Volumes from Images&#8221; (SIGGRAPH 2019)<\/mark><\/strong><\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Limitations<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Can&#8217;t scale to higher-resolution imagery: the time and space cost of discrete sampling grows quickly with resolution (cubically for a voxel grid)\n<ul class=\"wp-block-list\">\n<li>NeRF instead encodes a <em>continuous<\/em> volume in the weights of a network -&gt; requires far less storage than sampled volumetric representations<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h1 class=\"wp-block-heading\">Architecture<\/h1>\n\n\n\n<p class=\"has-text-align-center wp-block-paragraph\"><strong><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-red-color\">Predict volume density from the spatial position only, and emitted colour from both position and viewing direction, to encourage multiview consistency<\/mark><\/strong><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large is-resized\"><img loading=\"lazy\" 
decoding=\"async\" width=\"1024\" height=\"359\" src=\"https:\/\/cammonte.com\/wp-content\/uploads\/2025\/06\/image-9-1024x359.png\" alt=\"\" class=\"wp-image-1449\" style=\"width:819px;height:auto\" srcset=\"https:\/\/cammonte.com\/wp-content\/uploads\/2025\/06\/image-9-1024x359.png 1024w, https:\/\/cammonte.com\/wp-content\/uploads\/2025\/06\/image-9-300x105.png 300w, https:\/\/cammonte.com\/wp-content\/uploads\/2025\/06\/image-9-768x269.png 768w, https:\/\/cammonte.com\/wp-content\/uploads\/2025\/06\/image-9.png 1332w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\"><strong><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-luminous-vivid-orange-color\">What does this represent? Do both coarse and fine nets have this architecture?<\/mark><\/strong><\/p>\n\n\n\n<h1 class=\"wp-block-heading\">Volume Rendering with Radiance Fields<\/h1>\n\n\n\n<p class=\"has-text-align-center wp-block-paragraph\"><strong><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-red-color\">Use classical volume rendering principles to render the colour of any ray passing through the scene using predicted local density and colour<\/mark><\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">To render a novel view, we need to shade every pixel of the image: trace a ray through each pixel and estimate the colour of this ray.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Volume rendering equation<\/h2>\n\n\n\n<p class=\"has-text-align-center wp-block-paragraph\"><strong>Volume rendering equation gives the expected colour [math]C(r)[\/math] of a camera ray [math]r(t)=o+td[\/math] with near and far bounds [math]t_n[\/math] and [math]t_f[\/math]<\/strong><\/p>\n\n\n\n<p class=\"has-text-align-center wp-block-paragraph\">[math] C(r) = \\int_{t_n}^{t_f} [\/math] <mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-cyan-blue-color\">[math] T(t) 
[\/math]<\/mark> <mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-luminous-vivid-orange-color\">[math] \\sigma(r(t)) [\/math]<\/mark> <mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-green-cyan-color\">[math] c(r(t), d) [\/math]<\/mark> [math] dt [\/math]<\/p>\n\n\n\n<figure class=\"wp-block-table aligncenter\"><table class=\"has-fixed-layout\" style=\"border-style:none;border-width:0px\"><tbody><tr><td class=\"has-text-align-center\" data-align=\"center\"><strong><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-cyan-blue-color\">[math] T(t) = \\text{exp} ( - \\int_{t_n}^{t} \\sigma(r(s)) ds ) [\/math]<\/mark><\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-luminous-vivid-orange-color\">[math] \\sigma(r(t)) [\/math]<\/mark><\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-green-cyan-color\">[math] c(r(t), d) [\/math]<\/mark><\/strong><\/td><\/tr><tr><td class=\"has-text-align-center\" data-align=\"center\"><strong><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-cyan-blue-color\">Accumulated transmittance from [math]t_n[\/math] to [math]t[\/math]<\/mark><\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-luminous-vivid-orange-color\">Volume density at [math]r(t)[\/math]<\/mark><\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-green-cyan-color\">Colour at [math]r(t)[\/math] when looking in the direction of [math]d[\/math]<\/mark><\/strong><\/td><\/tr><tr><td 
class=\"has-text-align-center\" data-align=\"center\"><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-cyan-blue-color\">&#8220;How much of the ray gets past the stuff encountered up until the current point?&#8221;<\/mark> <mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-cyan-blue-color\">\u2192 decreases as the densities before the current point accumulate<\/mark><\/td><td class=\"has-text-align-center\" data-align=\"center\"> <mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-luminous-vivid-orange-color\">&#8220;How much stuff is there at the current point?&#8221;<\/mark><\/td><td class=\"has-text-align-center\" data-align=\"center\"><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-green-cyan-color\">&#8220;What colour is the stuff at the current point?&#8221;<\/mark><\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The inside of the integral answers the question &#8220;How much of <mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-green-cyan-color\">this [math] c(r(t), d) [\/math] colour<\/mark> should I see at point [math]r(t)[\/math], considering <mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-cyan-blue-color\">how much stuff is in front of that point<\/mark> and <mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-luminous-vivid-orange-color\">how much stuff of that colour is actually at that point<\/mark>?&#8221;<\/li>\n\n\n\n<li>The whole integral answers the question: &#8220;How much of each colour along the ray should I see?&#8221;<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Quadrature estimate<\/h2>\n\n\n\n<p class=\"has-text-align-center wp-block-paragraph\"><strong>Estimate the volume rendering integral numerically as a sum over the samples along the ray<\/strong><\/p>\n\n\n\n<p class=\"has-text-align-center 
wp-block-paragraph\">[math] \\hat{C}(r) = \\sum_{i=1}^{N} [\/math] <mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-cyan-blue-color\">[math]T_i[\/math]<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-luminous-vivid-orange-color\"> [math] (1 - \\text{exp}(- \\sigma_i \\delta_i)) [\/math]<\/mark> <mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-green-cyan-color\">[math]c_i [\/math]<\/mark><\/p>\n\n\n\n<figure class=\"wp-block-table aligncenter\"><table class=\"has-fixed-layout\" style=\"border-style:none;border-width:0px\"><tbody><tr><td class=\"has-text-align-center\" data-align=\"center\"><strong><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-cyan-blue-color\">[math] T_i = \\text{exp}(- \\sum_{j=1}^{i-1} \\sigma_j \\delta_j) [\/math]<\/mark><\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-luminous-vivid-orange-color\">[math] (1 - \\text{exp}(- \\sigma_i \\delta_i)) [\/math]<\/mark><\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-green-cyan-color\">[math] c_i [\/math]<\/mark><\/strong><\/td><\/tr><tr><td class=\"has-text-align-center\" data-align=\"center\"><strong><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-cyan-blue-color\">Accumulated transmittance up to sample [math]i[\/math] (through samples 1 to [math]i-1[\/math])<\/mark><\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-luminous-vivid-orange-color\">Opacity of the segment of length [math]\\delta_i[\/math] starting at 
[math]r(t_i)[\/math]<\/mark><\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-green-cyan-color\">Colour at [math]r(t_i)[\/math] when looking in the direction of [math]d[\/math]<\/mark><\/strong><\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">[math]\\delta_i = t_{i+1} - t_i[\/math] is the distance between two adjacent samples<\/p>\n\n\n\n<p class=\"has-text-align-left wp-block-paragraph\">[math] \\hat{C}(r)[\/math] is trivially differentiable and reduces to traditional alpha compositing with alpha values [math] \\alpha_i = 1 - \\text{exp}(- \\sigma_i \\delta_i) [\/math]<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">Optimising a Neural Radiance Field<\/h1>\n\n\n\n<h2 class=\"wp-block-heading\">Positional encoding<\/h2>\n\n\n\n<p class=\"has-text-align-center wp-block-paragraph\"><strong>Allow the input to represent both coarse- and fine-grained details<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The raw input coordinates (position + viewing direction) do not let the network represent high-frequency variation in colour and geometry (<strong>deep networks are biased towards learning lower-frequency functions<\/strong>) -&gt; map the inputs to a higher-dimensional space before passing them to the network, which makes it much easier to fit data with high-frequency variation (rapid changes over a small region of space, e.g. images with many sharp edges or fine details)<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Reformulate the network function [math]F_{\\Theta}[\/math] as a composition of two functions:<\/p>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-7387b849 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p class=\"has-text-align-center wp-block-paragraph\">[math]F_{\\Theta} = F'_{\\Theta} \\circ 
\\gamma[\/math]<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<ul class=\"wp-block-list\">\n<li>[math]F&#8217;_{\\Theta}[\/math]: regular MLP, learned<\/li>\n\n\n\n<li>[math]\\gamma: \\mathbb{R} \\rightarrow \\mathbb{R}^{2L} [\/math], not learned<\/li>\n<\/ul>\n<\/div>\n<\/div>\n\n\n\n<p class=\"has-text-align-center wp-block-paragraph\">[math]\\gamma(p) = (\\sin(2^0\\pi p), \\cos(2^0\\pi p), \\ldots, \\sin(2^{L-1}\\pi p), \\cos(2^{L-1}\\pi p))[\/math]<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">[math]\\gamma(\\cdot)[\/math] is applied separately to each of the three coordinate values x, y, z (normalised to lie in [-1, 1]) and to each of the three components of the Cartesian unit vector [math]\\vec{d}[\/math] giving the viewing direction (which lie in [-1, 1] by construction). The paper uses [math]L = 10[\/math] for the position and [math]L = 4[\/math] for the viewing direction.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Visual understanding<\/h3>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-7387b849 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-image aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"614\" src=\"https:\/\/cammonte.com\/wp-content\/uploads\/2025\/06\/sin_pi_x_teal_centered_axes_zero-1-1024x614.png\" alt=\"\" class=\"wp-image-1572\" style=\"width:417px;height:auto\" srcset=\"https:\/\/cammonte.com\/wp-content\/uploads\/2025\/06\/sin_pi_x_teal_centered_axes_zero-1-1024x614.png 1024w, https:\/\/cammonte.com\/wp-content\/uploads\/2025\/06\/sin_pi_x_teal_centered_axes_zero-1-300x180.png 300w, https:\/\/cammonte.com\/wp-content\/uploads\/2025\/06\/sin_pi_x_teal_centered_axes_zero-1-768x461.png 768w, https:\/\/cammonte.com\/wp-content\/uploads\/2025\/06\/sin_pi_x_teal_centered_axes_zero-1-1536x922.png 1536w, https:\/\/cammonte.com\/wp-content\/uploads\/2025\/06\/sin_pi_x_teal_centered_axes_zero-1.png 2000w\" sizes=\"auto, (max-width: 1024px) 100vw, 
1024px\" \/><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p class=\"has-text-align-center wp-block-paragraph\"><strong>[math]\\sin{(\\pi x)}[\/math]<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Encodes coarse details<\/li>\n\n\n\n<li>A large change in [math]x[\/math] induces only a small change in [math]y[\/math]<\/li>\n<\/ul>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-7387b849 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-image aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"614\" src=\"https:\/\/cammonte.com\/wp-content\/uploads\/2025\/06\/sin_2pi_x_teal_centered_axes_zero-1-1024x614.png\" alt=\"\" class=\"wp-image-1573\" style=\"width:415px;height:auto\" srcset=\"https:\/\/cammonte.com\/wp-content\/uploads\/2025\/06\/sin_2pi_x_teal_centered_axes_zero-1-1024x614.png 1024w, https:\/\/cammonte.com\/wp-content\/uploads\/2025\/06\/sin_2pi_x_teal_centered_axes_zero-1-300x180.png 300w, https:\/\/cammonte.com\/wp-content\/uploads\/2025\/06\/sin_2pi_x_teal_centered_axes_zero-1-768x461.png 768w, https:\/\/cammonte.com\/wp-content\/uploads\/2025\/06\/sin_2pi_x_teal_centered_axes_zero-1-1536x922.png 1536w, https:\/\/cammonte.com\/wp-content\/uploads\/2025\/06\/sin_2pi_x_teal_centered_axes_zero-1.png 2000w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p class=\"has-text-align-center wp-block-paragraph\"><strong>[math]\\sin{(2 \\pi x)}[\/math]<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Encodes slightly finer details<\/li>\n\n\n\n<li>The same change in [math]x[\/math] induces a somewhat larger change in [math]y[\/math]<\/li>\n<\/ul>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-columns 
is-layout-flex wp-container-core-columns-is-layout-7387b849 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p class=\"has-text-align-center wp-block-paragraph\"><strong>&#8230;<\/strong><\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p class=\"has-text-align-center wp-block-paragraph\"><strong>&#8230;<\/strong><\/p>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-7387b849 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-image aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"614\" src=\"https:\/\/cammonte.com\/wp-content\/uploads\/2025\/06\/sin_32pi_x_teal_centered_axes_zero-1-1024x614.png\" alt=\"\" class=\"wp-image-1574\" style=\"width:432px;height:auto\" srcset=\"https:\/\/cammonte.com\/wp-content\/uploads\/2025\/06\/sin_32pi_x_teal_centered_axes_zero-1-1024x614.png 1024w, https:\/\/cammonte.com\/wp-content\/uploads\/2025\/06\/sin_32pi_x_teal_centered_axes_zero-1-300x180.png 300w, https:\/\/cammonte.com\/wp-content\/uploads\/2025\/06\/sin_32pi_x_teal_centered_axes_zero-1-768x461.png 768w, https:\/\/cammonte.com\/wp-content\/uploads\/2025\/06\/sin_32pi_x_teal_centered_axes_zero-1-1536x922.png 1536w, https:\/\/cammonte.com\/wp-content\/uploads\/2025\/06\/sin_32pi_x_teal_centered_axes_zero-1.png 2000w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p class=\"has-text-align-center wp-block-paragraph\"><strong>[math]\\sin{(32 \\pi x)}[\/math]<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Encodes fine details<\/li>\n\n\n\n<li>Even a small change in [math]x[\/math] induces a big change in 
[math]y[\/math]<\/li>\n<\/ul>\n<\/div>\n<\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Hierarchical volume sampling<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Densely evaluating the network at [math]N[\/math] query points along each camera ray is inefficient: free space and occluded regions, which don&#8217;t contribute to the rendered image, are still sampled repeatedly -&gt; use a hierarchical representation that allocates samples in proportion to their expected effect on the final rendering (== &#8220;sample with a preference for areas where there&#8217;s actual stuff to see&#8221;)<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Optimise two networks instead of one: a &#8220;coarse&#8221; one and a &#8220;fine&#8221; one<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">First sample [math]N_c[\/math] locations using stratified sampling, evaluate the coarse network at these locations, and render with the same quadrature equation as before:<\/p>\n\n\n\n<p class=\"has-text-align-center wp-block-paragraph\">[math] \\hat{C}(r) = \\sum_{i=1}^{N} T_i (1 - \\text{exp}(- \\sigma_i \\delta_i))c_i [\/math]<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">With<\/p>\n\n\n\n<p class=\"has-text-align-center wp-block-paragraph\">[math]T_i=\\text{exp}(- \\sum_{j=1}^{i-1} \\sigma_j \\delta_j)[\/math]<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For [math]N = N_c[\/math], this gives us the colour predicted by the coarse network: [math]\\hat{C}_c(r)[\/math]<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Use the coarse samples to decide where to sample more finely.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Rewrite it as a weighted sum of the sampled colours [math]c_i[\/math] along the ray (just a regrouping of the terms above):<\/p>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-7387b849 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p class=\"has-text-align-center wp-block-paragraph\">[math]\\hat{C}_c(r) = \\sum_{i=1}^{N_c} w_i c_i [\/math]<\/p>\n<\/div>\n\n\n\n<div 
class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p class=\"wp-block-paragraph\">[math]w_i = T_i(1-\\text{exp}(-\\sigma_i \\delta_i))[\/math]<\/p>\n<\/div>\n<\/div>\n\n\n\n<p class=\"wp-block-paragraph\">Normalise the weights:<\/p>\n\n\n\n<p class=\"has-text-align-center wp-block-paragraph\">[math]\\hat{w}_i = \\frac{w_i}{\\sum_{j=1}^{N_c} w_j}[\/math]<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This yields a piecewise-constant PDF along the ray.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Sample a second set of [math]N_f[\/math] locations from this distribution using inverse transform sampling, evaluate the fine network at the union of the first and second sets of samples, and compute the final rendered colour of the ray [math]\\hat{C}_f(r)[\/math] with the same equation<\/p>\n\n\n\n<p class=\"has-text-align-center wp-block-paragraph\">[math] \\hat{C}(r) = \\sum_{i=1}^{N} T_i (1 - \\text{exp}(- \\sigma_i \\delta_i))c_i [\/math]<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">But now using all [math]N_c + N_f[\/math] samples -&gt; more samples are allocated to regions expected to contain visible content<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Both the coarse and fine networks are optimised jointly<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The rendered output view comes from the fine network<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Fine sampling is basically educated (importance) sampling: the coarse pass tells the fine pass where to look<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Stratified sampling<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Partition [math][t_n, t_f][\/math] into [math]N[\/math] evenly-spaced bins and draw one sample uniformly at random from each bin<\/li>\n<\/ul>\n\n\n\n<p class=\"has-text-align-center wp-block-paragraph\">[math]t_i \\sim U [ t_n + \\frac{i-1}{N}(t_f - t_n), t_n + \\frac{i}{N}(t_f - t_n) ][\/math]<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Enables a continuous scene representation: although each ray only uses a discrete set of samples, the MLP is evaluated at continuous positions over the course of optimisation<\/li>\n<\/ul>\n\n\n\n<h2 
class=\"wp-block-heading\">Objective<\/h2>\n\n\n\n<p class=\"has-text-align-center wp-block-paragraph\">[math]L = \\sum_{\\vec{r} \\in R} [ || [\/math] <mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-cyan-blue-color\">[math] \\hat{C}_c(\\vec{r}) [\/math]<\/mark> [math] -  [\/math] <mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-luminous-vivid-orange-color\">[math]C(\\vec{r}) [\/math]<\/mark>[math] ||_2^2 + || [\/math]<mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-green-cyan-color\">[math] \\hat{C}_f(\\vec{r}) [\/math]<\/mark> [math] -  C(\\vec{r}) ||_2^2 ] [\/math]<\/p>\n\n\n\n<figure class=\"wp-block-table is-style-regular\"><table class=\"has-fixed-layout\" style=\"border-style:none;border-width:0px\"><tbody><tr><td class=\"has-text-align-center\" data-align=\"center\"><strong>[math]R[\/math]<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-cyan-blue-color\">[math]\\hat{C}_c(\\vec{r})[\/math]<\/mark><\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-luminous-vivid-orange-color\">[math]C(\\vec{r})[\/math]<\/mark><\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-green-cyan-color\">[math]\\hat{C}_f(\\vec{r})[\/math]<\/mark><\/strong><\/td><\/tr><tr><td class=\"has-text-align-center\" data-align=\"center\"><strong>Set of rays in each training batch, cast through the pixels of the ground-truth images<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-cyan-blue-color\">Coarse network prediction<\/mark><\/strong><\/td><td 
class=\"has-text-align-center\" data-align=\"center\"><strong><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-luminous-vivid-orange-color\">Ground truth colour<\/mark><\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-green-cyan-color\">Fine network prediction<\/mark><\/strong><\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h1 class=\"wp-block-heading\">Sources<\/h1>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Mildenhall, B., Srinivasan, P.\u202fP., Tancik, M., Barron, J.\u202fT., Ramamoorthi, R., &amp; Ng, R. (2020). NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. <a href=\"https:\/\/doi.org\/10.48550\/arXiv.2003.08934\">arXiv:2003.08934<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/youtu.be\/CRlN-cYFxTk?si=jIi_XQFhQKaYER1B\">NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis (ML Research Paper Explained)<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/youtu.be\/s3PHOPv88P4?si=nDiTS8jk9Xbw22rW\">A Brief Introduction to Neural Radiance Fields | CESCG Academy 2023<\/a><\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n","protected":false},"excerpt":{"rendered":"<p>TL;DR NeRFs represent a 3D scene as a fully-connected deep network Map any 3D location [math](x, y, z)[\/math] and viewing direction [math](\\theta, \\phi)[\/math] to a volume density value [math]\\sigma[\/math] and a colour (emitted radiance) [math]c=(r, g, b)[\/math] at that location that can then be used to render a novel view with classical volume rendering techniques 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"ub_ctt_via":"","site-container-style":"default","site-container-layout":"default","site-sidebar-layout":"default","disable-article-header":"default","disable-site-header":"default","disable-site-footer":"default","disable-content-area-spacing":"default","footnotes":""},"class_list":["post-1299","page","type-page","status-publish","hentry"],"featured_image_src":null,"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/cammonte.com\/index.php\/wp-json\/wp\/v2\/pages\/1299","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cammonte.com\/index.php\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/cammonte.com\/index.php\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/cammonte.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/cammonte.com\/index.php\/wp-json\/wp\/v2\/comments?post=1299"}],"version-history":[{"count":233,"href":"https:\/\/cammonte.com\/index.php\/wp-json\/wp\/v2\/pages\/1299\/revisions"}],"predecessor-version":[{"id":1795,"href":"https:\/\/cammonte.com\/index.php\/wp-json\/wp\/v2\/pages\/1299\/revisions\/1795"}],"wp:attachment":[{"href":"https:\/\/cammonte.com\/index.php\/wp-json\/wp\/v2\/media?parent=1299"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}