Computer graphics

Creating stunning real-time 3D scenes: the breakthrough of 3D Gaussian Splatting

Date:
Changed on 22/10/2024
Combining cutting-edge computer graphics methods with machine learning techniques, the 3D Gaussian splatting method developed by the GraphDeco team was presented at SIGGRAPH 2023, the leading conference in the field, where it won a Best Paper Award. It enables real-time rendering of photorealistic scenes trained on small sets of images. This is a major advance on a state of the art hitherto dominated by Google's Neural Radiance Fields (NeRF) and NVIDIA's Instant NGP, with an innovative method that is both faster and more accurate. The code, released online for non-commercial use, quickly drew an enthusiastic response from graphics and gaming startups as well as larger companies, paving the way for licensed commercial collaborations.

From computer vision to computer graphics

The origins of this field of research lie in the computer vision work of Olivier Faugeras, who laid the foundations of the mathematical theory for 3D reconstruction from images over 25 years ago, and in the start-up Realviz. A new stage was reached 15 years ago with automation and automatic camera placement algorithms for 3D rendering, and with Jean Ponce's research into creating approximate meshes by densifying points. This work continued with better solutions for generating textured meshes, including work by Jean-Daniel Boissonnat and the Acute3D start-up. In 2019 came the NeRF (Neural Radiance Field) revolution, which for the first time used machine learning tools to represent a scene. Before Gaussian splatting, the state of the art in quality was Mip-NeRF 360 (Google Research), while the fastest method was NVIDIA's Instant NGP (2022), whose rendering speed remained limited to 15 frames per second.

The GraphDeco team

The GRAPHDECO research group (and its predecessor REVES) has produced a sequence of research results over the last 15 years, building on the original computer vision work. Notably, it introduced image-based rendering methods that allow the synthesis of novel 3D views by combining information from the input photographs with an approximate 3D geometry reconstruction.

In the context of George Drettakis' ERC Advanced Grant FUNGRAPH, machine learning methodologies were used to develop several significant new solutions to novel view synthesis. A key idea developed in the Ph.D. thesis of Georgios Kopanas was to introduce differentiable point-based rendering to represent radiance fields, with better quality and rendering speed than NeRFs. This work resulted in two publications, at the Eurographics Symposium on Rendering (2021) and at ACM SIGGRAPH Asia (2022), laying the foundations for the Gaussian splatting solution.

In 2023, the new 3D Gaussian splatting method, presented by Georgios Kopanas, Bernhard Kerbl and George Drettakis of the GraphDeco research team in collaboration with Thomas Leimkühler of the Max-Planck Institute, matches the visual quality of Google's method after 30 minutes of training, and that of NVIDIA's after 7 minutes. Contrary to previous fast methods, image quality continues to improve as training continues, and rendering runs at over 100 frames per second. This generated a great deal of interest in the graphics community, and companies in the audiovisual and video game fields are lining up to test the technology and develop their own solutions based on it.

Image: the three SIGGRAPH 2023 laureates from the GraphDeco team, Sophia.

In addition to the main funding from the ERC FUNGRAPH grant, this research was also supported by Adobe, by Université Côte d'Azur through the OPAL infrastructure, and by GENCI-IDRIS HPC computing resources.

The 3D Gaussian splatting method

The 3D Gaussian splatting method offers a new way of representing radiance fields that delivers not only cutting-edge image quality, but also real-time rendering at over 100 frames per second, rapid optimization with a reasonable memory footprint, and easy integration into graphics engines. It enables real-time rendering of photorealistic scenes learned from a small set of images.

Radiance field methods have recently revolutionized novel-view synthesis of scenes captured using multiple photos or videos. However, achieving high visual quality still requires neural networks that are expensive to train and render, while recent faster methods inevitably sacrifice quality for speed. For complete, unbounded scenes (rather than isolated objects) rendered at 1080p resolution, no previous method could achieve real-time display rates.

Three key elements make it possible to achieve state-of-the-art visual quality while maintaining competitive training times and, above all, enabling high-quality real-time synthesis (over 100 frames per second) of new views at 1080p resolution.

As Georgios Kopanas explains, "Firstly, from sparse points produced during camera calibration, we represent the scene with 3D Gaussians that preserve the desirable properties of continuous volumetric radiance fields for scene optimization while avoiding unnecessary empty-space calculations. Secondly, we perform interleaved optimization and density control of the 3D Gaussians, in particular by optimizing the anisotropic covariance (with one side very thin while the other is very large) to obtain an accurate representation of the scene. Thirdly, we develop a fast, visibility-aware rendering algorithm that supports anisotropy, speeding up learning and enabling real-time rendering. The visual quality of our method is equal to the state of the art and enables real-time rendering on multiple datasets usually used by competing methods."
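The anisotropic covariance mentioned in the quote can be illustrated with a short sketch. The published 3D Gaussian splatting paper builds each covariance from a per-axis scale and a rotation, Sigma = R S S^T R^T, which keeps the matrix valid while it is optimized. The function names and the plain-Python implementation below are assumptions made for illustration; this is not the team's code.

```python
import math

def quat_to_rotation(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n  # normalize first
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def covariance(scale, quat):
    """Return Sigma = R S S^T R^T, with S = diag(scale)."""
    R = quat_to_rotation(quat)
    # M = R S: column j of R is multiplied by scale[j].
    M = [[R[i][j] * scale[j] for j in range(3)] for i in range(3)]
    # Sigma = M M^T is symmetric and positive semi-definite by construction.
    return [[sum(M[i][k] * M[j][k] for k in range(3)) for j in range(3)]
            for i in range(3)]

# Illustrative values: a flat disc (thin along its local z axis),
# rotated 90 degrees about the x axis so the thin direction lies along y.
half_angle = math.pi / 4
sigma = covariance((1.0, 1.0, 0.01), (math.cos(half_angle), math.sin(half_angle), 0.0, 0.0))
for row in sigma:
    print(row)
```

Storing a scale and a rotation rather than the six free entries of the matrix means any values the optimizer produces still describe a valid ellipsoid.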

The secrets of the method

  • Capture details more efficiently while retaining fast processing on the GPU
  • Densify the set of points with a new algorithm
  • Use specialized parallel rendering, accelerated on the GPU
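As a rough illustration of the visibility-aware rendering step, the sketch below blends the splats that cover a single pixel from front to back after sorting them by depth. It is a simplified, single-pixel version written in plain Python; the actual renderer runs in parallel on the GPU, and the function name, the tuple layout and the early-stop threshold are all assumptions made for this example.

```python
def composite_pixel(splats, stop_threshold=1e-4):
    """Blend the splats covering one pixel, front to back.

    Each splat is (depth, alpha, (r, g, b)); alpha is assumed to already
    include the Gaussian falloff evaluated at this pixel.
    """
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light still passing through
    for _depth, alpha, rgb in sorted(splats, key=lambda s: s[0]):
        weight = alpha * transmittance
        for c in range(3):
            color[c] += weight * rgb[c]
        transmittance *= 1.0 - alpha
        if transmittance < stop_threshold:
            break  # pixel is effectively opaque; farther splats are hidden
    return color, transmittance

# Two half-transparent splats: a red one in front of a blue one.
pixel, remaining = composite_pixel([
    (2.0, 0.5, (0.0, 0.0, 1.0)),  # blue, farther away
    (1.0, 0.5, (1.0, 0.0, 0.0)),  # red, closer
])
print(pixel, remaining)
```

Sorting by depth is what makes the blend respect visibility: the nearer red splat contributes more than the blue one behind it, even though both have the same opacity.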

In concrete terms, several hundred thousand or even millions of points are required for a rendering of this quality.
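To give a feel for what such point counts mean in memory, here is a back-of-the-envelope estimate. The per-point layout is an assumption not stated in this article: 3 floats for position, 3 for scale, 4 for a rotation quaternion, 1 for opacity and 48 for view-dependent colour coefficients, all stored as 32-bit floats. Actual figures depend on the implementation and on any compression applied.

```python
# Assumed per-Gaussian layout (hypothetical, for illustration only):
# position (3) + scale (3) + rotation quaternion (4) + opacity (1)
# + colour coefficients (48)
FLOATS_PER_GAUSSIAN = 3 + 3 + 4 + 1 + 48
BYTES_PER_FLOAT = 4  # 32-bit floats

def model_size_mb(num_gaussians):
    """Approximate uncompressed size in megabytes (10**6 bytes)."""
    return num_gaussians * FLOATS_PER_GAUSSIAN * BYTES_PER_FLOAT / 1e6

for count in (300_000, 1_000_000, 5_000_000):
    print(f"{count:>9} Gaussians: about {model_size_mb(count):.0f} MB")
```

Under these assumptions, a scene with a few million points occupies on the order of a gigabyte before any compression, which fits in the memory of a modern GPU.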

"Technically, our method is not machine learning in the strict sense, but it uses machine learning techniques to train and improve rendering quality."

George Drettakis, GraphDeco team manager

A cutting-edge tool with high potential

Using 200 photographs from a regular camera as input, and rendering at 105 frames per second - the quality of video game images - the result gives the illusion of walking through the video. What's more, when zoomed in, finer details such as the spokes of a bicycle wheel are clearly visible, with excellent rendering.

The high-quality source code has been made available for Linux and Windows via GitHub, along with executables and detailed instructions. These can be used and tested by anyone, even without much prior technical knowledge.

Potential fields of application are extremely varied, ranging from e-commerce, video games or special effects films (filming a location and then projecting this environment onto the floor and wall to create an ultra-realistic setting) to public works (remotely inspecting various constructions such as bridges or viaducts to check their safety in inaccessible places) or preparing for the dismantling of dangerous sites, or the reconstruction of sites destroyed by fire, all with the real-time projection of hyper-realistic 3D images.

User stories

Volinga is a Spanish company founded in 2023 whose mission is to provide the best set of 3D volumetric capture tools for professionals working in production, broadcast, live events and immersive film and TV experiences.

Business sector: Media & Entertainment

Use of 3D Gaussian Splatting technology: 3D Gaussian Splatting enables Volinga to offer M&E professionals cost-effective creation of photorealistic 3D environments in less than an hour, for preview (pre-production), ICVFX (production) or traditional VFX (post-production).

AniML is a startup created in 2022 by two serial 3D entrepreneurs, Pierre Pontevia and Rémi Rousseau, and based in France and Canada.

Business sector: E-commerce

Use of 3D Gaussian Splatting technology: AniML develops the Doly application for scanning products and presenting them in context in videos. To do this, it uses 3D Gaussian Splatting technology, which enables photorealistic rendering of objects captured by users.

IR-Entertainment Ltd specializes in digitizing humans. Their focus is on providing realistic digital models for use in various entertainment and research sectors.

Business sector: Digital entertainment, gaming and research

Use of 3D Gaussian Splatting technology: Advanced techniques such as 3D Gaussian Splatting have significantly enhanced IR-Entertainment Ltd’s capacity to process and render human scans in much higher detail, as fine as individual hair strands. These improvements are ideal for use in industries like gaming, film, and research.