ExtraNet: Real-time Extrapolated Rendering for Low-latency Temporal Supersampling

Published on: ToG (SIGGRAPH ASIA 2021)

Kind: new solution.

Problem: temporal supersampling methods that directly producing more frames (increasing frame rate) are not practically available due to latency and performance.

Why it matters: To lower the computation cost and increase frame rate, both spatial & temporal supersampling are seriously needed. However, previous techniques that leveraging the temporal information are essentially still perform spatial upsampling.

Previous limitations:

Latency: previous attempts refering to frame interpolation generally have high latency.
Performance: it’s meaningless if generating time > full rendering time of a frame. So, a lot of previous work that producing excellent quality but running very slow has been ruled out.
Without G-buffers: a lot of previous methods do not have G-buffers to use (they do not specifically apply for rendered images).

Key idea:

Instead of doing interpolation, use a neural network extrapolates new frames to minimize the latency.
Use light-weight gated convolutions to enable fast inference.
Take advantages of the G-buffers (and accurate motion vectors).

Related Work:

Temporal reconstruction:

They both leverage temporally reprojected infomation from previous frames to generated frames.
However, unlike temporal reconstruction work which still takes a small amount of samples at frame being reconstructed, the authors’ method does not have any sample in the extrapolated frame.

Texture in-painting:

They both Fill the holes.
However, general texture in-painting techniques are not specifically designed for rendering, which do not take G-buffers into use.

Image warping:

They both recover contents at hole regions.
However, these methods always leave shadow and highlight movement untouched.

Video and rendering interpolation:

They both do frame rate upsampling.
However, these methods are either time-consuming (deep learning based) or producing artifacts. More importantly, interpolation always comes with high latency.

Motivation, challenges, and methods:

Known that “compared with interpolation, extrapolation has zero additional latency”, they decide to use extrapolation to perform temporal supersampling.

To perform extrapolation in current frame, using reprojected shading from previous frame and G-buffers generated from current frame, there are several key challenges, including disocclusion and dynamic changes in shading, discussed in “challenges when performing extrapolation for temporal supersampling”.

For address these issues, they proposed ExtraNet:

use an In-painting Network to reshade the disocclusion regions.
use a History Encoder to extract history information for better prediction of changes in shadows and reflections.
use light-weight gated CNN to get high inference speed.

2022 1

2022