TimeLens-XL: Real-time Event-based Video Frame Interpolation with Large Motion

ECCV 2024

¹The Chinese University of Hong Kong and ²Shanghai AI Laboratory
We present TimeLens-XL Net (TXLNet), a real-time video frame interpolation method for large motion that uses a hybrid neuromorphic camera. Taking two consecutive frames and the events as input, our method estimates the complete dense optical flow fields within the period (shown in the second row, which draws motion trajectories from time 0 to 1). We can then flexibly sample the optical flow at any time step between the input frames for video frame interpolation. As shown, the proposed method recovers the motion trajectories of large, complex motion (nonlinear, non-rigid, and severely occluded), and thus outperforms state-of-the-art prior work at only 1/5 of the computational cost.

Abstract

Video Frame Interpolation (VFI) aims to predict intermediate frames between consecutive low-frame-rate inputs. To handle complex real-world motion between frames, event cameras, which capture high-frequency brightness changes at microsecond temporal resolution, are used to aid interpolation; this setting is denoted Event-VFI. One critical step of Event-VFI is optical flow estimation. Prior methods, which adopt either a two-segment formulation or a parametric trajectory model, suffer from accumulated error in flow estimation and therefore cannot correctly recover large, complex motions between frames. To solve this problem, we propose TimeLens-XL, a physically grounded lightweight network that decomposes the large motion between two frames into a sequence of small motions for better accuracy. It estimates the entire motion trajectory recursively and samples the bi-directional flow for VFI. Because the flow prediction is accurate and robust, intermediate frames can be synthesized efficiently with simple warping and blending. As a result, the network is extremely lightweight, with only 1/5 to 1/10 of the computational cost and model size of prior work, while also achieving state-of-the-art performance on several challenging benchmarks. To our knowledge, TimeLens-XL is the first real-time (27 fps) Event-VFI algorithm at a resolution of 1280x720 on a single RTX 3090 GPU. Furthermore, we have collected a new RGB+Event dataset (HQ-EVFI) consisting of more than 100 challenging scenes with large complex motions and accurately synchronized high-quality RGB-EVS streams. HQ-EVFI addresses several limitations of prior datasets and can serve as a new benchmark. Both the code and the dataset will be released upon publication.
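
The idea of decomposing a large motion into a sequence of small motions and accumulating them recursively can be illustrated with a minimal NumPy sketch. This is not the TimeLens-XL network: the function names `sample_nearest` and `accumulate_flows`, the `(H, W, 2)` flow layout with `(dx, dy)` channels, and the nearest-neighbour lookup are all assumptions made for illustration. The sketch only shows how per-step flows could be chained so that the flow from time 0 to any intermediate step can be read off the resulting trajectory.

```python
import numpy as np


def sample_nearest(field, coords_x, coords_y):
    """Look up `field` (H, W, 2) at float pixel coordinates, nearest neighbour.

    Coordinates falling outside the image are clamped to the border.
    """
    h, w = field.shape[:2]
    xi = np.clip(np.rint(coords_x).astype(int), 0, w - 1)
    yi = np.clip(np.rint(coords_y).astype(int), 0, h - 1)
    return field[yi, xi]


def accumulate_flows(small_flows):
    """Chain small per-step flows into cumulative flows from time 0.

    small_flows: list of (H, W, 2) arrays; element k is the (hypothetical)
    flow from step k to step k + 1, stored as (dx, dy) per pixel.
    Returns a list whose element k is the flow from step 0 to step k.
    """
    h, w, _ = small_flows[0].shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    total = np.zeros((h, w, 2))
    trajectory = [total.copy()]  # flow 0 -> 0 is zero everywhere
    for flow in small_flows:
        # Each pixel has already moved by `total`; read the next small
        # motion at its current position and add it on.
        step = sample_nearest(flow, xs + total[..., 0], ys + total[..., 1])
        total = total + step
        trajectory.append(total.copy())
    return trajectory
```

For example, chaining three identical flows that each shift every pixel one pixel to the right gives a cumulative displacement of three pixels at the last step, and the intermediate entries of the returned list give the displacement at each earlier step.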

Video
