Goal Conditioned Reinforcement Learning for Photo Finishing Tuning

1Shanghai AI Laboratory, 2The Chinese University of Hong Kong

NeurIPS 2024
Abstract

Photo finishing tuning aims to automate the manual tuning process of a photo finishing pipeline, such as Adobe Lightroom or Darktable. Previous works either use zeroth-order optimization, which becomes slow as the number of parameters increases, or rely on a differentiable proxy of the target finishing pipeline, which is hard to train. To overcome these challenges, we propose a novel goal-conditioned reinforcement learning framework for efficiently tuning parameters using a goal image as a condition. Unlike previous approaches, our tuning framework does not rely on any proxy and treats the photo finishing pipeline as a black box. Using a trained reinforcement learning policy, it can efficiently find the desired set of parameters within just 10 queries, whereas optimization-based approaches typically take 200 queries. Furthermore, our architecture uses a goal image to guide the iterative tuning of pipeline parameters, allowing flexible conditioning on pixel-aligned target images, style images, or any other visually representable goals. We conduct detailed experiments on photo finishing tuning and photo stylization tuning tasks, demonstrating the advantages of our method.

Motivation

In this work, we propose an RL-based photo finishing tuning algorithm that efficiently tunes the parameters of a black-box photo finishing pipeline to match any tuning target. The RL-based solution (top row) takes only about 10 iterations to reach a PSNR similar to the 500-iteration output of a zeroth-order algorithm (bottom row). Our method offers fast convergence, high quality, and no need for a proxy.

Overview of our Goal Conditioned Reinforcement Learning Pipeline

We achieve efficient parameter searching using goal-conditioned reinforcement learning. At each iteration, the policy network takes the tuning target and the currently tuned image as input and predicts a better set of parameters. By formulating the problem as a Markov Decision Process and training an RL policy, we obtain a search algorithm that moves the finishing result closer to the target at each step. With a learned policy, this algorithm predicts promising search directions more accurately than zeroth-order search, and it does not rely on a differentiable proxy. A minimal sketch of this tuning loop is given below.
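
The following sketch illustrates the goal-conditioned tuning loop described above. The `policy` and `pipeline` interfaces, the additive parameter update, and the 10-step budget are illustrative assumptions for exposition, not the released implementation.

# Hypothetical sketch of the goal-conditioned tuning loop
# (assumed interfaces, not the authors' released code).
import torch

@torch.no_grad()
def tune_parameters(policy, pipeline, raw_image, goal_image, num_steps=10):
    """Iteratively refine black-box pipeline parameters toward a goal image.

    policy:   trained network mapping (current image, goal image) -> parameter update
    pipeline: black-box photo finishing pipeline, queried only through rendering
    """
    params = torch.zeros(pipeline.num_params)         # initial parameter guess
    current = pipeline.render(raw_image, params)      # one query to the black box
    for _ in range(num_steps):
        # The goal-conditioned policy observes the current result and the target,
        # and outputs an action, i.e. an update toward a better parameter set.
        action = policy(current, goal_image)
        params = params + action                      # apply the predicted update
        current = pipeline.render(raw_image, params)  # re-render with new parameters
    return params, current

Each loop iteration corresponds to one transition of the Markov Decision Process: the state is the pair of the current result and the goal image, the action is a parameter update, and the reward during training reflects how much closer the rendered result gets to the target.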

Qualitative Results

Qualitative comparison between our method and others on the photo finishing tuning task, using the MIT-Adobe FiveK dataset.



Qualitative comparison between our method and others on the photo finishing tuning task, using the HDR+ dataset.



Qualitative comparison between our method and others on the photo stylization tuning task.

Video

BibTeX

@inproceedings{wu2024goal,
  title={Goal Conditioned Reinforcement Learning for Photo Finishing Tuning},
  author={Jiarui Wu and Yujin Wang and Lingen Li and Fan Zhang and Tianfan Xue},
  booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
  year={2024},
}