Neural Inverse Rendering of an Indoor Scene From a Single Image

Soumyadip Sengupta
Jinwei Gu
Kihwan Kim
Guilin Liu
David W. Jacobs
Jan Kautz

University of Maryland, College Park
University of Washington

We propose a self-supervised approach for inverse rendering. We jointly decompose an indoor scene image into albedo, surface normals, and environment map lighting (top). Our method outperforms state-of-the-art approaches (bottom) that each solve for only one of the scene attributes, i.e., albedo (Li et al.), normals (Zhang et al.), and lighting (Gardner et al.).

Inverse rendering aims to estimate physical attributes of a scene, e.g., reflectance, geometry, and lighting, from image(s). Inverse rendering has been studied primarily for single objects or with methods that solve for only one of the scene attributes. We propose the first learning-based approach that jointly estimates albedo, normals, and lighting of an indoor scene from a single image. Our key contribution is the Residual Appearance Renderer (RAR), which can be trained to synthesize complex appearance effects (e.g., inter-reflection, cast shadows, near-field illumination, and realistic shading) that would otherwise be neglected. This enables us to perform self-supervised learning on real data using a reconstruction loss based on re-synthesizing the input image from the estimated components. We pretrain on synthetic data and then fine-tune on real data. Experimental results show that our approach outperforms state-of-the-art methods that estimate one or more scene attributes.


Soumyadip Sengupta, Jinwei Gu, Kihwan Kim, Guilin Liu, David W. Jacobs, Jan Kautz.

Neural Inverse Rendering of an Indoor Scene From a Single Image

In ICCV 2019.


Overview of our approach. Our Inverse Rendering Network (IRN) predicts albedo, normals, and an illumination map. We train on unlabeled real images using a self-supervised reconstruction loss. The reconstruction loss is computed by re-synthesizing the input image with two components: a closed-form Direct Renderer with no learnable parameters, and the proposed Residual Appearance Renderer (RAR), which learns to predict complex appearance effects.
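The reconstruction step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes Lambertian shading, an environment map flattened into K distant directional lights, an additive RAR residual, and an L1 loss; the RAR output is passed in as a plain array rather than produced by a network. The names `direct_render`, `reconstruction_loss`, `light_dirs`, and `light_rgb` are hypothetical and chosen only for illustration.

```python
import numpy as np


def direct_render(albedo, normals, light_dirs, light_rgb):
    """Closed-form Lambertian rendering with no learnable parameters.

    albedo:     (H, W, 3) diffuse reflectance
    normals:    (H, W, 3) unit surface normals
    light_dirs: (K, 3) unit directions, one per environment-map sample (assumed layout)
    light_rgb:  (K, 3) RGB intensity arriving from each direction
    """
    # Cosine between every normal and every light direction, clamped at zero
    # so that lights behind the surface contribute nothing.
    cosine = np.clip(normals @ light_dirs.T, 0.0, None)  # (H, W, K)
    # Sum the contribution of all lights per colour channel.
    shading = cosine @ light_rgb  # (H, W, 3)
    return albedo * shading


def reconstruction_loss(image, albedo, normals, light_dirs, light_rgb, residual):
    """Mean absolute error between the input image and its re-synthesis.

    `residual` stands in for the RAR prediction (assumed here to be additive).
    """
    rendered = direct_render(albedo, normals, light_dirs, light_rgb) + residual
    return np.abs(image - rendered).mean()


if __name__ == "__main__":
    # Toy scene: a flat, upward-facing surface lit by one overhead light.
    albedo = np.full((2, 2, 3), 0.5)
    normals = np.zeros((2, 2, 3))
    normals[..., 2] = 1.0
    light_dirs = np.array([[0.0, 0.0, 1.0]])
    light_rgb = np.ones((1, 3))
    residual = np.full((2, 2, 3), 0.1)

    image = direct_render(albedo, normals, light_dirs, light_rgb) + residual
    print(reconstruction_loss(image, albedo, normals, light_dirs, light_rgb, residual))
```

In this toy setting the loss vanishes only when the residual accounts for what the direct renderer misses, which is the role the overview assigns to the RAR.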


Surface Normal Estimation

Albedo Estimation

Comparison with Zhang et al.

Comparison with Li et al.

Lighting Estimation

More Results

Comparison with Gardner et al.

Role of RAR in albedo estimation

For more qualitative and quantitative comparisons, please see our paper.

To download and visualize more results estimated by our algorithm, please visit the following link.
For any additional questions or clarifications, please feel free to contact me (Soumyadip) at soumya91 @


We thank Hao Zhou and Chao Liu for helpful discussions. This research is partly supported by the National Science Foundation under grant no. IIS-1526234.