SfSNet: Learning Shape, Reflectance and Illuminance of Faces ‘in the wild’

Soumyadip Sengupta
Angjoo Kanazawa
Carlos D. Castillo
David W. Jacobs

University of Maryland, College Park
University of California, Berkeley

We present SfSNet, which learns from a combination of labeled synthetic and unlabeled real data to produce an accurate decomposition of an image into surface normals, albedo and lighting. Relit images are shown to highlight the accuracy of the decomposition.

We present SfSNet, an end-to-end learning framework for producing an accurate decomposition of an unconstrained image of a human face into shape, reflectance and illuminance. Our network is designed to reflect a physical Lambertian rendering model. SfSNet learns from a mixture of labeled synthetic and unlabeled real-world images. This allows the network to capture low-frequency variations from synthetic images and high-frequency details from real images through a photometric reconstruction loss. SfSNet consists of a new decomposition architecture with residual blocks that learns a complete separation of albedo and normal. The separated albedo and normal features are then used along with the original image to predict lighting. SfSNet produces significantly better quantitative and qualitative results than state-of-the-art methods for inverse rendering and independent normal and illumination estimation.
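As a rough illustration of the photometric reconstruction idea described above, the sketch below re-renders an image as albedo times shading (the Lambertian assumption) and compares it with the input. The function names `reconstruct` and `photometric_loss` are hypothetical, and the use of a mean absolute (L1) error is an assumption made for this sketch; see the paper for the loss actually used.

```python
import numpy as np

def reconstruct(albedo, shading):
    """Lambertian image formation: per-pixel product of albedo and shading.

    albedo, shading: float arrays of the same shape, e.g. (H, W, 3).
    """
    return albedo * shading

def photometric_loss(image, albedo, shading):
    """Mean absolute error between the input image and its re-rendering.

    The L1 form is an assumption for this sketch, not a statement of the
    authors' exact loss.
    """
    return float(np.abs(image - reconstruct(albedo, shading)).mean())

# Toy example: a constant image that the decomposition explains exactly.
albedo = np.full((4, 4, 3), 0.5)
shading = np.full((4, 4, 3), 2.0)
image = np.ones((4, 4, 3))
loss = photometric_loss(image, albedo, shading)
```

Because this loss needs only the input image itself, it can be evaluated on unlabeled real photographs, which is what lets real data contribute to training.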


Soumyadip Sengupta, Angjoo Kanazawa, Carlos D. Castillo, David W. Jacobs.

SfSNet: Learning Shape, Reflectance and Illuminance of Faces ‘in the wild’

In CVPR 2018 (Spotlight).



Network Architecture. SfSNet consists of a novel decomposition architecture that uses residual blocks to produce normal and albedo features. Inspired by a physical rendering model, these features are then used together with image features to estimate lighting. The function f combines the normals and lighting to produce shading.
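One common way to realize a shading function like f is to represent lighting with second-order spherical harmonics. The sketch below assumes that representation: 9 coefficients per color channel, an unnormalized basis, and a particular coefficient ordering. None of these details are stated on this page, and the names `sh_basis` and `shading_from_normals` are hypothetical.

```python
import numpy as np

def sh_basis(normals):
    """Unnormalized second-order spherical harmonics basis.

    normals: (H, W, 3) array of unit surface normals (x, y, z).
    Returns an (H, W, 9) array. The ordering and the omission of the usual
    normalization constants are assumptions of this sketch; the constants
    can be absorbed into the lighting coefficients.
    """
    x, y, z = normals[..., 0], normals[..., 1], normals[..., 2]
    return np.stack(
        [np.ones_like(x), x, y, z,
         x * y, x * z, y * z,
         x ** 2 - y ** 2, 3.0 * z ** 2 - 1.0],
        axis=-1,
    )

def shading_from_normals(normals, light):
    """Combine normals and lighting coefficients into shading.

    light: (9,) for monochrome lighting or (3, 9) for RGB lighting.
    Returns (H, W) or (H, W, 3) respectively.
    """
    return sh_basis(normals) @ np.asarray(light, dtype=float).T

# Toy example: a flat patch facing the camera under ambient-only light.
normals = np.zeros((2, 2, 3))
normals[..., 2] = 1.0
ambient = np.zeros(9)
ambient[0] = 0.5
shade = shading_from_normals(normals, ambient)
```

Multiplying this shading by the predicted albedo gives the reconstructed image under the Lambertian model.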



Comparison with Neural Face

SfSNet vs Neural Face on the data showcased by the authors of Neural Face. Note that the normals shown by SfSNet and Neural Face have reversed color codes due to different choices of coordinate system.

Comparison with MoFA

SfSNet vs MoFA on the data provided by the authors of MoFA.

Comparison with Pix2Vertex

Normals produced by SfSNet are significantly better than those produced by Pix2Vertex, especially under non-ambient illumination and for faces with expressions. ‘Relit’ images are generated using directional lighting and uniform albedo to highlight the quality of the reconstructed normals. Note that (a), (b) and (c) are the images showcased by the Pix2Vertex authors.
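A relit image of the kind described above can be approximated with a few lines of NumPy. This is a minimal sketch assuming a single distant light and clamped Lambertian (cosine) shading; the function name `relight`, the default albedo value, and the clamping are illustrative choices, not details taken from the paper.

```python
import numpy as np

def relight(normals, light_dir, albedo=0.8):
    """Render normals under one directional light with a uniform albedo.

    normals:   (H, W, 3) array of unit surface normals.
    light_dir: 3-vector pointing towards the light (normalized here).
    albedo:    scalar reflectance applied to every pixel (assumed value).
    Returns an (H, W) grayscale image.
    """
    d = np.asarray(light_dir, dtype=float)
    d = d / np.linalg.norm(d)
    # Lambert's cosine law, clamped so surfaces facing away stay black.
    shading = np.maximum(normals @ d, 0.0)
    return albedo * shading

# Toy example: a flat patch facing the camera, lit from the front.
normals = np.zeros((2, 2, 3))
normals[..., 2] = 1.0
front_lit = relight(normals, [0.0, 0.0, 2.0])
back_lit = relight(normals, [0.0, 0.0, -1.0])
```

Using a uniform albedo removes texture from the rendering, so any artifacts in the relit image come from the normals alone.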

For more qualitative and quantitative comparisons, please see our paper.


We thank Hao Zhou and Rajeev Ranjan for helpful discussions, Ayush Tewari for providing visual results of MoFA, and Zhixin Shu for providing test images of Neural Face. This research is supported by the National Science Foundation under grant no. IIS-1526234. This webpage template is taken from humans working on 3D who borrowed it from some colorful folks.