Facial action unit detection with convolutional neural networks
Abstract
We propose a novel deep convolutional neural network architecture for the problem of action unit detection. We leverage recent gains in large-scale object recognition by formulating the task of predicting the presence of a specific action unit in a still image as a simple image-level binary classification problem. We first train a convolutional encoder on the problem of multi-view emotion recognition to obtain a high-level representation of facial expressions. We show that our architecture generalizes across views, ethnicity, gender and age by merging and training jointly on three standard emotion recognition datasets: CK+, Bosphorus and RafD. Our system is the first fully multi-view emotion recognizer proposed in the literature. We then extend this shared learned representation with fully-connected layers trained to detect individual action units. Our approach is conceptually simpler and yet significantly more accurate than the best methods based on the dominant paradigm for this problem, which relies on facial landmark detection as an intermediate task. We conduct experiments on the BP4D dataset, the largest and most challenging benchmark currently available for action unit detection, and report an absolute improvement of 16% over the previous state-of-the-art.
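The two-stage design the abstract describes (a shared convolutional encoder feeding one fully-connected binary head per action unit) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the feature dimension, the number of action units, and the stand-in `encoder` function are all assumptions chosen for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

FEAT_DIM = 512   # assumed size of the shared encoder's feature vector
NUM_AUS = 12     # illustrative; common BP4D protocols evaluate 12 AUs

def encoder(image):
    """Stand-in for the shared convolutional encoder trained on
    multi-view emotion recognition (returns random features here)."""
    return rng.standard_normal(FEAT_DIM)

# One independent fully-connected binary head (weights, bias) per AU.
heads = [(rng.standard_normal(FEAT_DIM) * 0.01, 0.0) for _ in range(NUM_AUS)]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def detect_aus(image, threshold=0.5):
    """Image-level binary prediction for each action unit:
    shared features -> per-AU linear head -> sigmoid -> threshold."""
    features = encoder(image)
    probs = np.array([sigmoid(w @ features + b) for w, b in heads])
    return probs > threshold

preds = detect_aus(image=None)  # one binary decision per action unit
```

The key design choice mirrored here is that all action-unit heads share a single learned representation, so adding a new AU detector only requires training a small fully-connected head rather than a full network.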