Accession Number : ADA612444


Title :   Inferring the Why in Images


Corporate Author : MASSACHUSETTS INST OF TECH CAMBRIDGE


Personal Author(s) : Pirsiavash, Hamed ; Vondrick, Carl ; Torralba, Antonio


Full Text : http://www.dtic.mil/get-tr-doc/pdf?AD=ADA612444


Report Date : Jan 2014


Pagination or Media Count : 11


Abstract : Humans have the remarkable capability to infer the motivations behind other people's actions, likely due to cognitive skills known in psychophysics as theory of mind. In this paper, we strive to build a computational model that predicts the motivation behind the actions of people in images. To our knowledge, this challenging problem has not yet been extensively explored in computer vision. We present a novel learning-based framework that uses high-level visual recognition to infer why people are performing an action in images. However, the information in an image alone may not be sufficient to solve this task automatically. Since humans can rely on their own experiences to infer motivation, we propose to give computer vision systems access to some of these experiences by using recently developed natural language models to mine knowledge stored in massive amounts of text. While we are still far from automatically inferring motivation, our results suggest that transferring knowledge from language into vision can help machines understand why a person might be performing an action in an image.


Descriptors :   *COMPUTER VISION , COGNITION , HUMANS , IMAGES , PSYCHOPHYSICS , VISUAL PERCEPTION


Subject Categories : Cybernetics


Distribution Statement : APPROVED FOR PUBLIC RELEASE