Computer vision is a branch of computer science related to several disciplines including mathematics, cognitive science, computer graphics and machine learning. Its goal is to interpret images and video in a way similar to the human visual system and develop algorithms to this end.
The ERC grant, worth 1.5 million euros, will help Ivan and his colleagues develop their research and, in particular, go beyond simple object recognition to a more useful understanding of scenes. For instance, in a street scene, the algorithm would not only identify cars and people as distinct objects, but also predict what these objects might do next. This would be done by analysing interactions between the objects and developing statistical models that describe these interactions.
Constructing statistical models
Ideally, irrelevant information, or “noise”, would be filtered out of a scene so that it does not interfere with interpretation. “We would learn what is relevant by observing many people interacting with the same object in videos of events that go on for a relatively long period of time, such as house parties or a house being cleaned, recorded with a static camera,” explains Laptev. Such data would be used to construct statistical models that describe how people typically interact with particular types of objects or scenes. One example could be a kitchen model that would help identify a person sitting on a stove as something unusual and potentially dangerous.
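To make the idea of such a statistical model concrete, here is a minimal sketch in Python. It is not the method used by Laptev's group, which the article does not detail. It assumes that an upstream video analysis step has already produced (action, object) labels, and it simply estimates how often each action is seen with each object. The observation counts, the class name `InteractionModel` and the 0.05 threshold are all hypothetical, chosen only for illustration.

```python
from collections import Counter


class InteractionModel:
    """Toy frequency model of which actions people perform with which objects."""

    def __init__(self, observations, smoothing=1.0):
        # observations: list of (action, object) pairs, assumed to come from
        # some upstream video analysis step (hypothetical here).
        self.pair_counts = Counter(observations)
        self.object_counts = Counter(obj for _, obj in observations)
        self.actions = {action for action, _ in observations}
        self.smoothing = smoothing

    def probability(self, action, obj):
        # Estimate P(action | object) with additive smoothing, so that
        # never-observed pairs get a small but non-zero probability.
        vocabulary = len(self.actions | {action})
        numerator = self.pair_counts[(action, obj)] + self.smoothing
        denominator = self.object_counts[obj] + self.smoothing * vocabulary
        return numerator / denominator

    def is_unusual(self, action, obj, threshold=0.05):
        # Flag interactions that are rare for this object type.
        return self.probability(action, obj) < threshold


# Hypothetical observations, standing in for many hours of home video.
observations = (
    [("sit", "chair")] * 40
    + [("cook", "stove")] * 30
    + [("clean", "stove")] * 10
    + [("sit", "sofa")] * 20
)
model = InteractionModel(observations)

print(model.is_unusual("cook", "stove"))  # a common interaction
print(model.is_unusual("sit", "stove"))   # never observed in the data
```

With these made-up counts, cooking at a stove is treated as typical, while sitting on a stove falls below the threshold and is flagged. A real system would need far richer models, but the principle of learning what is typical from many observations is the same.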
“Simply labelling images is not enough,” he says.
“We would like computers to be able to interpret complex scenes in videos, like when people open doors, sit down, shake hands and all manner of other activities, in order to recognise their intentions and alert them if they were about to do something potentially dangerous, as in the model above. We would also like to be able to suggest useful actions they could take in a given scene.”
Analysing real films and videos
Until recently, such analyses were performed under constrained conditions, with students in Ivan’s group acting out well-defined roles. The research has now reached the stage where a computer can successfully analyse real film and video footage.
So what are the potential applications of this research? “INA in France and the BBC in the UK are very interested in our work because it might help them index the vast amount of videos these organisations have in their archives,” said Ivan. “The same could be done for videos on YouTube, where content is increasing every day.”
Healthcare could also benefit – for example, monitoring elderly people to prevent accidents by predicting hazardous situations. Intelligent cameras in the home could also make our lives easier, by, for instance, recording where we left our keys last night so we would not have to waste time looking for them in the morning.
About Ivan Laptev
Ivan is currently at INRIA Paris-Rocquencourt, a unit of the French National Institute for Research in Computer Science and Control. He works with the WILLOW research group, which is led by Jean Ponce and associated with the Informatics Department of Ecole Normale Supérieure. He did his PhD at the Royal Institute of Technology in Stockholm, Sweden, in the Computer Vision and Active Perception Laboratory (CVAP).