Latest News

February 27, 2018: Antonio Torralba – Learning from Sounds and Images

Date: February 27, 2018
Time: 11am
Place: Senate Chambers, Ross N940
Focus Session: LAS 3033, 12:30 – 2:30

Graduate students and postdocs who wish to attend the focus session should send the IC@L Admin, Ms Cimoan Atkins, an email with their name, supervisor, and any dietary concerns (lunch will be provided).

Title: Learning from Sounds and Images

Antonio Torralba, Professor of Electrical Engineering and Computer Science, Massachusetts Institute of Technology (MIT)

Computer vision is going through a revolution. One of the key reasons for the recent successes in computer vision is the access to massive annotated datasets that have become available in the last few years. Unfortunately, creating these datasets is expensive and labor intensive. On the other hand, humans do not require massive annotated datasets in order to learn to perceive the world. In fact, babies learn with very little supervision, and, even when supervision is present, it comes in the form of an unknown spoken language that also needs to be learned. How can kids make sense of the world? In this work, I will show that an agent that has access to multimodal data (like vision and audition) can use the correlation between images and sounds to discover objects in the world without supervision. I will show that ambient sounds can be used as a supervisory signal for learning to see and vice versa (the sound of crashing waves, the roar of fast-moving cars – sound conveys important information about the objects in our surroundings). I will also show how we can use raw speech descriptions of images to jointly learn to segment words in speech and objects in images without any additional supervision.

Speaker Bio:

Antonio Torralba is a Professor of Electrical Engineering and Computer Science at the Massachusetts Institute of Technology (MIT) and the MIT director of the MIT-IBM Watson AI Lab. He received a degree in telecommunications engineering from Telecom BCN, Spain, in 1994 and a Ph.D. degree in signal, image, and speech processing from the Institut National Polytechnique de Grenoble, France, in 2000. From 2000 to 2005, he did postdoctoral training at the Brain and Cognitive Science Department and the Computer Science and Artificial Intelligence Laboratory, MIT, where he is now a professor. Prof. Torralba is an Associate Editor of the International Journal of Computer Vision and served as program chair for the Computer Vision and Pattern Recognition conference in 2015. He received the 2008 National Science Foundation (NSF) CAREER award, the best student paper award at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) in 2009, and the 2010 J. K. Aggarwal Prize from the International Association for Pattern Recognition (IAPR). In 2017, he received the Frank Quick Faculty Research Innovation Fellowship and the Louis D. Smullin (’39) Award for Teaching Excellence.