Renée Miller – Big Data Curation

Date: February 9, 2017
Time: 11 AM
Place: Senate Chambers Ross N940
Focus Session: 12:30-2:30pm in LAS 3033

Graduate students and postdocs who wish to attend the focus session should send the IC@L Admin, Ms Cimoan Atkins, an email with their name, supervisor, and any dietary concerns (lunch will be provided).

Title: Big Data Curation

Renée Miller
University of Toronto

In this talk, I consider some of the challenges in scaling data curation systems, including data integration and data cleaning systems. First, I observe that although data integration and cleaning are very mature fields, rigorous empirical evaluations of systems are relatively scarce. I identify a major roadblock for empirical work: the lack of tools that help generate the inputs and gold standard outputs for integration or cleaning tasks in a controlled, effective, and repeatable manner. I give an overview of our efforts to develop such tools and highlight how they have been used to streamline the empirical evaluation of a variety of systems. Second, I consider the problem of dataset search. Web search algorithms are designed for documents, not data. To search for structured data, the state of the art is to use traditional schema and data (entity) matching algorithms, but these are either too expensive to use over big data or ineffective on schema-free web data. I present some new results that bring us closer to achieving fast, Internet-scale dataset search and discuss applications to data science.

Bio: Renée J. Miller is a Professor of Computer Science and the Bell Canada Chair of Information Systems at the University of Toronto. She is a fellow of the Royal Society of Canada and a fellow of the ACM. She received the US Presidential Early Career Award for Scientists and Engineers (PECASE), the highest honor bestowed by the United States government on outstanding scientists and engineers beginning their careers. She received an NSF CAREER Award, the Premier’s Research Excellence Award, and an IBM Faculty Award. She and her co-authors received the ICDT Test-of-Time Award for their influential 2003 paper establishing the foundations of data exchange. She has served on the Board of Trustees of the VLDB Endowment and as President of the Endowment. Her research is funded by NSERC, NSF, IBM, SAP, and Bell Canada among others. She received her PhD in Computer Science from the University of Wisconsin, Madison and Bachelor’s degrees in Mathematics and in Cognitive Science from MIT.