Introduction

Modern machine learning models are data hungry, yet collecting and annotating data is hard and expensive. Moreover, diverse training data is needed to create fair and unbiased models. With advances in simulation and generative models, synthetic data is becoming a viable alternative to training models on real data. However, many open problems remain, such as the lack of standard tools, the realism gap between synthetic and real data, and the need for machine learning algorithms that can effectively use imperfect synthetic data. The goal of this workshop is to bring together researchers to discuss potential issues, current trends, and solutions around using synthetic data to develop accurate machine learning solutions.

Topics Covered

Simulation environments

Generative models for data synthesis

Applications of synthetic data to audio/visual problems

Synthetic data augmentation

Synthetic to real domain adaptation

Program Outline (June 19, 2022)

Opening remarks by organizers [5 min] (1:30-1:35pm)
Invited talk by Erroll Wood [35 min] (1:35-2:10pm)
Invited talk by Sanja Fidler [35 min] (2:10-2:45pm)
Coffee break [15 min] (2:45-3:00pm)
Invited talk by Raquel Urtasun [35 min] (3:00-3:35pm)
Invited talk by Jonathan Laserson [35 min] (3:35-4:10pm)
Break [5 min] (4:10-4:15pm)
Panel discussion [35 min] (4:15-4:50pm)
Closing remarks [5 min] (4:50-4:55pm)
NOTE: There are no paper presentations because we are not accepting papers this year.

Speakers

Erroll Wood is a Staff Software Engineer at Google, working on Digital Humans. Previously, he was a member of Microsoft's Mixed Reality AI Lab, where he worked on hand tracking for HoloLens 2, avatars for Microsoft Mesh, synthetic data for face tracking, and Holoportation. He did his PhD at the University of Cambridge, working on gaze estimation.

Sanja Fidler is an Associate Professor at the University of Toronto and a Director of AI at NVIDIA, leading a research lab in Toronto. Prior to coming to Toronto, in 2012/2013, she was a Research Assistant Professor at Toyota Technological Institute at Chicago, an academic institute located on the campus of the University of Chicago. She did her postdoc with Prof. Sven Dickinson at the University of Toronto in 2011/2012. Sanja finished her PhD in 2010 at the University of Ljubljana in Slovenia. In 2010, she visited Prof. Trevor Darrell's group at UC Berkeley and ICSI. She received her BSc degree in Applied Math at the University of Ljubljana. https://www.cs.utoronto.ca/~fidler.

Raquel Urtasun is the Founder and CEO of Waabi. She is also a Full Professor in the Department of Computer Science at the University of Toronto and a co-founder of the Vector Institute for AI. From 2017 to 2021 she was the Chief Scientist and Head of R&D at Uber ATG. From 2015 to 2017 she was a Canada Research Chair in machine learning and computer vision. Prior to this, she was an Assistant Professor at the Toyota Technological Institute at Chicago (TTIC), an academic computer science institute affiliated with the University of Chicago. She was also a visiting professor at ETH Zurich during the spring semester of 2010. She received her Bachelor's degree from Universidad Publica de Navarra in 2000 and her Ph.D. degree from the Computer Science department at Ecole Polytechnique Federale de Lausanne (EPFL) in 2006, and did her postdoc at MIT and UC Berkeley. She is a world-leading expert in AI for self-driving cars. Her research interests include machine learning, computer vision, robotics, AI and remote sensing. She is a recipient of an NSERC EWR Steacie Award, an NVIDIA Pioneers of AI Award, a Ministry of Education and Innovation Early Researcher Award, three Google Faculty Research Awards, an Amazon Faculty Research Award, two NVIDIA Pioneer Research Awards, a Connaught New Researcher Award, a Fallona Family Research Award and two Best Paper Runner-up Prizes, awarded at CVPR in 2013 and 2017 respectively. She was also named Chatelaine's 2018 Woman of the Year and one of Toronto's top influencers of 2018 by Adweek magazine.

Jonathan Laserson is the Head of AI Research at Datagen. He earned his PhD in the Computer Science AI lab at Stanford University. After a few years at Google, he ventured into the startup world, mostly building computer vision algorithms. Prior to Datagen, he was the lead AI Strategist at Zebra Medical Vision, where he led the development of two FDA-approved clinical products, applying neural networks to millions of medical images and unstructured textual reports.

Talks

Speaker: Erroll Wood
Title: Synthetics with Digital Humans
Abstract: Nowadays, collecting the right dataset for machine learning is often more challenging than choosing the model. We address this with photorealistic synthetic training data – labelled images of humans made using computer graphics. With synthetics we can generate clean labels without annotation noise or error, produce labels otherwise impossible to annotate by hand, and easily control variation and diversity in our datasets. I will show you how synthetics underpins our work on understanding humans, including how it enables fast and accurate 3D face reconstruction in the wild.

Speaker: Sanja Fidler
Title: TBD
Abstract: TBD

Speaker: Raquel Urtasun
Title: Building the next generation autonomous driving solution with Waabi's high fidelity, closed-loop simulator -- Waabi World
Abstract: TBD

Speaker: Jonathan Laserson
Title: Make It Real: Applying StyleGAN On Top of Synthetically Generated Data
Abstract: Neural generators like StyleGAN can generate photorealistic images in many domains after learning their distribution "bottom-up" from large image datasets. While it is possible to manipulate the generated images in various ways, controlling the generated content is a hard task, as it requires reverse-engineering the latent space of the StyleGAN.
At Datagen, we create synthetic visual data – images and videos – using a 3D simulator and a graphics pipeline. We have full control over many aspects of the content we generate. When we generate a person, we can choose the desired ethnicity, expression, hair style, lighting conditions, and accessories (e.g., glasses, hats). Although our 3D artists and engineers strive to make the generated faces and scenes as diverse and realistic as possible, there is often a domain gap between the level of diversity and photorealism we can achieve using the "top-down" graphics pipeline and real-world photos.
To bridge the domain gap, we propose to first produce an initial version of the desired image using the top-down synthetic pipeline, and then invert this image into the latent space of a StyleGAN trained on real images. We show that the inversion maintains the same person identity, but adds photorealism and provides access to new modes of diversity. It enables us to generate synthetic, photorealistic image datasets that can be used to train computer vision models, such as face recognition, while retaining full control over the distribution of the data.

Panel

Erroll Wood: Staff Software Engineer at Google.
Sanja Fidler: Associate Professor at University of Toronto, and a VP of Research at NVIDIA.
Jonathan Laserson: Head of AI Research at Datagen.
Mike Roberts: Research Scientist at Intel.
Kwang Moo Yi: Assistant Professor at University of British Columbia (UBC).
Ankur Handa: Research Scientist at NVIDIA.
Javier Romero: Research Scientist at Meta.

Organizers

Oncel Tuzel is a principal researcher and research manager on the MIND team at Apple. He received his Ph.D. from the computer science department at Rutgers University in 2008. His research interests are broadly in machine learning and robotics, particularly focusing on generative models and simulations to improve the sample and computational efficiency of learning algorithms. He has co-authored over 70 peer-reviewed publications and holds over 50 US and international patents. His work has received the best paper award at CVPR 2017, the best paper runner-up award at CVPR 2007, and the 2014 R&D 100 Award, given to the 100 most innovative technologies introduced in 2013.

Ashish Shrivastava is a tech lead and applied research scientist at Cruise, where his focus is on utilizing synthetic data to train and test ML models for autonomous vehicles. Prior to Cruise, he was a research scientist at Apple, where he conducted research in machine learning focused on computer vision, speech, and robotics applications. He has done extensive research on synthetic data, generative models, domain adaptation, and unsupervised learning. His work on using synthetic and unsupervised data won the best paper award at CVPR’17. He received his PhD from the University of Maryland, College Park under the supervision of Prof. Rama Chellappa.

Russ Webb received a PhD from Cornell University in Electrical Engineering in 2000 studying micromechanics, shipped Palm OS 4.0 as technical lead for the OS, advanced neuroscience at the Redwood Institute, created a machine learning research group at the University of Canterbury, and innovated in the area of photography at Apple. Currently, he conducts and leads research in the areas of representation learning and reasoning at Apple.

Ming-Yu Liu is a Distinguished Research Scientist and Manager at NVIDIA Research. He received his Ph.D. from the Department of Electrical and Computer Engineering at the University of Maryland, College Park. He won the R&D 100 Award from R&D Magazine for his robotic bin picking work in 2014. At SIGGRAPH 2019, he won the Best in Show Award and Audience Choice Award in the Real-Time Live track for his GauGAN work. His GauGAN work also won the Best of What’s New Award from Popular Science magazine in 2019. His research interest is in generative image modeling. His goal is to give machines a human-like capability for imagination.

Arun Mallya is a Senior Research Scientist at NVIDIA Research. He obtained his Ph.D. from the University of Illinois at Urbana-Champaign in 2018, with a focus on performing multiple tasks efficiently with a single deep network. He holds a B.Tech. in Computer Science and Engineering from the Indian Institute of Technology - Kharagpur (2012), and an MS in Computer Science from the University of Illinois at Urbana-Champaign (2014). His interests are in generative modeling and enabling new applications of deep neural networks.

Aysegul Dundar is an Assistant Professor of Computer Science at Bilkent University, Turkey, and a Sr. Research Scientist at NVIDIA. She received her Ph.D. degree from Purdue University in 2016, under the supervision of Professor Eugenio Culurciello. She received a B.Sc. degree in Electrical and Electronics Engineering from Bogazici University in Turkey in 2011. Her current research focuses on domain adaptation and generative models for image synthesis and manipulation.

Sofien Bouaziz is currently Head of XR Presence at Facebook Reality Labs. Prior to this, he was a research scientist and group manager at Google, where he led and managed a large perception team working on cutting-edge AR research and development. His team shipped multiple successful technologies, e.g., depth reconstruction for the Pixel 4 depth sensor, ARCore depth API, Pixel 5 portrait relighting, and TensorFlow Graphics. Prior to Google, he was a principal research scientist at Apple, where he designed, developed, and productized the real-time face tracking algorithm powering the iPhone X Animojis and also available to third-party developers through ARKit. His research interests include machine learning, computer vision, and computer graphics. He completed his PhD degree in 2015 in the Computer Graphics and Geometry Laboratory (LGG) at the Swiss Federal Institute of Technology in Lausanne (EPFL).
