Dr. Aayush Bansal is the Principal Scientist and Head of Research at SpreeAI (https://www.spreeai.com/), an up-and-coming startup working on virtual try-on. He earned a Ph.D. in Robotics from Carnegie Mellon University under the supervision of Prof. Deva Ramanan and Prof. Yaser Sheikh. He was a Presidential Fellow at CMU and a recipient of the Uber Presidential Fellowship (2016-17), the Qualcomm Fellowship (2017-18), and the Snap Fellowship (2019-20). His research has been covered by national and international media outlets such as NBC, CBS, WQED, 90.5 WESA FM, France TV, and Journalist. He has previously worked with Reality Labs Research at Meta Platforms Inc. and with Adobe Research. He has also worked with production houses such as BBC Studios and Full Frontal with Samantha Bee (TBS). He serves on the senior program committees of prestigious academic conferences and journals such as IEEE/CVF CVPR, ICCV, NeurIPS, and SIGGRAPH. His work has been recognized with multiple awards and citations. More details are available on his webpage: https://www.aayushbansal.xyz/.
SpreeAI (https://www.spreeai.com/)
March 27, 2024
Deepfakes: The good and ugly side of Artificial Intelligence
In this talk, I will give an overview of deepfakes, a story that stems from the "success" of artificial intelligence and machine learning. I will primarily discuss two methods: one for video deepfakes and the other for audio deepfakes. First, I will introduce Recycle-GAN, which combines spatial and temporal information via adversarial losses for unsupervised video retargeting. This representation allows us to translate content from one domain to another while preserving the style native to the target domain. For example, if our goal is to transfer the content of John Oliver's speech to Stephen Colbert, the generated speech should be in Stephen Colbert's own style. Second, I will introduce Exemplar Autoencoders, which project out-of-sample data onto the distribution of the training set. We use Exemplar Autoencoders to learn the voice and stylistic prosody (emotions and ambiance) of a specific target exemplar speech. This work enables us to synthesize a natural voice for speech-impaired individuals and to perform zero-shot multilingual translation. However, the same technology can also be misused for political propaganda and bullying. I will conclude with a discussion of the negative aspects of this technology and the measures we should take to counter the ugly side of deepfakes.