Illuminate is an experimental project from Google that generates accessible, podcast-style interviews from academic papers:
> Illuminate is an experimental technology that uses AI to adapt content to your learning preferences. Illuminate generates audio with two AI-generated voices in conversation, discussing the key points of select papers. Illuminate is currently optimized for published computer science academic papers.
The service has a waitlist, but you can try out some generated conversations (and I recommend that you do!). The enthusiasm, intonation, and ums & ahs are convincing and feel authentic to the genre the project mimics. (See also the PDF to Podcast project, which does similar things but with flatter voice results.)
But it’s not the seeming authenticity that feels important here. Machine-generated voices—even at this level of fidelity—are nothing new. What’s more interesting is how this project demonstrates what large language models (and now large multimodal models) are truly great at: they are prodigious translators and transformers of symbols, whether those symbols are for language, visuals, or broad concepts. These models can shift those symbols nimbly among formats: from English to Chinese to structured data to speech to UI components to audio to image. These are systems that can understand a concept they are given and then work their alchemy to present that concept in a new medium or language or format.
There are exciting opportunities here for unlocking content that is trapped in unfriendly formats (where the definition of “unfriendly” might be unique to the individual). This application leans into what generative AI is good at (understanding, transforming) around tightly scoped content—and avoids what these models are uneven at: answering questions or building content from scratch. How might this kind of transformation support education efforts, particularly around accessibility and inclusivity?