An AI-generated argument between two prominent intellectuals, filmmaker Werner Herzog and philosopher Slavoj Žižek, is both entertaining and concerning. The deepfaked conversation, generated with advanced language models and voice synthesis technology, highlights the growing threat of misinformation facing our society.
Machine learning and artificial intelligence have advanced rapidly in recent years, driving significant progress in fields such as voice synthesis and deepfake technology. Deepfakes, which are highly realistic but fake images, videos, and speech, have become both easy to produce and convincing. Combined with the sheer volume of text that language-generating AI can turn out, this threatens to overwhelm us with disinformation.
The Infinite Conversation was created using Coqui TTS, a new open-source text-to-speech library that has kickstarted many digital projects. This toolkit, along with its vibrant community and ample documentation, made it possible for the project’s creator to clone the voices of Herzog and Žižek.
The creator of the project was fascinated by Herzog’s voice and worldview and decided to start by cloning his voice. Herzog has a distinctive dry German accent and an imposing way with words, making him a prime candidate for voice cloning. With an abundance of interviews, voice-overs, and audiobooks available, the creator was able to gather hundreds of hours of speech for training the machine-learning model. The model was trained over many “epochs”, each one a complete pass through all the available training data, allowing for incremental improvements with each pass. The result was a synthetic Werner Herzog voice that improved as training progressed.
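To make the idea of an epoch concrete, here is a minimal sketch in Python. It is not the project’s training code and does not use Coqui TTS: it fits a one-parameter model to five made-up data points with stochastic gradient descent, and the function name `train`, the data, and the learning rate are all assumptions chosen for illustration. What it shares with real voice-model training is the loop structure: every epoch visits every training example once, and the error shrinks a little with each pass.

```python
import random


def train(data, epochs, lr=0.05, seed=0):
    """Fit y = w * x by stochastic gradient descent.

    Returns the learned weight and the mean squared error
    measured at the end of every epoch.
    """
    rng = random.Random(seed)
    examples = list(data)  # copy so the caller's list is not reordered
    w = 0.0
    losses = []
    for _ in range(epochs):
        rng.shuffle(examples)  # one epoch = one full pass, in random order
        for x, y in examples:
            grad = 2 * (w * x - y) * x  # derivative of (w*x - y)^2 w.r.t. w
            w -= lr * grad
        mse = sum((w * x - y) ** 2 for x, y in examples) / len(examples)
        losses.append(mse)
    return w, losses


# Hypothetical toy data generated from y = 3x (an assumption for illustration).
data = [(x, 3.0 * x) for x in (0.1, 0.5, 1.0, 1.5, 2.0)]
w, losses = train(data, epochs=20)

for epoch, loss in enumerate(losses, start=1):
    print(f"epoch {epoch:2d}  loss {loss:.3e}")
```

Printing the loss per epoch shows the same pattern the article describes: the model is never finished in a single pass, but each pass over the data leaves it slightly better than the last.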
After Herzog, the creator chose to clone the voice of Slavoj Žižek, who is famous for his polarizing views and unique vernacular. Like Herzog, Žižek has a distinctive intellectual presence and ties to film, which made him an interesting candidate for voice cloning. The process was much the same as for Herzog, with the machine-learning model being trained on hundreds of hours of Žižek’s speech.
The ease and accuracy of voice-cloning technology is striking. Researchers at Microsoft have introduced VALL-E, a speech synthesis model that can reportedly imitate any voice from just three seconds of audio. This development, along with the ease of producing deepfakes, has raised concerns that realistic-sounding fabricated statements could be exploited to ruin reputations, scam leaders, or distract the public with fake news.
The Infinite Conversation was created to draw attention to the proliferation of these technologies and their potential to mislead. The creator used a large language model and a simple algorithm to keep the conversation flowing. Language models work by predicting the next word from the sequence of words that precedes it, and given enough conversation transcripts, a language model can be fine-tuned to match a particular speaker’s style and topics. The result of the project is an AI-generated conversation between Herzog and Žižek that discusses aesthetics and philosophy.
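Next-word prediction and turn-taking can be illustrated with a toy sketch in Python. This is emphatically not the project’s model: a simple bigram counter stands in for a large neural language model, and the two “transcripts” are invented placeholder sentences, not quotations from either man. The names `build_bigram_model`, `generate`, and `converse` are hypothetical. Training on a different transcript per speaker plays the role of fine-tuning, and the alternating loop plays the role of the simple algorithm that keeps the exchange going.

```python
from collections import Counter, defaultdict


def build_bigram_model(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    model = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        model[current][following] += 1
    return model


def generate(model, start, max_words=8):
    """Extend `start` by repeatedly picking the most frequent next word."""
    out = [start]
    for _ in range(max_words - 1):
        followers = model.get(out[-1])
        if not followers:
            break  # this word was never followed by anything in training
        out.append(followers.most_common(1)[0][0])
    return " ".join(out)


def converse(models, opener, turns=4):
    """Alternate speakers; each reply is seeded by the previous reply's last word."""
    names = list(models)
    lines = []
    seed = opener
    for turn in range(turns):
        name = names[turn % len(names)]
        model = models[name]
        if seed not in model:
            seed = next(iter(model))  # fall back to the first word this speaker knows
        reply = generate(model, seed)
        lines.append((name, reply))
        seed = reply.split()[-1]
    return lines


# Invented placeholder transcripts (assumptions, not real quotations).
models = {
    "speaker_a": build_bigram_model(
        "the jungle is full of chaos and the jungle is indifferent to us"
    ),
    "speaker_b": build_bigram_model(
        "ideology is the chaos we accept and ideology is everywhere"
    ),
}

transcript = converse(models, "the", turns=4)
for name, reply in transcript:
    print(f"{name}: {reply}")
```

Even this crude stand-in produces text that echoes each speaker’s vocabulary while drifting into loops and non sequiturs, which hints at why output from a far larger model can sound in character without being meaningful.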
The model’s outputs are often obscure and nonsensical, but listeners are still fascinated by the chatbots’ discussions. AI Žižek sees Alfred Hitchcock as both a genius and a cynical manipulator, and while Herzog detests chickens, his AI imitator sometimes speaks affectionately about them. The ambiguity of the Infinite Conversation, which reflects the confusing nature of postmodern philosophy, has kept some listeners tuned in to the chatbots’ discussions for over an hour.
The website for the Infinite Conversation states that the creator’s aim is for visitors to learn about the technology and its effects without taking the chatbots’ words too seriously. The project has been a surprising success