What an Endless Conversation with Werner Herzog Can Teach Us about AI

An AI-generated conversation between Werner Herzog and Slavoj Žižek is definitely entertaining, but it also illustrates the crisis of misinformation beginning to befall us

By Giacomo Miceli

An illustration of director Werner Herzog and Slovenian philosopher Slavoj Žižek. — John Cuneo

On the website The Infinite Conversation, German filmmaker Werner Herzog and Slovenian philosopher Slavoj Žižek are having a public chat about anything and everything. Their discussion is compelling in part because these intellectuals have distinctive accents when speaking English and a tendency toward eccentric word choices. But they have something else in common: both voices are deepfakes, and the text they speak in those distinctive accents is being generated by artificial intelligence.

I built this conversation as a warning. Improvements in what’s called machine learning have made deepfakes—incredibly realistic but fake images, videos or speech—too easy to create and their quality too good. At the same time, language-generating AI can quickly and inexpensively churn out reams of text. Together these technologies can do more than stage an infinite conversation. They have the capacity to inundate us with a deluge of disinformation.

Machine learning, an AI technique that uses large quantities of data to “train” an algorithm to improve as it repeatedly performs a particular task, is going through a phase of rapid growth. This growth is pushing entire sectors of information technology to new levels, including speech synthesis, in which systems produce utterances that humans can understand. As someone who is interested in the liminal space between humans and machines, I’ve always found speech synthesis a fascinating application. So when advances in machine learning allowed voice-synthesis and voice-cloning technology to leap forward in recent years, after a long history of small, incremental improvements, I took note.

The Infinite Conversation got started when I stumbled across an exemplary speech-synthesis program called Coqui TTS. Many projects in the digital domain begin with finding a previously unknown software library or open source program. When I discovered this tool kit, accompanied by a flourishing community of users and plenty of documentation, I knew I had all the necessary ingredients to clone a famous voice.

As an appreciator of Herzog’s work, persona and worldview, I’ve always been drawn to his voice and way of speaking. I’m hardly alone, as pop culture has made Herzog into a literal cartoon: his cameos and collaborations include The Simpsons, Rick and Morty and Penguins of Madagascar. So when it came to picking someone’s voice to tinker with, there was no better option—particularly because I knew I would have to listen to that voice for hours on end.

Building a training set for cloning Herzog’s voice was the easiest part of the process. Between his interviews, voice-overs and audiobook work, there are hundreds of hours of speech that can be harvested for training a machine-learning model or, in my case, for fine-tuning an existing one. A neural network generally improves over a series of “epochs,” each of which is one complete pass through the training data. At the end of each epoch, the model can generate sample output, giving researchers material to review so they can evaluate how well training is progressing. With the synthetic voice of Herzog, hearing the model improve with each epoch felt like witnessing a metaphorical birth, with his voice gradually coming to life in the digital realm.
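
The epoch cycle described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the Coqui TTS training code: it fits a straight line instead of a voice model, and the function name `train` and its parameters are hypothetical. Each pass over the shuffled data is one epoch, and the error recorded after each pass plays the role of the sample a researcher would listen to.

```python
import random


def train(data, epochs=5, lr=0.01, seed=0):
    """Fit y = w * x + b with stochastic gradient descent (toy example).

    One epoch is one full pass over `data`. After each epoch the mean
    squared error is recorded, standing in for the sample output a
    researcher would review at that checkpoint.
    """
    rng = random.Random(seed)
    samples = list(data)
    w, b = 0.0, 0.0  # a real fine-tuning run would start from pretrained weights
    history = []
    for epoch in range(1, epochs + 1):
        rng.shuffle(samples)  # visit the examples in a new order each epoch
        for x, y in samples:
            err = (w * x + b) - y
            w -= lr * 2 * err * x  # gradient of err**2 with respect to w
            b -= lr * 2 * err      # gradient of err**2 with respect to b
        mse = sum(((w * x + b) - y) ** 2 for x, y in data) / len(data)
        history.append((epoch, mse))  # end-of-epoch checkpoint to review
    return (w, b), history


if __name__ == "__main__":
    # Noiseless toy data drawn from the line y = 3x + 1.
    data = [(x / 10, 3 * (x / 10) + 1) for x in range(-10, 11)]
    (w, b), history = train(data, epochs=50, lr=0.05)
    for epoch, mse in history[:3] + history[-1:]:
        print(f"epoch {epoch:2d}: mean squared error {mse:.6f}")
    print(f"w = {w:.3f}, b = {b:.3f}")
```

On this noiseless data, the recorded error at the last epoch should be far lower than at the first, and the fitted parameters should land close to 3 and 1. A real voice model differs mainly in scale: millions of parameters and audio samples you can actually listen to at each checkpoint.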

Once I had a satisfactory Herzog voice, I started working on a second voice and intuitively picked Žižek. Like Herzog, Žižek has an interesting accent, a prominent presence in intellectual circles and connections with the world of cinema. He has also achieved a kind of popular stardom, in part thanks to his polemical fervor and sometimes controversial ideas.

At this point, I still wasn’t sure what the final format of my project was going to be—but I was surprised by how easy and smooth the process of voice cloning was. As noted, deepfakes have become too good and too easy to make. In 2023 Microsoft announced a new speech-synthesis tool called VALL-E that, researchers claim, can imitate any voice based on just three seconds of recorded audio. We’re about to face a crisis of trust, and we’re utterly unprepared for it.

Deep Thoughts

A brief exchange from an endless and completely fake conversation

Slavoj Žižek*: So where do you see the possibility for a radical change today? How do you evaluate the present situation in terms of revolutionary potentials?

Werner Herzog: I don’t know. I think there is no real potential for a revolution. Not in our society and not in the foreseeable future. There is no real potential. I think we should put our energy elsewhere. We should work on the next stage of human evolution. I have given some talks in Silicon Valley, and they are convinced that the next step in evolution is to make humans artificially intelligent. They see it as inevitable. Of course, it will have to be widely discussed and prevented at the U.N. But I am convinced that the next step lies there. The next step in inner evolution lies there. Look at you! You are constantly using your cell phone, and you look like a mountain sheep, completely lost. And yet, in one second, you can find out every single detail about anything. And you’re constantly updating your knowledge. You’re constantly doing research. And this goes on and on and on. In 10 years we’ll be totally different from what we are now. So that’s where we need to focus our energy. The next stage of inner evolution. That’s where it is.

Žižek: So you think we should organize a sort of “pro–artificial intelligence” movement? What would you like to achieve?

Herzog: I have no clear goal. I have a dream of sorts. And yet, at the same time, I am also profoundly happy. And that’s what cinema, literature and music should do. Rather than just organizing a movement. Just filmmaking. Just making music. That’s what we should do.

Žižek: Cinema, yes. Cinema for the machines. Yes, great idea.

*The two conversants are AI chatbots trained to imitate Žižek, the philosopher, and Herzog, the filmmaker.

To emphasize this technology’s capacity to produce ample quantities of disinformation, I settled on the idea of a never-ending conversation. I needed only a large language model—fine-tuned on texts written by each of the two participants—and a simple program to control the flow of the conversation so that it would feel natural and believable.
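
The article does not describe the program that steers the dialogue, so what follows is only a hypothetical sketch of one way to do it: alternate speakers forever and give each new turn a bounded window of recent turns as context. The `generate` callable stands in for a call to a fine-tuned language model and is an assumption, as are the names `infinite_conversation` and `context_turns`.

```python
from collections import deque
from itertools import cycle, islice
from typing import Callable, Iterator

# Hypothetical model interface: (speaker, recent_turns) -> next line of dialogue.
TurnGenerator = Callable[[str, list[tuple[str, str]]], str]


def infinite_conversation(
    generate: TurnGenerator,
    speakers: tuple[str, ...] = ("Herzog", "Žižek"),
    context_turns: int = 6,
) -> Iterator[tuple[str, str]]:
    """Yield (speaker, text) pairs forever, alternating between speakers.

    Only the last `context_turns` turns are passed to `generate`, so the
    prompt stays the same size however long the conversation runs.
    """
    history = deque(maxlen=context_turns)
    for speaker in cycle(speakers):
        text = generate(speaker, list(history))  # hand the model a copy of recent turns
        history.append((speaker, text))          # oldest turn drops off automatically
        yield speaker, text


if __name__ == "__main__":
    # Stub in place of a real fine-tuned model, for demonstration only.
    def stub(speaker: str, recent: list[tuple[str, str]]) -> str:
        return f"({speaker} speaks, having heard {len(recent)} recent turns)"

    for speaker, text in islice(infinite_conversation(stub), 4):
        print(f"{speaker}: {text}")
```

Because the function is a generator, the caller decides how many turns to take; a site like the one described could keep pulling turns and, under the same assumptions, pass each line to the matching synthetic voice. Bounding the context with a `deque` is one design choice among several; it keeps every prompt a fixed size so the loop can run indefinitely.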

Given a series of words, language models predict the next word in a sequence. By fine-tuning a language model, it is possible to replicate the conversational style of a specific person, provided you have abundant transcripts of that person talking. I decided to use one of the leading commercial language models available. That’s when it dawned on me that it’s already possible to generate a fake dialogue, including its synthetic voice form, in less time than it takes to listen to it. This realization provided me with an obvious name for the project: the Infinite Conversation. After a couple of months of work, I published it online in October 2022. In 2023 the Infinite Conversation was selected to be part of the Misalignment Museum art installation in San Francisco.
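
Next-word prediction can be illustrated with a far simpler model than the commercial one used here. The sketch below swaps in a bigram counter, which predicts each word from only the single word before it; commercial language models instead use neural networks conditioned on long stretches of preceding text, so treat this as an analogy for the interface only. The function names and the toy corpus are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus: str) -> dict[str, Counter]:
    """For every word in the corpus, count which words follow it."""
    words = corpus.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows: dict[str, Counter], word: str):
    """Return the most frequent follower of `word`, or None if it was never followed."""
    candidates = follows.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


def continue_text(follows: dict[str, Counter], prompt: str, n_words: int) -> str:
    """Greedily extend `prompt` one predicted word at a time."""
    words = prompt.lower().split()
    if not words:
        return ""
    for _ in range(n_words):
        nxt = predict_next(follows, words[-1])
        if nxt is None:  # dead end: this word never had a successor
            break
        words.append(nxt)
    return " ".join(words)


if __name__ == "__main__":
    model = train_bigrams("the voice of the machine is the voice of the crowd")
    print(continue_text(model, "the", 4))
```

With this toy corpus, `continue_text(model, "the", 4)` returns `"the voice of the voice"`. Always choosing the single most likely word tends to fall into loops like this one, which is one reason practical systems usually sample from the predicted probabilities instead.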

Once all the pieces fell into place, I marveled at something that hadn’t occurred to me when I started the project. Like their real-life personas, my chatbot versions of Herzog and Žižek often talk about philosophy and aesthetics. Because of the esoteric nature of these topics, the listener can temporarily ignore the occasional nonsense that the model generates. For example, AI Žižek’s view of Alfred Hitchcock alternates between seeing the famous director as a genius and as a cynical manipulator; in another inconsistency, the real Herzog notoriously hates chickens, but his AI imitator sometimes speaks about the fowl compassionately. Because actual postmodern philosophy can come across as muddled—a problem Žižek himself has noted—the lack of clarity in the Infinite Conversation can be interpreted as profound ambiguity.

This probably contributed to the success of the project. Hundreds of the Infinite Conversation’s visitors have listened for more than an hour, and some people have tuned in for much longer. As I mention on the website, my hope for visitors to the Infinite Conversation is that they not take what the chatbots are saying too seriously. Instead I want to give people an awareness of this technology and its consequences. If this AI-generated chatter seems plausible, imagine the realistic-sounding speeches that could be used to tarnish the reputations of politicians, scam business leaders or simply distract people with misinformation that sounds like human-reported news.

But there is a bright side. Infinite Conversation visitors can join a growing number of listeners who report that they use the soothing voices of Werner Herzog and Slavoj Žižek as a form of white noise to fall asleep to. That’s a usage of this new technology I can get behind.

This is an opinion and analysis article, and the views expressed by the author or authors are not necessarily those of Scientific American.