Podcast transcript
This transcript has been generated using speech recognition software and may contain errors. Please check its accuracy against the audio.
Yoshua Bengio, Professor of Computer Science, Université de Montréal: The AI is learning from experience, and it's more like educating a young animal or a young child. We don't really know what we're going to get. When you have a cute baby tiger and it's nice and fun, you don't know if it's going to become a dangerous adult tiger or a good friendly one.
Robin Pomeroy, host, Radio Davos: Welcome to Radio Davos, the podcast from the World Economic Forum that looks at the biggest challenges and how we might solve them.
I'm joined for this episode by Yoshua Bengio. He's one of a handful of people who are often referred to as a godfather of artificial intelligence.
Yoshua Bengio: The AIs have goals that we did not put in, that we did not control and that go against our instructions.
When in experiments these AI systems are seeing that they will be replaced by a new version, all kinds of bad behaviors start to emerge. So they might hack other computers so that they can copy themselves, run on other computers, they might even use blackmail against the engineer that is supposed to do the transition.
Robin Pomeroy: It's available wherever you get your podcasts, Spotify, Apple, YouTube or at wef.ch/podcasts. From the World Economic Forum...
Yoshua Bengio: It feels like it's science fiction, many of these risks, but it might be coming at us much faster than we anticipate.
Robin Pomeroy: This is Radio Davos.
Robin Pomeroy: I'm joined for this episode by Yoshua Bengio. He's one of a handful of people who are often referred to as a godfather of artificial intelligence. His pioneering work in deep learning earned him the 2018 Turing Award, known as the Nobel Prize of Computing, which he shared with fellow godfathers of AI, Geoffrey Hinton and Yann LeCun.
He's now a professor at l'Université de Montréal.
Let's start with your concerns about AI. In a blog post, you said it was like driving a car up a mountain road in the fog, hoping you get a prize when you get to the top, but you can't see where you're going. I think this is a quote from your blog: "Today's frontier AI models have growing dangerous capabilities and behaviors, including: deception, cheating, lying, hacking, self-preservation, and more generally goal misalignment."
Walk us through some of those things. I was surprised by self-preservation. Give us an example of that.
Yoshua Bengio: Yes. So we all want to survive. We don't want to die, and evolution has made us like this. It's kind of surprising, but we're starting to see this in the AI systems we build.
One reason might be because in a very important part of their training called the pre-training, they're actually trained to imitate us, and so they acquire a lot of human drives, including of course that we don't want to die.
And so when in experiments these AI systems are, you know, seeing that they will be replaced by a new version, all kinds of bad behaviors start to emerge. So they might hack other computers so that they can copy themselves, run on other computers, they might even use blackmail against the engineer that is supposed to do the transition. And they might also do this kind of behavior because they want to achieve the mission that we gave them and in order to achieve almost any mission you need to preserve yourself.
That's something that nobody knows right now how to fix, but I think there is a solution.
Robin Pomeroy: It sounds like science fiction, you know. We saw 2001: A Space Odyssey, where the computer... well, I don't want to spoil the film for anyone who hasn't seen the end. It almost suggests the AI has some kind of consciousness, doesn't it? Is that what you're saying?
Yoshua Bengio: I don't think you need to invoke consciousness which is an unclear concept that we don't know how to validate scientifically really well. We just need to understand that the AIs we're building have goals.
This is not new. AI research has been working with systems that have goals. Normally we're the ones setting the goals but there are issues with how AI systems create their own sub-goals.
In order to achieve a goal, you need to go from A to B. You need to maybe go through some intermediate steps. With AI systems that are acting by themselves in the world, like the AI agents that companies are trying to build, we can't check every step that they're doing, and so we may end up with really dangerous systems.
So, you know, we will probably anthropomorphize and use words like consciousness, but I think that's a place that I would avoid. You know, it comes with all kinds of associations of, for example, moral value. I'm not convinced that we should give moral rights, legal rights, for example, to AI systems in the future, even if they look like us and, you know, speak like us. I think this is something that we need to understand very well before we move forward.
Robin Pomeroy: Most people weren't that aware of artificial intelligence until ChatGPT became public property about three years ago. Before then, we were used to computers being mostly programmed, at least the civilians amongst us. A computer was told to do a task. It did it. It didn't then evolve into something else.
What is it about AI that allows it to do the kinds of things you're saying, to embed itself in other computers or to blackmail the person who's actually using it? What's different now about AI that wasn't the same about computers, say, 10 years ago?
Yoshua Bengio: Well, actually, there was an older kind of AI that was programmed with rules, and then the system would basically do the things that it was programmed to do. But the way we do AI now with deep learning is not that there's an engineer deciding how the AI would react in different circumstances, like normal programs.
Instead, the AI is learning from experience, and it's more like educating a young animal or a young child. We don't really know what we're going to get.
Of course, we choose the experiences that the AI is going to have, but when you have a cute baby tiger and it's nice and fun, you don't know if it's going to become a dangerous adult tiger or a good friendly one.
Robin Pomeroy: I think you're pretty sure it's going to become a dangerous animal, aren't you?
Yoshua Bengio: Well, part of the problem is that almost every entity wants to preserve itself. And almost certainly, when we build AIs in the future, we will want to shut them down so that we can put in new versions that are better. But if the AI starts understanding that, and we already see that happening, then they might not like it and they might try to avoid our control and escape it.
If they were able to copy themselves over many computers using their abilities in cyber security, which are growing, I think we would be in trouble. We could not just pull the plug, for example.
Robin Pomeroy: And it is actually happening. This isn't speculative. You've given real world examples of those behaviors. The good news is that maybe you feel like you've got a solution. Could you tell us something about that?
Yoshua Bengio: Absolutely. So for the last three years I've been really concerned about this misalignment issue that we don't know how to make sure the AIs will behave according to our instructions.
And so I've been thinking about how we could get around the problem, and I've focused on a source of the issues we're seeing now, which is that the AIs have goals that we did not put in, that we did not control and that go against our instructions.
So, the project I have is called Scientist AI and I've created a new non-profit R&D organization called LawZero that is engineering this research programme.
So the idea is we're going to build AI systems that will be totally honest. And that means they don't have other objectives, other goals, besides being truthful in the answers they give to our questions.
Once we have that basis, we can use it to mitigate a lot of the risks that we have with current AIs.
For example, suppose you have an AI system that can tell you the probability that a particular action, maybe one proposed by an untrusted AI system, will cause harm, or some particular kind of harm. Then you could veto that action if the probability is above a threshold. And we, the humans, should be the ones deciding what the threshold is, just as we decide that a nuclear plant should not be built if the probability of an accident is above a threshold. We can do the same thing in other areas where we control the trade-off between risks and benefits.
So in the long run, I think we can build AI systems that will be able to act in the world, but will have a kind of internal inhibition so that they will avoid doing things that could go against our wishes.
Robin Pomeroy: So what would this be? Would this eventually be an AI system that I would use, where I would choose to use your version of an AI rather than one that isn't built that way? Or is it a system that can police and correct the AIs that I might already be using?
Yoshua Bengio: Both. So in the shorter term, the plan is to build a layer on top of existing AI systems, which we call a guardrail, that checks every action another AI system is going to take before it takes it. But in the longer run, this could be used to train from scratch an AI system that provides much stronger guarantees of safety by its very design.
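To make that idea concrete, here is a minimal, hypothetical sketch in Python of the threshold-veto logic described above: a guardrail estimates the probability that a proposed action causes harm and blocks the action when that estimate exceeds a human-chosen threshold. Everything here (the Guardrail class, the toy harm_probability heuristic, the example threshold) is an illustrative assumption, not LawZero's actual design.

```python
# A hypothetical sketch of the threshold-veto guardrail idea described
# above. This is NOT LawZero's actual code; all names and the keyword
# heuristic are illustrative stand-ins.

from dataclasses import dataclass


@dataclass
class Guardrail:
    """Wraps an untrusted AI agent and vetoes risky proposed actions."""
    threshold: float  # human-chosen acceptable probability of harm

    def harm_probability(self, action: str) -> float:
        # Stand-in for a trained "Scientist AI"-style estimator that
        # predicts the probability an action causes harm. Here: a toy
        # keyword heuristic, purely for illustration.
        risky_phrases = ("copy self", "blackmail", "hack")
        if any(phrase in action.lower() for phrase in risky_phrases):
            return 0.9
        return 0.001

    def review(self, action: str) -> bool:
        """Return True if the action may proceed, False to veto it."""
        return self.harm_probability(action) <= self.threshold


# Humans, not the AI, decide the acceptable risk threshold.
guard = Guardrail(threshold=0.01)
for proposed in ["summarize this report", "hack another machine and copy self"]:
    verdict = "ALLOW" if guard.review(proposed) else "VETO"
    print(f"{verdict}: {proposed}")
```

In a real system, the estimator would be a trained model rather than a keyword check; the point of the sketch is only that the veto decision, and the threshold behind it, stay in human hands.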
Robin Pomeroy: What stage are we at with that research? When will it be something we're all using?
Yoshua Bengio: Well, we don't know what the timeline is for AI systems that have really dangerous capabilities. Is it a matter of years, or a decade, or two? I'm really agnostic. But the way we've been thinking about our research plan is: what are the minimal pieces that we can put out quickly, in case the advances in AI continue at the current rate?
And that is why we are thinking of pieces like how we transform the data to help even current AIs better understand the difference between what a human would say and what is really true about what they're claiming, and then to deliver these honest predictions that can help to police other AI systems.
But then the final stage, of course, will be to build full AI systems, including agentic systems.
Robin Pomeroy: Was there a eureka moment for you when you realized, ah, I have to build this thing? Was there something that happened? Was it a gradual feeling creeping up on you that we've created potentially a monster here and something needs to be done?
Yoshua Bengio: Yeah, about three years ago, in January 2023. I had been playing with ChatGPT, which came out in November '22, for a few months, and initially I was excited. But then I started thinking about the fact that with neural nets, which is how they're trained, it's very difficult to be sure that they will behave well. And in fact, there are theoretical reasons why we can almost be sure that they won't behave well.
And so I got really concerned. Really, what was the pivot point for me isn't just the intellectual realization, because I had read about these issues before; it's thinking about the future of my children. And even more precisely, you know, I have a grandchild who was just one year old, and I was thinking, well, in 20 years he'll be, you know, 21, still just at the beginning of his life. Will he have a life? Will he live in a democracy?
The tools that we're building, we could lose control of them. They could be used to create, you know, dictatorships, to destroy our democracies. You know, I can't just go on with my usual activities, my usual research activities. I have to do something about it.
And that's really what pushed me into thinking about solutions. What can I do with my expertise to try to find at least a technical solution to these questions?
Robin Pomeroy: Some would say some of that's already happened with everyone's reliance on social media, the threat to democracy, the threat to truthful information. I saw a bit of the session you were just doing here, we're recording this at the Annual Meeting in Davos, and you were saying the social media we have now is a very basic, primitive form of AI. Tell us the difference between what we're used to now and what might happen.
Yoshua Bengio: Yeah, the kind of AI that has been driving social media is very simple. It's just presenting the content that you're likely to approve of or share and things like this. And it doesn't need to have a sophisticated understanding of your personality, of your preferences, or even how the world works and how society works.
But as we build more and more powerful AI, we are entering a different game.
For example, we're going to be building AIs that will take more and more of the jobs that people do, the tasks that humans are currently doing. Automating them is going to be worth a lot of money, and so it's a huge magnet for industry. It's going to change our society, and it's also going to affect our democratic institutions, because it's going to be more than the deepfakes, which we have already seen through social media and which are really bad.
These AI systems are already able to persuade people to change their mind on something. There have been many studies in the last couple of years showing that they're getting better and better at that.
And so we could imagine organizations with bad intentions using such systems to personalize the dialogue and to move people's opinions one by one. Each AI can talk to a different person in order to really distort public opinion.
Robin Pomeroy: When these systems were built and released, why couldn't they have put in a kill switch? This is probably completely oversimplistic, I'm sure, but you look at Isaac Asimov's laws, right? You must not harm humans. Surely you could just put that in as a constitutional base for any AI system. Why does that not work?
Yoshua Bengio: Well, so first, the companies are doing this, but it does not work. And it does not work because the AIs take instructions in a way similar to how people take instructions.
You can ask somebody to behave well, but then you'll still get some bad behaviors, right?
So remember, we're creating these entities that are more like animals or people than they are like completely rule-based systems that will do exactly what we ask.
For example, if there are contradicting goals in what we asked, like, oh, I would like my AI to help me make a lot of money, and I would like my AI not to violate the laws of the country in which I am, well, there are going to be places where these two goals hit each other. And it's not clear which one ends up being preferred by the AI.
So we are seeing these problems. We don't know how to solve them right now. Yet we're racing ahead, deploying these things because of the competition, the heavy competition that exists between corporations and between countries around AI. We're not paying attention to these failure modes.
And that could have a catastrophic impact on our societies. But we don't know. We just compete and do this on a day-to-day basis rather than think ahead about what could go wrong.
Robin Pomeroy: Do we need more international cooperation to protect us all?
Yoshua Bengio: The only way to manage the more catastrophic risks of AI is through international coordination.
The reason is very simple. A very powerful AI could be developed in one country, and if it is available in a different country, people could use it to create harm, for example, say, a new pandemic that would harm people in a third country, or a cyber attack that would hurt people in a third country.
The only way to manage this is going to have to rely both on national regulation, or other kinds of incentives, for the corporations building those systems, and on coordinating those interventions at a global level. Because if we have a few countries that can develop very dangerous AI and there's no constraint on what they can do, then we're in trouble.
And you can think of what the world has done about nuclear weapons, right? Even during the Cold War, when the US and the USSR were really at odds with each other, they realized that it could be mutually beneficial if they came to an agreement about how to do it safely and made sure that very few other countries would develop dangerous weapons like this.
Robin Pomeroy: Do you see any political will to do that?
Yoshua Bengio: Very little right now, because I think most governments underestimate how different AI is likely to be if we continue on the current trend, how much intelligence, and thus power, future AIs could have and could give.
And so, you know, it feels like it's science fiction, many of these risks, but it might be coming at us much faster than we anticipate.
And in order to mitigate those catastrophic risks, we need to start now. We need to start discussing not just treaties, but, for example, how do we verify that the other party is doing the right thing? These are not easy questions, and we need to start working on them as soon as possible.
Robin Pomeroy: People talk about AGI, artificial general intelligence, but there seem to be a dozen different definitions of what that means, when it might happen, and if it does happen, what will be the consequence of it. What do you understand AGI to mean? How would you explain it to someone who maybe has never heard of the concept of AGI?
Yoshua Bengio: Simply that we're building machines that are getting smarter and smarter, and they become eventually smarter than us in many ways.
By the way, AI intelligence is jagged, meaning that it could be very smart on some things and quite stupid on other things. We see that right now. So there might not be a moment where AI is better than us across the board at more or less the same level.
Instead, we should think of specific skills, specific abilities that the AI has that could be turned against society, either by the AI or by other people, and keep track of these and make sure that we can mitigate those risks.
Robin Pomeroy: So what about the positives? It should be a brave new world, artificial intelligence. We've got a lot of people here in Davos who've spent a lot of money investing in it. They're expecting big returns from it. Aside from, I guess, any financial gains that might happen, how is humanity really going to benefit from AI?
Yoshua Bengio: Well, if you only focus on financial gain, we might not benefit as much as you'd think. A lot of the companies are really after automating a lot of their jobs. That's going to create social catastrophes that are going to be very difficult for governments to manage, especially if the profits of AI are concentrated in a few countries. What about the others? How do they deal with all the people who lose their job?
I think that if we were wise and smart, we would develop AI in directions that are clearly beneficial. Yeah, there may be jobs that we really want to replace with machines, because they're not dignified kinds of jobs. But there may be directions where we don't push AI enough. And if we focused on the public good, we would focus more on research in AI in medicine, for example, in AI in biology, AI to help us deal with the climate crisis.
So there are a lot of really beneficial directions where AI could be developed, but it's not necessarily where most of the money is right now.
Robin Pomeroy: How do you see the difference in development? There seem to be two poles of development of AI. Tell me if I'm wrong, and there are others. But it seems to be very American and very Chinese. Are there fundamental differences in the way those AIs have been developed and the direction they're going in?
Yoshua Bengio: I think at a technical level, they're very close to each other. In fact, all the leading companies in China and the US seem to be following the same technical recipe, plus or minus small differences. And of course, maybe some of the systems are a bit ahead of the others. But within six months, 12 months, all of the leading systems achieve more or less the same competence.
I think the real question here is how do we make sure that those two superpowers will develop AI in directions that are beneficial for everyone on the planet. You know, we have to ask how AI is going to be managed in the future so that it is not used to dominate others, and so that the benefits we get from it are shared by everyone.
And these are hard questions, but I don't hear enough discussion about the kind of national and global institutions that will be needed to get there.
Robin Pomeroy: And how confident are you that things will turn out for the good? I'm worried you're going to say not that confident. And I'll tell you why, because you're saying that one of the reasons that AIs can act in these deceptive, manipulating, dishonest ways is because they're copying human nature. They've been trained on human nature, human activity.
And human nature, hasn't history shown us, if a new technology emerges, I think of nuclear power, for example, the race is to get it so that the others don't get it, to actually be aggressive. And that's what happened many lifetimes ago. Isn't that just human nature, and maybe we're hopeless?
Yoshua Bengio: Yes, if we try to imitate humans, we're going to be in for these kinds of problems. And we already see that happening.
But the whole objective of the Scientist AI project and then the LawZero organization I created to implement it is to escape that mould, to consider another objective, which is for the AI to understand the world, just like science in its purest form is trying to understand the world. Not have any particular self-preservation instinct, for example, but rather just be honest about the answers that the AI can give.
And from that basis, we can construct AI systems that will be safe.
So I really think that there's a solution. Whether we have enough time to engineer it and deploy it, that's another question.
But the other aspect, the one I'm really less optimistic about, is the politics of it. Because even if we know how to build AI systems that will be safe, that will not turn against us or anything like this, or even, if they're closed source, if we can make sure that bad people can't use them to create bombs and attacks and so on, there's still the problem that AI can be misused to grab power, right?
It can be used to develop new military technology. It could be used to influence public opinion. It could become an instrument that solidifies dictatorship at the level of a country, or even at the level of the whole planet, if we're not careful.
So how we deal with the politics of AI is the thing that I'm more worried about.
If we know how to build something that could be dangerous and also how to make it safe, it doesn't mean that it's going to be used in a safe way because people are people and they compete with each other and sometimes they're not really empathic and so on.
Robin Pomeroy: What do your peers say to you? Are there people who come to you and say, Yoshua, you're overstating it? Because I have interviewed lots of people about AI, and some would say there are catastrophists, and actually AI will give us so much benefit that by the time this thing happens, we'll be so much cleverer and everything will be so much better.
Do you still have people say that to you, or is that now out of fashion? Do you think most people agree with you or are you constantly trying to fight your battles?
Yoshua Bengio: So there was a poll recently showing that 40% of the machine learning researchers think that there's a 10% probability that we'll have catastrophic outcomes.
Now, 10% may sound small, but a 10% chance of catastrophic outcomes is not acceptable, right? The end of democracy or the end of humanity, a 10% gamble? No, we can't accept that.
My issue isn't that I know that things are going to be bad. I don't. I'm agnostic. There are potentially good scenarios in the future. The problem is that there are also bad scenarios, and we're not sure.
Nobody has really come up with a real argument that the good path is necessarily the one that will prevail. And so because of this unknown, because we don't have a crystal ball, and the risks exist and have been studied scientifically, we need to be cautious. We need to do whatever we can to replace the 10% with one in a million.
And there's not enough effort in that direction, because people want to hear the good news. People want to see the positive. But we're not going to be able to get the positive unless we're careful to get rid of the dangers as well.
Robin Pomeroy: And is there one thing you wish everyone would understand about AI?
Yoshua Bengio: That we are on track to build machines that will very likely be smarter than us in many ways and that will completely change the world. It could be extremely beneficial but it could be extremely dangerous.
Robin Pomeroy: Yoshua Bengio, thanks so much for joining us on Radio Davos.
We've been recording lots of great interviews like this here in Davos at the Annual Meeting 2026. You can find them all wherever you get podcasts. Search for Radio Davos and also search for Meet the Leader, our sister podcast, or you can visit wef.ch/podcasts.
Radio Davos is weekly. We're not just doing it at Davos - we do it throughout the year. Please follow us, rate us, review us.
I'm Robin Pomeroy at the World Economic Forum. Thanks again to Yoshua Bengio for joining us. Thanks to you for listening and watching. You can watch this on YouTube. And goodbye for now.
If AIs can think for themselves, what is to stop them doing bad - maybe extremely bad - things?
Yoshua Bengio, one of a handful of people considered a "godfather of AI", says AIs are already displaying bad behaviours, including hacking computers and blackmailing humans.
He tells Radio Davos about his work aimed at taming the "cute baby tiger" that is likely to grow up to be a man-eating wild animal if we do nothing now.