Why Asimov’s Laws of Robotics Don’t Work – Computerphile


So, should we do a video about the three laws of robotics, then? Because it keeps coming up in the comments. Okay, so the thing is, you won’t hear serious AI researchers talking about the three laws of robotics, because they don’t work. They never worked. I think that’s why people don’t see the three laws talked about: they’re not taken seriously, they haven’t been relevant for a very long time, and they’re out of a science fiction book, you know? So, I’m going to do it, but I want to be clear that I’m not taking these seriously, right? I’m going to talk about them anyway, because it needs to be talked about. These are some rules that the science fiction author Isaac Asimov came up with, in his stories, as an attempted solution to the problem of making sure that artificial intelligence did what we want it to do. Shall we read them out then and see what they are? Oh yeah, I’ll look them up- give me a second. Okay, I’ve looked them up. Right, so they are:

Law 1: A robot may not injure a human being or, through inaction, allow a human being to come to harm.
Law 2: A robot must obey the orders given it by human beings, except where such orders would conflict with the First Law.
Law 3: A robot must protect its own existence, as long as such protection does not conflict with the First or Second Law.

I think there was a zeroth one added later as well.

Law 0: A robot may not harm humanity or, by inaction, allow humanity to come to harm.

So it’s weird that these keep coming up because, okay, firstly, they were made by someone who was writing stories, right? They’re optimized for story-writing. And they don’t even work in the books: if you read the books, they’re all about the ways these rules go wrong, all the various negative consequences. The most unrealistic thing, in my opinion, about the way Asimov did his stuff is that things go wrong and then get fixed, right? Most of the time, if you have a superintelligence that is doing something you don’t want it to do, there’s probably no hero who’s going to save the day with cleverness. Real life doesn’t work that way, generally speaking.

The other big problem is that they’re written in English. How do you define these things? How do you define ‘human’ without having to first take an ethical stand on almost every issue? And if ‘human’ weren’t hard enough, you then have to define ‘harm’, and you’ve got the same problem again. Almost any definition you give for those words, any really solid, unambiguous definition that doesn’t rely on human intuition, runs into weird quirks of philosophy and ends with your AI doing something you really don’t want it to do. The thing is, in order to encode that rule, “Don’t allow a human being to come to harm”, in a way that means anything close to what we intuitively understand it to mean, you would have to encode within the words ‘human’ and ‘harm’ the entire field of ethics, right? You have to solve ethics, comprehensively, and then use that to make your definitions. So it doesn’t solve the problem; it just pushes the problem back a step: now, how do we define these terms? When I say the word ‘human’, you know what I mean, and that’s not because either of us has a rigorous definition of what a human is. We’ve just sort of learned by general association what a human is, and the word ‘human’ points to that structure in your brain, but I’m not really transferring the content to you. So you can’t just say ‘human’ in the utility function of an AI and have it know what that means. You have to specify.
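To make that point concrete, here is a minimal, purely illustrative Python sketch; none of these names come from any real system, and `is_human`, `would_harm`, and `permitted_by_first_law` are hypothetical. What it tries to show is that the First Law itself is trivial to write down, while all of the actual difficulty lives inside the predicates it depends on.

```python
# A minimal, purely illustrative sketch: the "law" itself is a few lines of code,
# but all of the real difficulty is hidden inside the two predicates it relies on.

def is_human(entity) -> bool:
    # To implement this you must already have taken a stance on every
    # contested edge case: the unborn, the brain-dead, the recently dead,
    # the higher animals, emulated minds, and so on.
    raise NotImplementedError("requires settling moral philosophy first")

def would_harm(action, entity) -> bool:
    # 'Harm' is no easier: physical injury, psychological harm, lost
    # opportunity, harm now versus harm later...
    raise NotImplementedError("requires settling ethics first")

def permitted_by_first_law(action, world) -> bool:
    """Law 1: a robot may not injure a human being or, through inaction,
    allow a human being to come to harm."""
    # (And even this check glosses over the 'through inaction' clause entirely.)
    return not any(
        would_harm(action, entity)
        for entity in world
        if is_human(entity)
    )
```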
You have to come up with a definition, and it turns out that coming up with a good definition of something like ‘human’ is extremely difficult, right? It’s a really hard problem of, essentially, moral philosophy. You would think it would be semantics, but it really isn’t, because, okay, we can agree that I’m a human and you’re a human, that’s fine, and that this, for example, is a table, and therefore not a human. The easy stuff, the central examples of the classes, is obvious. But the edge cases, the boundaries of the classes, become really important: the areas in which we’re not sure exactly what counts as a human. So, for example, people who haven’t been born yet, in the abstract, like people who hypothetically could be born ten years in the future, do they count? People who are in a persistent vegetative state, who don’t have any brain activity, do they fully count as people? People who have died, or unborn fetuses? I mean, there’s a huge debate going on even as we speak about whether they count as people. The higher animals, you know, should we include maybe dolphins, chimpanzees, something like that? Do they have weight? And so it turns out you can’t make your specification of ‘human’ without taking an ethical stance on all of these issues.

All kinds of weird, hypothetical edge cases, which you otherwise wouldn’t think of, become relevant when you’re talking about a very powerful machine intelligence. So, for example, let’s say we decide that dead people don’t count as humans. Then you have an AI which will never attempt CPR: this person’s died, they’re gone, forget about it, done, right? Whereas we would say, no, hang on a second, they’re only dead temporarily, we can bring them back. Okay, fine, so then we’ll say that people who are dead still count, if they haven’t been dead for- well, how long? How long do you have to be dead for? I mean, if you get that wrong and you just say, oh it’s fine, do try to bring people back once they’re dead, then you may end up with a machine that’s desperately trying to revive everyone who has ever died in all of history, because those are people who count, who have moral weight. Do we want that? I don’t know, maybe. But you’ve got to decide, right? And that’s inherent in your definition of ‘human’. You have to take a stance on all kinds of moral issues that we don’t actually know the answer to with any confidence, just to program the thing in.

And then it gets even harder than that, because there are edge cases which don’t exist right now. Living people, dead people, unborn people, that kind of thing, fine; animals, fine. But there are all kinds of hypothetical things which could exist and which may or may not count as human. For example, emulated or simulated brains, right? If you have a very accurate scan of someone’s brain and you run that simulation, is that a person? Does that count? And whichever way you slice it, you get interesting outcomes. If that does count as a person, then your machine might be motivated to bring about a situation in which there are no physical humans, because physical humans are very difficult to provide for, whereas with simulated humans you can simulate their inputs and have a much nicer environment for everyone. Is that what we want? I don’t know. Maybe? I don’t think anybody does. But the point is, you’re trying to write an AI here, right? You’re an AI developer. You didn’t sign up for this.
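To give a flavour of how even one of those edge cases turns into a forced design decision, here is another hypothetical sketch; the constant `REVIVABILITY_WINDOW` and the function `still_counts_as_human` are invented purely for illustration. Whatever value you pick, you have quietly taken a position in an unsettled moral debate, and a powerful system would then act on it at scale.

```python
# Hypothetical configuration knob, invented purely for illustration: however
# you set it, you have taken a moral stance, and the system will act on it.

from datetime import timedelta

# An entity still counts as human if it has been dead for less than this long.
#   timedelta(0)        -> the AI never attempts CPR: the dead simply don't count.
#   timedelta.max       -> the AI tries to revive everyone who has ever died.
#   anything in between -> you have drawn a line nobody has ever agreed on.
REVIVABILITY_WINDOW = timedelta(minutes=10)  # arbitrary, and that's the problem

def still_counts_as_human(time_since_death: timedelta) -> bool:
    # The choice of REVIVABILITY_WINDOW is a question of moral philosophy,
    # not of engineering, but the code forces you to answer it anyway.
    return time_since_death < REVIVABILITY_WINDOW
```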
We’d like to thank Audible.com for sponsoring this episode of Computerphile. If you like books, check out Audible.com’s huge range of audiobooks, and if you go to audible.com/computerphile, there’s a chance to download one for free. Calum Chace has written a book called Pandora’s Brain, which is a thriller centered around artificial general intelligence, and if you like that story, there’s a supporting nonfiction book called Surviving AI which is also worth checking out. So thanks to Audible for sponsoring this episode of Computerphile. Remember: audible.com/computerphile, and download a book for free.