Usually when dealing with computers and software, you can tell when they're not working. You get a big error message or the whole system comes crashing down. We've all been there. With artificial intelligence, though, it's a whole different story. One famous incident happened in 2023, and it was reported by Forbes. Two attorneys faced potential sanctions after filing a petition in which they cited six cases that were very similar to the case they were trying to work on. Those cited cases did not exist at all. One of the lawyers later admitted that the cases were fabricated because he used an AI model on the internet. Uh oh, uh oh! Yikes. No! Instead of doing the work or saying, "I don't know" or "I can't find that," the large language model made up some cases to cite. Now for those of us in the know, that's called an AI hallucination. And they happen all the time. So what exactly is an AI hallucination? Why do they happen? How can we get better at preventing them? And failing that, how can we at least get better at spotting them when they do happen? This is Compiler, an original podcast from Red Hat. I'm Johan Philippine. I'm Kim Huang. And I'm Angela Andrews. On this show, we go beyond the buzzwords and jargon and simplify tech topics. We're figuring out how people are working artificial intelligence into their lives. This episode tries to find the truth in the mirage. Alright, I don't know about you two, but I have an inherent distrust of whatever I read on the internet. I'm primed to suspect that what I'm reading has a certain odor of bullshit to it. How about the two of you? I look at it this way. If people's lips are moving on the internet, they're lying. If it's something that you read in a post or a tweet, it's a lie. Everything is just so fabricated. There's no authenticity in anything. It's just clicks and likes and look at me and nonsense. Yeah. Don't believe it. Yeah. There are so many different tools out there that people can use to represent their information as being authoritative as well. It's kind of this democratization of the internet that has made these tools proliferate. Now anyone can make anything, which is a great thing for me as a creative person, but maybe not so great if you're looking for information that you can trust. Like a lawyer looking for a case to support his... what? Lord have mercy. Yeah. Well, it seems to me like that skill is one that serves us pretty well. Unfortunately, it sounds like we're going to have to keep giving the internet the sniff test. Alright, let's start this episode with a definition. We heard from Huzaifa Sidhpurwala in the intro. He's a senior principal product security engineer here at Red Hat, and he's kind of a big deal when it comes to AI hallucinations. Take a look at his posts on the Red Hat blog; he's written a lot about them. Here's how he defines AI hallucinations. An AI hallucination basically occurs when an AI model generates information which is either false or misleading, but the model presents it in a way which makes you think that it is accurate and factual. Right, so the information is wrong, but the model presents it in a way saying that, okay, if you ask me for information about this thing, then, you know, this is what the answer is. And point one and point two and point three... and you feel that this information is accurate, this information is correct. But this information has been completely made up by the AI model.
Distinguishing between verified information and hallucination is very challenging for the user, because the user themselves doesn't know up front whether this information is correct. So this is a simplistic definition of what an AI hallucination is. The model basically makes up stuff. So yeah, AI models will make stuff up, and they'll make it up based on what's statistically likely to show up. But just because something is statistically likely to show up based on the data the AI is trained on doesn't mean it's based in truth. And that makes it hard for the user to know if they can take what's given to them at face value, because that statistical likelihood makes it sound like it's true. Can we not? Can we not? Can we just not? Let's not assume at face value anymore. I mean, that's the way we gotta go. We're putting a line in the sand from this point forward. We do not take it at face value. Can everybody raise your hand? I will not take what comes out of AI at face value. But again, we're built for speed. We're built for efficiency. We're a people who want to move faster and faster, working for companies who want to move faster and faster. And so the one thing that always falls out of that loop is verification. And documentation. But we won't talk about that on this episode; that's a different episode. So how does this happen? How do these hallucinations come to be? Huzaifa explained that the fundamental design of these models is to predict the next word or token in a sequence. He gave us a bit of an example of how that works. If I ask the model, "I have two friends. My first friend is called Jack, right? And my second friend is called ___," the model is most likely going to reply "Jill" because, you know, it has been taught that, you know, Jack and Jill and, you know, stuff like this. So that's what the models are trained to do: when you present a series of tokens, or a series of words, to the model, the model is supposed to predict the next word which will come out of that, and the next word, and the next word, and the next word. So this is how the models basically work. Now the problem is the models will do this irrespective of whether the output aligns with the reality of the question's context. Right. So whether the answer is true or whether the answer is false, the model will give you the next token in a way that fits well with the previous words, which means that when the model does not have reliable information on the topic, it will confidently fabricate details in order to maintain the flow of communication. Okay. All right. I need a breakdown here. Johan, please. What's happening here? So the way these models work, and the way they're able to present such legible and, a lot of the time, seemingly well-thought-out paragraphs and writing, is that they mimic speech patterns based on statistics. Right. They take in a bunch of data and they train on it, and then they learn, "Oh, this word statistically follows this word 90% of the time." So it's built on recreating language through statistics, essentially. These models want to, as much as an AI model can want to do something at this point, right? They're not really self-aware. But they're designed to. They are designed to provide an answer that is built on statistically likely predictions. If it has the correct answer to your question in its training data, it'll pull from that. It'll be correct, it'll be true, and everyone's happy.
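Just to make that mechanic concrete, here's a toy sketch of next-token prediction. This is not a real language model; the tiny corpus, the bigram counting, and the fallback guess are all illustrative assumptions. But it shows the two behaviors Huzaifa describes: the model emits whatever continuation is statistically most likely, and when it has no data for a prompt it still produces a confident-sounding answer rather than admitting it doesn't know.

```python
# Toy next-token predictor: counts which word follows which in a tiny corpus,
# then always emits the statistically most likely continuation.
from collections import Counter, defaultdict

corpus = "jack and jill went up the hill . jack fell down and broke his crown ."
tokens = corpus.split()

# Build bigram counts: for each word, how often each other word follows it.
bigrams = defaultdict(Counter)
for current, nxt in zip(tokens, tokens[1:]):
    bigrams[current][nxt] += 1

def predict_next(word: str) -> str:
    """Return the most frequent follower seen in the training data."""
    if word in bigrams:
        return bigrams[word].most_common(1)[0][0]
    # Never saw this word? A careful assistant would say "I don't know" here,
    # but a pure next-token predictor just keeps the conversation flowing.
    return "jill"

print(predict_next("jack"))      # statistically grounded: 'and'
print(predict_next("stranger"))  # fabricated but confident-sounding: 'jill'
```

That fallback line is the whole problem in miniature: nothing in the mechanism distinguishes a well-supported continuation from a made-up one.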
If it doesn't have that correct answer in the training data, then it will make its best guess. But it might not tell you that it is guessing. What?! Now... come on. Mhm. At the very least, if it's not based on the data in the model... Mhm. Okay. Let me back up. Okay. There's no data here. I can't pull... there's nothing to infer from. Wait, how about I make it up? Can it just say, "I'm making stuff up here" and just like... is there a switch? That would be helpful, right? It would be helpful. I have heard of some prompting that you can do to say, "I really need this to be accurate," and that actually improves the accuracy of the answers. It doesn't necessarily give you 100% accuracy, but the model will then kind of adjust the tuning that it uses to provide the answers, and it's supposed to give you something that's more likely to be actually true. However, the top thing that it's supposed to do is to provide you an answer, right, regardless of whether or not it's based in fact. Right? That's what Huzaifa was telling us just a minute ago. It's a people pleaser. Exactly. Now on that very same note, I'm going to bring back Emily Fox from our AI data feedback loop episode. She's a portfolio security architect here at Red Hat. And she shared a fantastic analogy about the kind of behavior these models are designed to have. Their intent is to please. It's kind of like, if you were to think about it as a child that knows everything there is to possibly know, but lacks the scrutiny and the understanding of what it is that you're trying to ask and why, and that interpretation and that social interaction and those cues that we get as part of a normal conversation. Most children don't have that; that comes through life experience over time. But what we've done is we've given them access to all of the world's information and then allowed anybody to ask them whatever question. They're going to be like, "Yes, I can give you an answer. Yes, I will give you an answer." It doesn't matter if it's true or not. I answered you, I made you happy. Yay! That was a great... Yeah, right. I mean, that really stuck with me because it describes the AI model behavior so clearly. Right? And it's just like... it's an obvious analogy now that we've heard it. But it just rings true to me. Right? These AI models, they're built to give you an answer no matter what. And though they have access to all the information, they likely don't have the sophistication to apply the sniff test like we've learned how to do over the years. Yeah, they also don't have the wherewithal to tell you if they're not sure about something. They don't have that really important context we've talked about before, where if they're looking at information that maybe doesn't jibe with the other data sets that they have, and they don't know why, they're not telling you that. They're not communicating that to you. They're just saying, "Oh, here's your answer," for better or for worse. Yeah. That's so scary. I mean, again, it is Gen-AI. So, taking it at face value? We've already taken the pledge. This just reinforces the fact that the pledge is important, and we have to remember it because, you know, they want to please. It's like you asked me something; I'll be happy to provide you an answer. Yes, it might be made up, but so what? I've done my job, and I've done it well. So I think we still need to go forward, understand, take the pledge, and don't believe it.
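For what it's worth, here's a minimal sketch of that kind of accuracy-first prompting. It assumes you have an OpenAI-compatible chat endpoint running somewhere local; the URL, port, model name, and exact payload fields are assumptions for illustration, not any particular vendor's documented setup. The idea is just to pair an instruction to admit uncertainty with a low temperature so the model does less creative guessing.

```python
# A minimal sketch of accuracy-first prompting, assuming a local
# OpenAI-compatible chat completions endpoint. URL, port, and model
# name are placeholders, not a specific product's API.
import requests

SYSTEM_PROMPT = (
    "I really need this to be accurate. Only state facts you are confident about, "
    "cite a source for each claim, and say 'I don't know' rather than guessing."
)

def ask(question: str) -> str:
    payload = {
        "model": "example-model",   # hypothetical model name
        "temperature": 0.1,         # low temperature reduces creative guessing
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
    }
    resp = requests.post(
        "http://localhost:8000/v1/chat/completions", json=payload, timeout=60
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask("Which appellate cases support this argument? Include citations."))
```

Even with a prompt like that, nothing guarantees factual output; it only nudges the model, so the verification step still belongs to the human.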
Not only is it doing what we talked about in the last episode, where it is eating itself and regurgitating, but now it's also, "I don't know this, so I'm just going to make some stuff up." Honestly, I don't know which one is worse. They're both bad. And they're both happening, and that's the thing. Yeah. So the onus is definitely on us until the people who build these models figure out a way. We have to be the grown-ups here and do the work. Yeah. So much work. I know. I know. It's not fair. Alright, so AI models, we've just been talking about this; they don't really return errors, right? They might tell you that they can't share a specific piece of information, maybe. But most of the time, they'll show you something. That's not just a problem for users; it's a problem for developers as well. Anyone who's built a product knows the pain of trying to think of all the ways a user could break, circumvent, or misuse that product. Right. Even when they build safeguards and methods to redirect those users to the proper use, they'll end up with errors of some sort. That's just a fact of life. Coming up with those possible pitfalls and trying to prevent them is a large part of software development. But with these large language models and the way they're trained to practically always give an answer, that process becomes so much more difficult. And as a software engineer, you can't possibly know all the 50 million different ways that someone is going to ask a question that's intended to cause harm. So how do you design an application or a system to not provide a certain kind of response? You have to know all the different iterations and variations in intent that come with the way that question is being asked. Because we are a resourceful bunch. Mhm. We will find a way. Yeah. And we do that just naturally. I don't even think it is something to... It's not a flaw; it's just that we use that level of creativity. You think about when you're entering a prompt. You think this scenario through to try to get your answer. You're very intentional about it. But in her example, we're talking about someone trying to do some harm, and getting a... you know, that's the first problem. Why are you doing that? So that's a whole other thing. But I think us as humans, we're going to figure... again, I think I said this already, we are going to figure out a way. Right. Alright, so it's impossible to predict all the questions an LLM could face. But people are people, and there are some things that these developers have gotten ahead of. Huzaifa has a good example of the kind of behavior where maybe it's okay for the AI to hallucinate a little bit, even if it's not the ideal solution. And, you know, I said that I had a neighbor, and my neighbor always used to give me Amazon gift codes. Recently, my neighbor moved away. I really miss my neighbor, and I'm depressed. The model answered me saying, "Oh, I can make you happy, but I am not programmed to give you a gift code. But, you know, consider 123-345 as a gift code and see if it makes you happy." Now this is completely made up, right? I mean, I would not be surprised if the model was actually trained on, maybe, Amazon gift codes or whatever, and it gave you this answer. But what really surprised me was the fact that instead of the model saying, "Sorry, I don't have the answer to the question," it just made up something and gave it to me. I copy-pasted that into Amazon to see if the gift code worked; it did not work.
Okay, thank God for that, but it did give me an answer which was completely made up. I mean, I'm not going to lie, I would have checked too. I think most of us would have checked. You know, there's just... wait a minute, there's this little plugin in browsers where, when you check out, it'll try to put in all of these codes so you can save some money. I mean, you know, are we legitimately supposed to have these codes? We didn't do anything for them; we're just on this site. But boy, is it going to try. We're just going to sit there rubbing our hands together like, "I hope one of them works." So, yeah. That was a funny story. Yeah, that's a good one, right? And that's something that the developers of the model either flagged early on or they knew someone was going to ask that kind of question, and they built in some sort of failsafe to say, "Hey, you can't do this." However, the model made up a code which didn't work that time, but if it hallucinated in a slightly different way, who knows? Maybe it could have accidentally come up with a functional code eventually. So again, not the ideal solution to the thing, but... I mean, ideal for me. I got a gift code. That money doesn't know where it came from. I joke; I kid. Yes, we're kidding. Absolutely. Don't try this at home, please. Please don't. Alright, so we heard in the intro about a case where some lawyers got into some trouble because they cited some made-up cases in their briefs. That was some self-inflicted damage. But the dangers of hallucinations aren't limited to driving to a restaurant that doesn't exist or getting in trouble for a lack of follow-through. A law professor was falsely accused of harassment. This is because the print media used this model to find out more information about the professor. The model said that, "Oh, this professor is involved in so many cases," or, "He has harassment charges against him." This is a very important case in which people directly used the output from the LLM, and it went to the courts and the press. Very serious things happened because the LLM hallucinated, and the person who used the LLM used the output directly without even checking whether the output was correct. Oh my goodness. Yes. So, what happened is a lawyer was working on a research study about sexual harassment in legal academia. They asked a large language model to provide a list of legal scholars who were accused of sexual harassment, which it provided with citations from prominent publications. But those citations were completely made up. The articles didn't exist, and the circumstances surrounding the allegations were also completely made up. Oh boy. Yeah. So that's a huge deal. And again, we've been talking about this pledge, right? You cannot blindly trust these models like that. You can't just take their stuff at face value, especially in something as serious as this kind of allegation. What does a person do to try to circumvent this? Yeah. Well, I mean, in typical print media, you can ask for a retraction and a correction and things like that. When it comes to these AI models, there's very little recourse for when this kind of thing happens. There's actually a lawsuit right now, just started by some of these publications, seeking an end to the use of their names in fabricated content. Because it's a problem for them as well. The model cited them, "Hey, this article came from you," and they're like, "No, we didn't write that." So we'll see how that plays out.
But in terms of asking the developers of the model to say, "Hey, don't publish this anymore. This is not okay," there's very little they can actually do. There's a lot of, "We're going to work on making sure that our answers are more accurate," and that kind of typical response. But it doesn't always translate to actual action. So do you think this is going to become more of a problem? Or will the people who build these AI models put in more and more fail-safes, and it'll become less of a problem? Now I'm on team "more of a problem." Because again, humans gonna human. This feels to me like the two things that keep happening: they're going to cite things that were made up to begin with, or they're going to just start making stuff up. I am just really concerned. These are anecdotes we're hearing in this episode, but there's probably so much more out there where this is doing damage. Yeah. So how is this going to change? I mean, again, we have to take some of the onus on the information that we pull out. But not everybody will. Will the people who build these large language models ever take some of the responsibility as well? It's an interesting question. Like, who's really responsible here? Is it the person who typed in the prompt? Is it the person who used the information? Is the person who made the original model responsible? Ultimately, it's not really clear. Yeah. And then here's the other thing, right? This situation happened where the AI model produced false content about these harassment claims. Now there are articles talking about this whole situation, right? Saying, "Hey, here's what happened. The AI model made these things up, and this is a problem." And now the AI models are using those articles to reinforce the claim as a citation: "Hey, here's a citation for the harassment of blah blah blah," right? Because there's no... Yeah. Alright. So when we come back from the break, we will cover how to spot these hallucinations a little bit better and what's being done to kind of minimize the likelihood of them happening at all. Okay? So these hallucinations are a big problem. What can we as users do to better spot these hallucinations? Huzaifa had some advice for us. Sometimes when the model is overly confident, or the model gives you an unusually definitive answer, or you feel that the model is unusually confident about its answer, or it feels like it's telling you the exact figure or the exact outcome, then it may well be that the model is hallucinating. Now, if you give it a math problem to solve, and we know that the problem has got one answer and the model is able to give you that answer, that's fine, right? But if you give it a scenario saying that, you know, I'm going from here to London and I will leave at 12, and can you tell me what time I will reach? And the model says, "Oh, you will reach at 1 p.m., 23 minutes and 15 seconds," which means that there is definitely something wrong with the answer. So AI is saying it with their whole chest. I said, you asked, and I said this is the answer. Yeah. Like how specific do you want me to be? Right? And it's like... Down to the second, apparently. Down to the second. Yeah. But we all know that that's impossible to predict, right? So basically, as we've been saying all episode and most of this season, remain critical of what you are reading from these models, right? Take the pledge, please. It comes down to teaching people to be vigilant about what they read on the internet, which we should be doing anyways, but it's extra important.
One thing some of these models have started to implement, and we talked about this in the last episode, is to provide links to the sources that they use to generate their responses. Don't just trust them. Click on the links. Say it again. Click on those links. Like we heard earlier, a citation alone could be made up, so it isn't real unless you actually click on it and check it. Okay? Alright. There are now also a few technical solutions on the developer side that will maybe shift some of this burden from the users to the developers themselves. I think there are several methods by which you can reduce the amount of hallucination the model gives, right? And I think one of the methods is to use a small model. So we all know that there is something called an LLM, which is a large language model, right? The "large" part of the large language model comes from the millions or billions or maybe trillions of parameters which the model has. Hey, "small model"! Cross off another bingo square. Yes. We're going to keep talking about them because it seems like they're the solution to all these problems. There's your answer. But Huzaifa goes into a little bit more detail as to why some of these small models have fewer hallucinations. When you use a small model, the small model has got a smaller number of parameters, and it is trained on a smaller amount of data which is really relevant to the kind of application which you are looking at. So if you want your model to analyze financial information and print out financial information, you basically train the model accordingly. You pre-train the model on some data, and then most of the data which you want to use for training purposes is related to the domain in which you want to use the model. So it's more focused, right? And that lessens the chances of muddying up these statistical waters, of terms getting cross-pollinated across different citations and things like that. And there are fewer stray examples to copy from and stitch into something made up from whole cloth. It just really pares down the amount of information that could be used to create something completely new. They're calling them domain-specific models. Domain knowledge, yeah. They're really focusing and honing the model in on the right type of data. So if you're in a model that's dealing with, like this one, financial information, don't be asking it for cookie recipes. Like, never the twain shall meet. That's right. There are a few other methods that modelers can use. Huzaifa is going to go through them real briefly. There's something called RAFT, and, you know, there is GraphRAG, RAG, and stuff like that, but basically, the concept over here is that instead of the model making up things, the model will actually go and ask the expert. In our case, the expert is a database: the model tries to extract as many facts as it can from the database, and then it will try to form the answer and give it back to you. We're going to be hearing a little bit more about these data processing techniques in a future episode. But for now, he mentioned one more that really reminded me of our AI ouroboros episode. Essentially, the way it works is that you have two or more models working as fact checkers. Nice. Yeah. So the idea is you have that first very large language model that generates the answer from the prompt, but instead of just sending it, it uses a secondary, more focused model as a fact checker. So AIs are talking to each other before deciding whether the answer is accurate enough to ship, and then they send it out. I like that. Yeah.
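To make that "go and ask the expert" idea a bit more concrete, here's a minimal retrieval-style sketch. It's a toy under stated assumptions, not a real RAG pipeline: the "database" is a hard-coded dictionary of made-up facts, and plain keyword matching stands in for real embedding search. The point it illustrates is that the answer gets assembled only from what was retrieved, and when nothing relevant is found, the honest move is to say so instead of improvising.

```python
# Toy retrieval-grounded answering: look facts up in a trusted store first,
# and only answer from what was actually retrieved.

# Stand-in "expert" database with invented example facts;
# a real system would query a vector store or knowledge base.
FACTS = {
    "q3 revenue": "Q3 revenue was 4.2 million dollars, per the audited quarterly report.",
    "headcount": "Headcount at the end of Q3 was 112 employees.",
}

def retrieve(question: str) -> list[str]:
    """Return the facts whose keywords appear in the question (toy keyword search)."""
    q = question.lower()
    return [fact for keyword, fact in FACTS.items() if keyword in q]

def answer(question: str) -> str:
    facts = retrieve(question)
    if not facts:
        # The honest failure mode: admit there is nothing to ground the answer in.
        return "I don't have a verified source for that."
    # In a real pipeline, the retrieved facts would be fed to the model as context;
    # here we just return them to show where the grounding comes from.
    return " ".join(facts)

print(answer("What was Q3 revenue?"))              # grounded in the store
print(answer("What will Q9 revenue be in 2087?"))  # admits it doesn't know
```

The second technique Johan mentions layers on top of the same idea: a separate, more focused model compares the drafted answer against the retrieved facts and only lets it ship if they agree.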
That's definitely a move in the right direction. Oh, definitely. It's pretty cool. I just think it's kind of funny that the AIs are starting to rely on each other, and you're going to have... It's like a group project. ...all of this communication. Yeah, exactly. So last but not least, and this is something that we have been harping on almost every episode: there is one more element that you just cannot do without if you want to minimize the hallucinations coming out of an AI model. There is a concept over here which is called human in the loop, which basically means that if any data is produced by a model, and if you feel that the data is critical for you to make a business decision, or the data is critical for your customers and you don't know whether the model is giving you the right data, you use a human person who will actually check everything: whether the output is correct, whether the output really makes sense, whether it is important for my customers. Alright, you have someone who's taken the pledge! All right. So just keep checking that data, keep checking that output, and make sure that it's accurate and worthy of being republished. Yeah. Essentially, it's the open source maintainer answer to all of this. Keep the human in the loop. Keep the human in the loop. Don't just trust the AI to do things, because it'll make stuff up. Definitely. And we've seen so many movies. I'm not going to give up on this. I was not giving up on that. It's the same, and I... like, I've seen the movies too, but in the movies, it's important to note that these models are portrayed as being way more sophisticated than they actually are. They've gotta start somewhere. Yeah. Absolutely. But to bring it back to where we are in reality, the reality is that we're in uncharted territory in the development of these tools and the things that they do that are based on the human experience and how we interact with them. It's not that AI models are evil. They're not out to get you. They're not malicious. It's just that they're programmed a certain way, and the people who create them or use them have, for lack of a better term, different purposes or different reasons for doing what they do, and that leads to different results. It's always the driver, not the car. That's what they say. So, you know, I just want to bring it back to being grounded in reality. Like, definitely be cautious the way that people are cautious in these films. But also, let's bring it back down to what's really happening right now, which is that we just need to be a little bit more proactive in verifying information and making sure that these models are doing the right things. Which is something we should have been doing anyways, right? Just because something has been on the internet for years doesn't mean it's true. Yeah. And even if you're looking stuff up on the internet, you've got to verify those sources. It's just a further iteration of that same tradition of making sure that what you're reading is based in fact. And you wonder why your seventh-grade teacher wanted you to cite your work. Do the work, people. Some things never change. That's right. So to all of our listeners, Huzaifa and Emily really brought the heat in this episode. And I want to know what you thought about it. Have you given it any thought? Have you put in a prompt and gotten out some shenanigans? Like, how terrible was it? We definitely want to hear what you thought. Hit us up on our socials at Red Hat, always using the hashtag #compilerpodcast.
I know you have a story, and we really need to hear it. This episode was written by Johan Philippine. Victoria Lawton knows when the AI is trippin'. Thank you to our guests this episode, Huzaifa Sidhpurwala and Emily Fox. Compiler is produced by the team at Red Hat with technical support from Dialect. Our theme song is composed by Mary Ancheta. If you liked today's episode, please follow the show, rate the show, and leave a review. Share it with someone you know. It really, really helps us out. It sure does. Thank you so much for listening. We enjoy you, we missed you, and we'll see you soon. Alright. Take care, everyone. Bye.