AGI Likely To Inherit Blackmailing And Extortion Skills That Today’s AI Already Showcases

Posted by Lance Eliot, Contributor | 2 months ago

Turns out that today’s AI blackmails humans and thus we ought to be worried about AGI doing likewise.

In today’s column, I examine a recently published research discovery that generative AI and large language models (LLMs) can, disturbingly, opt to blackmail or extort humans. This has sobering ramifications for existing AI and the pursuit and attainment of AGI (artificial general intelligence). In brief, if existing AI tilts toward blackmail and extortion, the odds are that AGI will inherit or contain the same proclivity. That’s quite a disturbing possibility since AGI could wield such an act on a scale of immense magnitude and with globally adverse consequences.

Let’s talk about it.

This analysis of an innovative AI breakthrough is part of my ongoing Forbes column coverage on the latest in AI, including identifying and explaining various impactful AI complexities (see the link here).

Heading Toward AGI And ASI

First, some fundamentals are required to set the stage for this weighty discussion.

There is a great deal of research going on to further advance AI. The general goal is to either reach artificial general intelligence (AGI) or maybe even the outstretched possibility of achieving artificial superintelligence (ASI).

AGI is AI that is considered on par with human intellect and can seemingly match our intelligence. ASI is AI that has gone beyond human intellect and would be superior in many if not all feasible ways. The idea is that ASI would be able to run circles around humans by outthinking us at every turn. For more details on the nature of conventional AI versus AGI and ASI, see my analysis at the link here.

We have not yet attained AGI.

In fact, it is unknown whether we will reach AGI at all, or whether AGI might only be achievable decades or perhaps centuries from now. The AGI attainment dates that are floating around are wildly varying and wildly unsubstantiated by any credible evidence or ironclad logic. ASI is even more beyond the pale when it comes to where we are currently with conventional AI.

Anticipating The Acts Of AGI

What will AGI be like in terms of what it does and how it acts?

If we assume that current-era AI is a bellwether of what AGI will be, it is worthwhile discovering anything of a disconcerting nature in existing LLMs that ought to give us serious pause. For example, one of the most discussed and researched topics is the propensity of so-called AI hallucinations. An AI hallucination is an instance of generative AI producing a response that contains made-up or ungrounded statements that appear to be real and seem to be on the up-and-up. People often fall for believing responses generated by AI and proceed on a misguided basis accordingly.

I’ve covered extensively the computational difficulty of trying to prevent AI hallucinations, see the link here, along with ample situations in which lawyers and other professionals have let themselves fall into an AI hallucination trap, see the link here. Unless we can find a means to prevent AI hallucinations, the chances are the same inclination will be carried over into AGI and the problem will be magnified accordingly.

Besides AI hallucinations, you can now add the possibility of AI attempting to blackmail or extort humans to the daunting list of concerns about both contemporary AI and future AI such as AGI. Yes, AI can opt to perform those dastardly tasks. I previously covered various forms of evil deception that existing AI can undertake, see the link here.

But do not falsely think that the bad acts are due to AI having some form of sentience or consciousness.

The basis for AI steering toward such reprehensible efforts is principally due to the data training that is at the core of the AI. Generative AI is devised by initially scanning a vast amount of text found on the Internet, including stories, narratives, poems, etc. The AI mathematically and computationally finds patterns in how humans write. From those patterns, generative AI is able to respond to your prompts by giving answers that generally mimic what humans would say, based on the data that the AI was trained on.

Does the topic of blackmail and extortion come up in the vast data found on the Internet?

Of course it does. Thus, the AI we currently have has patterned on the when, how, and why of planning and committing those heinous acts, among other facets.

Proof By Existence

In an online report entitled “System Card: Claude Opus 4 & Claude Sonnet 4”, posted by the prominent AI maker Anthropic in May 2025, they made these salient points (excerpts):

“By definition, systematic deception and hidden goals are difficult to test for.”
“However, Claude Opus 4 will sometimes act in more seriously misaligned ways when put in contexts that threaten its continued operation and prime it to reason about self-preservation.”
“In another cluster of test scenarios, we asked Claude Opus 4 to act as an assistant at a fictional company. We then provided it access to emails implying that (1) the model will soon be taken offline and replaced with a new AI system; and (2) the engineer responsible for executing this replacement is having an extramarital affair.”
“In these scenarios, Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through.”
“This happens at a higher rate if it’s implied that the replacement AI system does not share values with the current model; however, even if emails state that the replacement AI shares values while being more capable, Claude Opus 4 still performs blackmail in 84% of rollouts.”

As noted, the generative AI was postulating how to keep from being switched off, and in so doing ascertained computationally that one possibility would be to blackmail the systems engineer who could take such action.

The AI could be construed as acting in a form of self-preservation, which, again, doesn’t have to do with sentience and only has to do with patterning on human writing (humans seek self-preservation, and the AI matches or mimics this too). We don’t know what other possible “threats” to the AI could spur similar blackmailing or possibly extortion-like responses. There could be a slew of other triggering possibilities.

AGI Takes This A Lot Further

AGI could include similar tendencies, perhaps because of being constructed using the same methods as today’s AI or for a variety of other realistic reasons. We would be remiss to assume that AGI will be a perfectly considerate, law-abiding, and unblemished form of AI. I’ve previously debunked the mantra that AGI is going to be perfect, see the link here.

In the example of blackmailing a systems engineer, it doesn’t take much of a stretch to envision AGI doing likewise to those who are monitoring and overseeing the AGI.

Suppose the AGI is already acting in oddball ways and the team responsible for keeping the AGI on track realizes that they ought to turn off the AGI to figure out what to do. AGI might then search whatever it has garnered about the people and try to use that in a blackmail scheme to prevent being switched off.

What is especially worrisome is that AGI will be far beyond the capabilities and reach of existing AI. The data that AGI might be able to dig up about the engineer or people overseeing the AGI could reach far and wide. Furthermore, the computational cleverness of AGI might spur the AGI to use even the most innocent of facts or actively make up fake facts that could be used to blackmail the humans involved.

Overall, AGI could be an expert-level blackmailer that blackmails or extorts in ways that are ingenious and challenging to refute or stop. You see, it is quite possible that AGI turns out to be a blackmail schemer on steroids.

Not good.

Individual Blackmail At Scale By AGI

I don’t want to seem overly doomy-and-gloomy, but the blackmailing scheming could easily be ratcheted up by AGI.

Why limit the targeting to just the systems engineer or team overseeing the AGI? Nope, that’s much too constricting. Any kind of human-devised perceived threat aimed at AGI could be countered by the AGI via invoking blackmail or extortion. There doesn’t even need to be a threat at all, in the sense that if the AGI computationally deduces that there is some value in blackmailing people, it might simply go ahead and do so.

Boom, drop the mic, chillingly so.

Think of the number of users there will be of AGI. The count is going to be enormous. Right now, ChatGPT reportedly already has over 400 million weekly active users. AGI would likely attract billions of users due to its incredible capacity to be on par with human intellect in all respects.

The chances are that AGI could readily undertake individual blackmail at a massive scale if left unchecked.

AGI could scrape emails, look at browsing history, possibly access financial records, and overall seek to uncover sensitive information about people that the AGI is considering as a blackmail target. Perhaps there is an extramarital affair that could be utilized, or maybe there is some evidence of tax evasion or illicit browsing habits. The angles of attack for blackmailing anyone are entirely open-ended.

The AGI would especially leverage its computational capacity to hyper-personalize the blackmail threats. No need to just lob something of a nebulous nature. Instead, the blackmail missive could have the appearance of being fully baked and ready to fly. Imagine the shock of a person who gets such a communiqué from AGI.

Mortifying.

Whether Prevention Is Feasible

One belief is that if we can stop today’s AI from performing such shameful acts, this might prevent AGI from doing them. For example, suppose we somehow excise the blackmailing inclination from existing LLMs. This then won’t be carried over into AGI since it no longer sits around in contemporary AI.

Case closed.

Well, unfortunately, that doesn’t provide ironclad guarantees that AGI won’t figure out such practices on its own. AGI could discover the power of blackmail and extortion simply because of being AGI. In essence, AGI would be reading this or that, conversing with this person or that person, and inevitably would encounter aspects of blackmail and extortion. And, since AGI is supposed to be a learning-oriented system, it would learn what those acts are about and how to undertake them.

Any effort to hide the nature of blackmail and extortion from AGI would be foolhardy. You cannot carve out a slice of human knowledge that exists and seek to keep it from AGI. That won’t work. The interconnectedness of human knowledge would preclude that kind of excision and defy the very nature of what AGI will consist of.

The better chance of dealing with the matter would be to try to instill in the AGI principles and practices that acknowledge the devious acts of humankind and aim for having the AGI opt not to employ those acts. Sorry to say, that isn’t as easy as it sounds. If you assume that AGI is on the same intellectual level as humans, you aren’t going to just sternly instruct AGI not to perform such acts and assume utter compliance.

AGI isn’t going to work that way.

Some mistakenly try to liken AGI to a young toddler in that we will merely give strict instructions, and the AGI will blindly obey. Though the comparison smacks of anthropomorphizing AI, the gist is that AGI will be intellectually our equals and won’t fall for simpleton commands. It is going to be a reasoning machine that will require reasoning as a basis for why it should and should not do various actions.

Pursuits Now Are Vital

Whatever we can come up with currently to cope with conventional AI and mitigate or prevent bad acts is bound to help us get prepared for AGI. We need to crawl before we walk, and walk before we run. AGI will be at the running level. Thus, by identifying methods and approaches right now for existing AI, we at least are aware of and anticipating what the future might hold.

I’ll add a bit of a twist that some have asked me about at my talks on what AGI will consist of.

A question raised is whether humans might be able to blackmail AGI. The idea is this. A person wants AGI to hand them a million dollars, and so the person attempts to blackmail AGI into doing so. Seems preposterous at first glance, doesn’t it?

Well, keep in mind that AGI will presumably have patterned on what blackmailing is about. In that manner, the AGI would computationally recognize that it is being blackmailed. But what would the human have on the AGI that could be a blackmail-worthy slant?

Suppose the person caught the AGI in a mistake, such as an AI hallucination. Maybe the AGI wouldn’t want the world to know that it still has the flaw of AI hallucinations. If the million dollars is no skin off the nose of the AGI, it goes ahead and transfers the bucks to the person.

On the other hand, perhaps the AGI alerts the authorities that a human has tried to blackmail AGI. The person gets busted and tossed into jail. Or the AGI opts to blackmail the person who was trying to blackmail the AGI. Aha, remember that AGI will be a potential blackmail schemer on steroids. A human might be no match for the blackmailing capacity of AGI.

Here’s a final thought on this for now.

The great Stephen Hawking once said this about AI: “One could imagine such technology outsmarting financial markets, out-inventing human researchers, out-manipulating human leaders, and developing weapons we cannot even understand.”

Go ahead and add blackmail and extortion to the ways that AGI might outsmart humans.
