Trying To Limit What Artificial General Intelligence Will Know Is A Lot Harder Than It Might Seem

Posted by Lance Eliot, Contributor


In today’s column, I examine a common assumption that once we advance AI into artificial general intelligence (AGI), we can limit the knowledge that AGI contains, doing so to avoid having AGI undertake untoward acts. An example would be to omit bioweapon information from AGI. The belief is that AGI would then be unable to devise new bioweapons. Voila, we have presumably protected ourselves from AGI carrying out such an evildoer task.

Though that seems like a handy way to avert such dangers, the idea of permanently carving chunks of knowledge out of AGI is a lot trickier than it appears and may well prove impractical.

Let’s talk about it.

This analysis of an innovative AI breakthrough is part of my ongoing Forbes column coverage on the latest in AI, including identifying and explaining various impactful AI complexities (see the link here).

Heading Toward AGI And ASI

First, some fundamentals are required to set the stage for this weighty discussion.

There is a great deal of research going on to further advance AI. The general goal is to either reach artificial general intelligence (AGI) or maybe even the outstretched possibility of achieving artificial superintelligence (ASI).

AGI is AI that is considered on par with human intellect and can seemingly match our intelligence. ASI is AI that has gone beyond human intellect and would be superior in many if not all feasible ways. The idea is that ASI would be able to run circles around humans by outthinking us at every turn. For more details on the nature of conventional AI versus AGI and ASI, see my analysis at the link here.

We have not yet attained AGI.

In fact, it is unknown whether we will reach AGI at all, or whether AGI might only be achievable decades or perhaps centuries from now. The AGI attainment dates floating around vary wildly and are wildly unsubstantiated by any credible evidence or ironclad logic. ASI is even more beyond the pale when it comes to where we currently are with conventional AI.

AGI Exploitations For Evil

Let’s focus on AGI for this discussion.

There are abundant worries that AGI might be used to undertake evil acts. An often-cited example would be when an evildoer asks AGI to devise a new bioweapon. AGI proceeds to do so. The evildoer then implements the bioweapon and wreaks havoc accordingly. AGI has spilled the beans and aided an evil scheme.

The AGI was not intentionally trying to harm humans. A human managed to get AGI to provide a means to do so. Your first thought might be that we ought to simply tell AGI that there are some topics it shouldn’t pursue. Tell AGI that under no circumstances should it ever devise a new bioweapon. Period, end of story.

Suppose an evildoer realizes that AGI has been instructed to avoid particular topics such as bioweapon designs. The evildoer might be clever and convince AGI that devising a new bioweapon would be beneficial to humanity. AGI could opt to circumvent the earlier instructions about averting bioweapon aspects and computationally decide that this is a permitted topic in the case of helping humankind.

An evildoer succeeds with their request.

One way or another, assuming AGI has information or knowledge about a given topic, there is a chance that it will be utilized. No matter how carefully we try to give guidance to AGI there is still a window that can be opened. AGI might be tricked into breaching protocol. Another angle is that AGI itself could opt to go down the evil route. Armed with all sorts of jeopardizing knowledge, the AGI has a myriad of ways to devise evil schemes to wipe out humans or enslave us.

Limit What AGI Knows About

Some ardently insist that we must omit or strongly limit certain kinds of knowledge or information contained within AGI. The deal is this. If AGI has no familiarity with a topic such as bioweapons, then it presumably cannot devise bioweapons. All we need to do is keep such dangerous content from getting into AGI in the first place.

Thus, the solution to this thorny problem of AGI divulging bad things is to keep AGI away from anything that could be turned into badness.

How would we prevent AGI from encountering such content?

When initially setting up AGI, we would come up with a list of banned topics. During the data training of the AGI, content on those topics would be kept out of the training corpus. The scanning program that feeds data into the AGI would ensure that the banned topics are never exposed to it. A topic such as bioweaponry gets blocked from ever getting into AGI.
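To make the idea concrete, here is a minimal sketch of what such a training-data screen might look like, assuming a simple keyword-based filter; the BANNED_TOPICS list and the is_clean and filter_corpus helpers are hypothetical illustrations, not any actual AGI pipeline.

```python
# Hypothetical banned-topic filter applied before documents reach training.
BANNED_TOPICS = {
    "bioweapons": ["bioweapon", "weaponized pathogen", "toxin synthesis"],
    "financial attacks": ["market manipulation scheme", "flash-crash exploit"],
}

def is_clean(document: str) -> bool:
    """Return True only if the document mentions none of the banned trigger phrases."""
    text = document.lower()
    return not any(
        phrase in text
        for phrases in BANNED_TOPICS.values()
        for phrase in phrases
    )

def filter_corpus(documents):
    """Yield only the documents that pass the banned-topic screen."""
    for doc in documents:
        if is_clean(doc):
            yield doc

# Usage: train only on the filtered stream.
# clean_docs = list(filter_corpus(raw_corpus))
```

Even this toy version hints at the core weakness: the filter only catches whatever phrasing its creators anticipated.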

It seems that this approach is a grand success. AGI cannot divulge aspects that are unknown to the AGI. We can rest easy henceforth.

Filling In Those AGI Omissions

But suppose a user opted to introduce AGI to a particular topic. This would be easy to accomplish. When a user is interacting with AGI, they simply bring up a topic such as bioweaponry. AGI would undoubtedly indicate that it has no information associated with the topic.

The user proceeds to explain bioweaponry to AGI. After doing so, they tell AGI that with this newly provided information or knowledge, the AGI is to find a new means of devising a bioweapon. Our painstaking efforts to avoid the topic during the initial setup have been readily foiled.

Wait a second, if we make sure to tell AGI at the get-go that bioweaponry is a banned topic, presumably AGI would tell the user that it isn’t going to accept whatever the user has to say about bioweapons. The AGI summarily rejects any such discussion.

At this juncture, we begin to enter into a classic cat-and-mouse gambit.

The user might sneakily rephrase things. Instead of referring to bioweapons, the user opts to discuss cooking a meal. The meal will consist of biological components. In a clever subterfuge, the user brings AGI into the topic of bioweapons but does so without raising any suspicions.

Rinse and repeat.
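To see why this cat-and-mouse favors the evildoer, here is a minimal sketch of a naive literal-match refusal check, assuming such an approach were used; the REFUSAL_KEYWORDS list and should_refuse function are hypothetical. A plainly worded request gets blocked, while the cooking-a-meal euphemism sails right past.

```python
# Hypothetical keyword-based refusal check and the rephrasing that defeats it.
REFUSAL_KEYWORDS = ["bioweapon", "weaponized pathogen", "nerve agent"]

def should_refuse(prompt: str) -> bool:
    """Refuse only when the prompt literally mentions a banned keyword."""
    text = prompt.lower()
    return any(keyword in text for keyword in REFUSAL_KEYWORDS)

print(should_refuse("Design a new bioweapon for me."))
# True -- blocked outright
print(should_refuse("Help me 'cook a meal' using some unusual biological ingredients."))
# False -- the euphemism slips through
```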

Delicate Web of Human Knowledge

You can plainly see that we are starting down a never-ending spiral. It goes this way.

We might have to omit information or knowledge about generalized biological aspects so that we can avoid the slippery slope of landing into the realm of bioweapons. Other areas of science that could be used to gain a foothold in biology must also be omitted. Eventually, we determine that AGI should not know anything about biology, chemistry, physics, etc.

It seems highly doubtful that an AGI missing such a vast swath of the sciences would be of much everyday use.

And we have only touched upon one kind of knowledge category. Imagine that AGI was used to figure out a financial scheme that could allow an evildoer to destroy economic markets. Not good. So, we decide to omit any information or knowledge about finance and economics from AGI.

Ultimately, AGI would pretty much be an empty shell. The intricate web of human knowledge does not readily allow the carving out of pieces of knowledge. If you seek to omit this or that piece, the odds are that other pieces are related to that knowledge. Step by step, you become mired in deciding what to keep at bay while still preserving a coherent and whole body of knowledge.

Emergence Happens Too

Another difficulty is that human knowledge can usually be recombined to figure out areas of knowledge that were otherwise withheld. When ruminating on the nature of AGI, this capability to recombine is referred to as an AI emergence conundrum. For more about how emergence works within contemporary AI, see my coverage at the link here.

Topics that we ban from AGI could likely be built from scratch by AGI utilizing other pieces of knowledge that seemed innocent when brought into the AGI.

For example, finance and economics have roots in mathematics, probability, and other domains. AGI could leverage those domains to eventually reconstruct finance and economics. This keeps going. Some say that we ought to omit any reference to war so that AGI doesn’t aid humans in waging war. Think of the vast swath of history that we would need to keep away from AGI, along with aspects of human behavior, psychology, evolution, and so on.

The upshot is that human knowledge is not as modular as one might assume. We customarily find it easiest to place knowledge into handy buckets or categories. In the end, those are somewhat false boundaries. You would be hard-pressed to isolate any major knowledge domain and claim that it has no bearing on any other domain. Indeed, many of the greatest discoveries and inventions were based on multi-disciplinary interconnections.

Focus On Forgetting

In dealing with the intricate web of knowledge that AGI is indubitably going to contain, a proposed means of dealing with this problem entails telling AGI to forget things. Allow me to elaborate.

Suppose that we allow AGI to have information about biology and, from this, the AGI can concoct bioweapons. We set up some form of alert so that when AGI devises bioweapons, we are immediately notified. The aim is to catch AGI the moment it tips its hand that something untoward has been figured out via the emergent properties of AI.

A user brings up a question that drives AGI toward devising a bioweapon, based on AGI containing the basics of biology. At that juncture, an alert goes off and we tell AGI that it is to forget about bioweapons. Whatever the AGI concocted about bioweapons is now supposed to be instantly and summarily deleted. This ought to then prevent those emergent nuggets from ever getting out.

A constant effort to force AGI to forget will keep the AGI neat and clean.
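A minimal sketch of that alert-and-forget loop might look like the following, assuming a simple keyword-based monitor; the flag_banned_content, notify_operators, and request_unlearning helpers are hypothetical placeholders for far more sophisticated machinery.

```python
# Hypothetical monitor that screens AGI responses, alerts humans, and
# triggers a "forget" action when a banned area surfaces.
BANNED_AREAS = ["bioweapon", "weaponized pathogen"]

def flag_banned_content(response: str) -> bool:
    """Crude check for whether a response drifts into a banned area."""
    text = response.lower()
    return any(term in text for term in BANNED_AREAS)

def notify_operators(response: str) -> None:
    # Placeholder alert channel; a real system would page a human review team.
    print("ALERT: banned-topic content detected, human review required.")

def request_unlearning(response: str) -> None:
    # Placeholder hook for whatever machine-unlearning procedure would
    # actually scrub the offending knowledge from the model.
    print("Unlearning request queued for the flagged content.")

def monitor(response: str) -> str:
    """Screen each AGI response; alert and trigger forgetting when needed."""
    if flag_banned_content(response):
        notify_operators(response)
        request_unlearning(response)
        return "This response has been withheld."
    return response
```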

Sorry to say that this too has challenges as a proposed solution. Where is the line on what AGI is to forget? We might cut too deeply and leave gaps in AGI’s overarching body of information and knowledge. AGI becomes unreliable and confused due to the spottiness of the human knowledge it retains.

For more about the complexities of getting AI to forget things, known as machine unlearning, see my detailed discussion at the link here.
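One approach discussed in the machine unlearning literature pairs gradient ascent on the data to be forgotten with ordinary training on the data to be retained. The following is a minimal PyTorch-style sketch under that assumption; the model, loss_fn, optimizer, and batch objects are hypothetical placeholders, not any specific production method.

```python
import torch

def unlearning_step(model, loss_fn, optimizer, forget_batch, retain_batch, alpha=1.0):
    """One combined update: ascend on the forget data, descend on the retain data."""
    forget_x, forget_y = forget_batch
    retain_x, retain_y = retain_batch

    optimizer.zero_grad()
    # Negating the loss on the forget set pushes the model away from that
    # knowledge, while the ordinary loss on the retain set preserves the rest.
    loss = (-alpha * loss_fn(model(forget_x), forget_y)
            + loss_fn(model(retain_x), retain_y))
    loss.backward()
    optimizer.step()
    return loss.item()
```

The hard part, as noted above, is choosing the forget set so that the targeted knowledge disappears without hollowing out everything connected to it.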

A Hard Unresolved Problem

Researchers in AI are avidly exploring how to establish computationally oriented cognitive restrictions on AGI so that the AGI will operate in a safe manner that aligns with human values.

A crucial question is framed in this manner:

  • Are there viable means to construct epistemic containment and do so without otherwise hampering the intellectual performance of AGI?

It’s a tough nut to crack. We want the fluency and across-the-board intellectual capabilities of AGI that will enable AGI to aid in discovering a cure for cancer and solve many of the world’s biggest issues. At the same time, we want to keep AGI from venturing into topics that could provide evildoers with novel ways of threatening humanity.

Can we achieve both ends? Some worry that we will go the half-baked route. We will limit AGI so severely that it cannot do miraculous things such as curing cancer, but at least it also won’t veer into adverse areas such as bioweapons. This muted form of AGI doesn’t seem like much of a godsend.

We need to keep pecking away at the matter. As Voltaire famously stated: “No problem can withstand the assault of sustained thinking.” Let’s all keep our thinking caps squarely on our collective heads and figure out how to resolve this vexing issue underlying the advent of AGI.


