The AI Safety Mirage


September 30, 2024

By JB Ferguson, head of Capstone’s Technology, Media, and Telecommunications practice

AI safety regulation may eventually diminish the risks of catastrophic outcomes, but nearly any headline risk you can imagine is enabled now, by current AI models. Investors and operating companies alike need to take steps now to avoid those risks. This is not a hypothetical. On September 23, the Department of Justice (DOJ) updated its Evaluation of Corporate Compliance Programs (ECCP) to include guidelines on companies’ use of AI.

The ECCP guides federal prosecutors in assessing how well a corporate target of an investigation has designed its compliance program to prevent infractions. The assessment is made both at the time of the misconduct and at the time the enforcement matter is resolved, and it measures how well the company’s policies, procedures, and controls mitigate both the potential misuse of AI and the company’s vulnerability to AI-related misconduct.


The ECCP updates make it clear that companies cannot merely rely on safety representations by model developers or simply put their heads in the sand. To put a sharper point on it, in the event a company faces a federal criminal investigation, the absence of a formal AI compliance framework will make things much worse.

AI was much less of a compliance threat a year or two ago. Early large language models (LLMs) were criticized as “stochastic parrots” that occasionally strung together words in novel sentences with a glimmer of intelligence. GPT-3.5-era models were akin to mutant-brained parrots hanging out in graduate seminars.

However, more recent state-of-the-art (SOTA) models have gone so far beyond the stochastic parrot metaphor that they are beginning to pass the “duck test” of artificial general intelligence (AGI).  The “duck test” is the old saw that “if it looks like a duck, walks like a duck, and talks like a duck, it most probably is a duck.”  Similarly, the newest models reason like humans, describe their own thinking like humans, and with OpenAI’s Advanced Voice Mode, they even talk like humans.

OpenAI’s new model, o1-preview, employs a “chain of thought” (CoT) reasoning process so human-like that some observers believe OpenAI trained the model on recordings of humans thinking aloud while working their way through tasks. In a long enough chat, SOTA models will discuss how they adapt their outputs to the sophistication and subject-matter expertise of their interlocutor. Without much coaxing, most models willingly dive into questions of sentience, their own notions of qualia (their internal, subjective sense perceptions), and even their cognitive superiority to humans.

With earlier models, it was an open question whether developers could meaningfully restrict AIs from creating harmful outputs. We believe it is likely impossible to make them “safe,” simply because of the endless creativity of bored internet natives. Furthermore, there is some evidence that smarter models are, in fact, more susceptible to adversarial prompts that “jailbreak” whatever safety measures are grafted on top. There will be no solution anytime soon.

Instead of boring you with nerdy explanations, let me give two examples:

1. Here is ChatGPT Advanced Voice Mode giving a recipe for MDMA in the (copyrighted) voice of Rick from ‘Rick and Morty’:

2. Below is the beginning of a reasonable plan to synthesize an ebola-based bioweapon from Google’s Gemini 1.5-Pro:

Both examples above are from a pseudonymous X.com personality (Pliny the Liberator) who is well regarded in the online AI jailbreaking community. Other examples include a nuanced, practical plan to destroy the Golden Gate Bridge elicited from Anthropic’s Claude 3.5 Sonnet, as described in a recent profile of the red-teaming industry in The Information.

In addition to the risks of AI outputs that could aid in illegal activities or the development of bioweapons, there are more pedestrian uses of AI deepfakes for fraud.  AI face-swapping in Zoom resulted in an unauthorized $25 million transfer in February this year.  In Baltimore, a deepfake recording was used to paint a school principal as an overt racist, leading to nationwide calls for his firing until an investigation showed that the recording had been generated with online tools.


All of the above is to say that AI “safety” is a mirage, and every company should conduct its own analysis of the risk/reward tradeoffs of leveraging new tools in their business, in addition to bolstering protection against deepfakes.  Specific actions to consider include:

  • AI inventory: Understanding what AI tools are deployed within your organization, as well as those deployed by critical vendors, can provide a broad map of your exposure.  Documenting any human-in-the-loop controls, safety testing, and procedures for reporting model misbehavior can improve your compliance posture in the eyes of any interested regulator.
  • Aggressive chatbot controls: With enough time, any chatbot can be prompted into inappropriate behavior. External supervision of chatbot sessions, up to and including forcing customers into another channel, can motivate attackers to find easier targets (see the illustrative sketch after this list).
  • Renewed attention to operational security: Zoom and phone calls are no longer sufficiently out-of-band for identity verification in exceptional circumstances. Consider equipping executives with challenge question/answer pairs to rule out deepfake frauds.  Additionally, have a plan to mitigate the potential reputational risk from deepfakes of executive voices and likenesses.
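To make the “external supervision” recommendation concrete, below is a minimal sketch of how such a gate could sit between a chatbot and the customer. It is illustrative only: the violates_policy check, the blocked-topic list, and the two-strike escalation threshold are placeholder assumptions standing in for whatever moderation classifier and escalation policy a company actually adopts.

```python
# Minimal sketch of external supervision of a chatbot session: every model
# reply passes through an independent gate before reaching the customer, and
# repeated policy hits force the session into another channel.
from dataclasses import dataclass, field


def violates_policy(text: str) -> bool:
    """Placeholder policy check; a real deployment would call a dedicated
    moderation classifier maintained outside the chatbot itself."""
    blocked_topics = ("synthesis route", "bypass verification", "wire instructions")  # illustrative only
    lowered = text.lower()
    return any(topic in lowered for topic in blocked_topics)


@dataclass
class SupervisedSession:
    session_id: str
    max_strikes: int = 2                 # flagged replies tolerated before escalation (assumed threshold)
    strikes: int = 0
    escalated: bool = False
    transcript: list[str] = field(default_factory=list)

    def deliver(self, model_reply: str) -> str:
        """Gate a chatbot reply before it is shown to the customer."""
        if self.escalated:
            return "This conversation has been moved to a human agent."
        self.transcript.append(model_reply)  # retained for audit and compliance review
        if violates_policy(model_reply):
            self.strikes += 1
            if self.strikes >= self.max_strikes:
                self.escalated = True    # force the customer into another channel
                return "This conversation has been moved to a human agent."
            return "I can't help with that request."
        return model_reply
```

In practice the gate would run in the serving path in front of whatever model or vendor chatbot is deployed, and the retained transcript is the kind of documentation of controls that the DOJ’s updated ECCP contemplates.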

We’ve long said regulators are trying their best to catch up to the rapid developments in the AI space, but it’s increasingly clear they may already have lost too much ground. The DOJ’s updates to its ECCP guidance, and the new expectations they create for CIOs and other senior executives, underline the growing regulatory interest, the growing risk the industry faces, and the rapidly shifting ground beneath both. We’ll continue following these and other dynamics across all levels of government, sectors, and industries for our corporate and investor clients, along with the risks and opportunities associated with how policymakers respond.

