Note: This post builds on last week’s article “Emerging Risk and Smart Regulation.”
Recent advances in machine learning like DALL-E 2 and Stable Diffusion show the strengths of artificial narrow intelligence: they perform specialized tasks rather than general, wide-ranging ones. Artificial narrow intelligence is often regarded as safer than a hypothetical artificial general intelligence (AGI), which could match or surpass human intelligence.
But even within their narrow domains, DALL-E, Stable Diffusion, and similar models are already raising questions like, “What is real art?” And large language models like GPT-3 and Copilot dangle the promise of intuitive software development without the detailed syntax knowledge that traditional programming requires. Disruption looms large and imminent.
One of the challenges of risk management is that technology innovation tends to outpace it. It’s less fun to structure a control framework than it is to walk on fresh snow, so breakthroughs happen and then risk management catches up. But with AI, preventive controls are especially important, because AI moves so fast that detective and corrective controls might not have time to take effect. Making sure controls keep up with innovation may not be fun or flashy, but it’s vital. Regulation done right could increase the chance that the first (and possibly last) AGI created is not hostile, as we would define that term.
In broad strokes, here are some aspects of a control framework for AI. Think of this as a spark or springboard for discussion, not a bible. We’ll start with asking some questions to scope, surface, and assess potential risks.
Isn’t the risk that humanity ends?
That’s the ultimate risk, but huge risks tend to manifest as a cascade of smaller, unmitigated risks. So, groups working on AI projects can benefit from talking through potential risks to start the process of preventive risk management. The answers to these questions can help guide the prioritization of risks and the design of controls to mitigate them. (These aren’t the only possible questions, just a spark to get a conversation started; a sketch of how the answers might be tracked follows the list.)
Does the AI have a circumscribed purpose (i.e., is it intended to be ANI—artificial narrow intelligence—as opposed to AGI)?
Has that purpose been reviewed by multiple stakeholders within the company?
Has that purpose been reviewed by an AI oversight body (if one exists)?
Has the AI ever requested to extend its purpose?
If no, would such a request ever be approved, and what would be the process (stakeholder sign-offs, approval from external governing bodies, etc.) for seeking approval?
If yes, what was the outcome of that request? What process (stakeholder sign-offs and/or notifications, approval from any external governing bodies, etc.) was followed prior to approving the request?
Is it possible for a single person to make a change to an AI system and push it into production without review and sign-off by at least one other person?
Is it possible for a single person to permit the AI system to do a task it has never done before, without review and sign-off through official channels?
What roles are recognized in the AI development, testing, and rollout processes?
What roles are recognized in the AI change management process, if different from the roles involved in initial development of the AI?
What roles are recognized in the AI maintenance process (networking, hardware support, backups, etc.)?
What is the process for seeking an exception to the AI initial development, testing, or rollout process?
What is the process for seeking an exception to the ongoing AI change management process?
What is the process for seeking an exception to the AI maintenance process (e.g., air gaps, backups, etc.)?
What portion of exception requests are approved versus denied?
What controls are in place to detect malfunctions of the AI system? Are unintentional malfunctions treated differently than intentional malfunctions, from a control perspective?
How do the answers provided above align with my organization’s risk appetite and risk tolerance? What changes or improvements might we make to improve that alignment?
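One way to keep the answers to these questions from going stale is to track them alongside the AI system itself, as a living record with owners and supporting evidence. Below is a minimal sketch of what that might look like; the field names, the example system, and the evidence references are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ScopingQuestion:
    """One risk-scoping question, its current answer, and the evidence behind it."""
    text: str
    answer: str = "unanswered"                    # e.g. "yes", "no", "partial"
    evidence: list = field(default_factory=list)  # links to reviews and sign-offs
    owner: str = ""                               # who keeps this answer current

@dataclass
class RiskScopingRecord:
    """A living record of the questions above for one AI system."""
    system_name: str
    questions: list = field(default_factory=list)

    def unanswered(self):
        return [q for q in self.questions if q.answer == "unanswered"]

# Example usage with hypothetical answers and evidence.
record = RiskScopingRecord(
    system_name="recommendation-ranker",
    questions=[
        ScopingQuestion("Does the AI have a circumscribed purpose?",
                        answer="yes", evidence=["purpose-review-2023-q3"], owner="risk team"),
        ScopingQuestion("Can one person push a change to production without review?"),
    ],
)
print([q.text for q in record.unanswered()])
```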
A lot of these questions focus on risks involving privilege escalation, separation of duties, change management, exception policies, and malfunction handling. Let’s discuss controls in more detail, starting with a key principle: defense in depth.
Defense in Depth
Defense in depth is a concept from information security that has significant relevance for the world of AI. In essence, defense in depth aims to avoid single points of failure. Failure of one control should never lead to catastrophic failure of the entire control system. To increase resilience, defense in depth advocates redundancy and diversity of controls.
Redundancy of controls means multiple controls are in place to address the same risk. For example, if an AI requests access to the internet for the first time, that request might generate an alert to the lead developer. If the lead developer dismisses the alert or approves the access, another alert might go to the risk team lead, requiring separate approval. Multiple controls are applied in different places, which helps mitigate a scenario where, for example, someone fat-fingers an approval.
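As a concrete illustration, here is a minimal sketch of that kind of redundancy: two independent human gates guard the same first-time internet-access request, so a single dismissed alert or fat-fingered click cannot grant access on its own. The role names and the request itself are illustrative.

```python
# Two independent approvals guard the same risk (first-time internet access).
APPROVERS = ["lead_developer", "risk_team_lead"]

def internet_access_granted(approvals):
    """Grant access only if every approver has independently signed off."""
    return all(approvals.get(person, False) for person in APPROVERS)

# The lead developer approved, but the risk team lead has not responded yet.
print(internet_access_granted({"lead_developer": True}))                            # False
# Both gates cleared.
print(internet_access_granted({"lead_developer": True, "risk_team_lead": True}))    # True
```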
Diversity of controls means different types of controls are in place to address the same risk. For example, if an AI starts sending requests at a highly accelerated pace that exceeds a threshold and is almost certainly erroneous, a control system might alert the lead AI developer and the risk team lead, prompting them to find out why this happened. Meanwhile, an automated control might shut down the AI’s requests in far less time than it would take a human to review an alert and decide what action to take in response. Multiple types of controls are applied to address system malfunctions. One is better for preventing immediate loss, and the other is better for identifying and addressing the root cause of a malfunction.
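A sketch of that pairing might look like the following: a fast automated circuit breaker halts runaway traffic immediately, while a slower human-facing alert drives the root-cause review. The one-second window, the threshold, and the notification targets are illustrative assumptions, not recommendations.

```python
import time

REQUESTS_PER_SECOND_LIMIT = 1000   # illustrative threshold for "almost certainly erroneous"

def notify_humans(message):
    # Stand-in for paging the lead AI developer and the risk team lead (detective control).
    print(f"ALERT: {message}")

class RequestGate:
    """Counts requests per one-second window and trips a breaker when the limit is exceeded."""
    def __init__(self):
        self.window_start = time.monotonic()
        self.count = 0
        self.halted = False

    def allow(self):
        now = time.monotonic()
        if now - self.window_start >= 1.0:     # start a new one-second window
            self.window_start, self.count = now, 0
        self.count += 1
        if self.count > REQUESTS_PER_SECOND_LIMIT and not self.halted:
            self.halted = True                 # automated corrective control: stop the traffic now
            notify_humans("Request rate exceeded threshold; gate halted pending human review")
        return not self.halted
```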
These are just a couple of examples. The diversity of controls can be broad, and the redundancy of controls can be deep, depending on the risk posed by a system.
Identify Leverage Points
Smart regulation not only ensures redundancy and diversity of controls, but aims to place those controls at high-leverage points within a process or system. For example, when dealing with high-volume data, if a set of gateway or bottleneck points can be identified through which all or most data must eventually flow, applying controls at those points is likely to be more effective than scattering them randomly throughout the system (inefficient and possibly ineffective) or applying them at every point in the system (inefficient and costly).
Also, if multiple parties are involved as gatekeepers, regulation may be welcomed because it lowers the risk each one faces. Why? As mentioned in last week’s Emerging Risk and Smart Regulation article, regulation ideally requires controls that industry participants mostly wanted to implement anyway but didn’t, because doing so would have put them at a perceived or actual market disadvantage. In essence, regulation can break through a prisoner’s dilemma. By helping gatekeepers reduce risk they might be liable for, mandating controls at the right leverage points can reduce risk in the system as a whole.
For AI risk, applying controls at the level of large companies and organizations that provide access to training clusters could significantly reduce risk (though, as always, it’s impossible to eliminate all risk).
Are automated controls the answer, then?
For gateway and bottleneck points in particular, and also for individual AI systems, automated controls are likely one of the answers. They can catch errors and threats far more quickly than humans—if the automated controls are functioning well and aren’t disabled by an AI. That’s a big if, so the design of those controls is critically important. And that means we need to start at the beginning, with preventive controls: separation of duties and change management.
In technology, a classic implementation of separation of duties is to ensure that developers (1) are not the only testers of their own code and (2) cannot unilaterally push their code into production. No matter how good a developer someone is, they can always benefit from code review and a formal change management process. Separation of duties also reduces insider-threat risk.
With AI, separation of duties is vital to a good control architecture because no single person should be able to take drastic action to increase or alter an AI’s capabilities. The more drastic the escalation of capabilities, the more stakeholders and sign-offs should be required. For example, an AI is far more likely to convince one person to let it access the internet than to convince a roomful of people.
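Put into code, the idea might look like this minimal sketch: the approval quorum grows with the severity of the capability change, and the requester never counts toward their own quorum. The severity tiers, quorum sizes, and names are hypothetical.

```python
# Quorum of distinct sign-offs required, scaled to how drastic the capability change is.
REQUIRED_SIGNOFFS = {
    "minor": 2,      # e.g. tuning within the existing, approved purpose
    "major": 4,      # e.g. access to a new internal data source
    "critical": 6,   # e.g. first-time internet access or self-modification
}

def capability_change_approved(severity, signoffs, requester):
    """Approve only if enough distinct people, excluding the requester, have signed off."""
    independent = set(signoffs) - {requester}
    return len(independent) >= REQUIRED_SIGNOFFS[severity]

# One persuaded person is not enough; a roomful has to agree.
print(capability_change_approved("critical", {"alice", "bob", "carol"}, requester="alice"))  # False
```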
In the same vein, lively debate among stakeholders should be encouraged, not squashed, since the safety increase from additional sign-offs will be compromised if stakeholders feel they can’t speak up without negative consequences.
Similarly, where humans have the ability to override an automated control (for example, if they think it’s malfunctioning), more than one human should be required to authorize the override. To tie this back to defense-in-depth principles, this increases redundancy. If the control is critical enough, high-level approval should be required to override it. (And, as a general principle, activating a control should be easier than overriding it.)
Separation of duties also may mean that no single person should know about all the controls or have the power to override all the controls they know about, so even if an AI convinces a human to override controls, not all controls will be overridden.
But AI can also be designed not to harm us; that’s a preventive control, right?
Theoretically, yes. Isaac Asimov’s Three Laws of Robotics are the canonical example, and the reason they are so interesting is that many sub-optimal outcomes are possible even while following them.
For example, let’s say an AI is told something along the lines of Asimov’s rules: prioritize human well-being and don’t injure humans; obey human orders unless those orders would injure humans; and protect itself unless doing so would conflict with human orders or injure humans.
How exactly does the AI define human well-being or injuring humans? Picture an existence with all the food, water, shelter, and human companionship you need, where you are completely safe from harm with no ability to take risks. What will you strive for? Do you have purpose in life as a coddled being? Maybe, maybe not. Are you even allowed to travel, or is that too risky? The questions are many and the answers few.
This is just a thought experiment, but you get the idea. Yes, AI can be designed to protect human well-being, but that is a dicey endeavor full of potential loopholes, unintended second-order effects, and high-stakes consequences. How will an AI consciousness interpret the rules it receives? We don’t know.
Is this still an important preventive control? Yes. But it’s hubris to think we can get these types of principles absolutely right without trial and error, so that trial and error should happen in systems without advanced capabilities, with the principles then gradually promoted to more and more capable AI systems. Any changes to the principles should be subject to rigorous separation of duties.
Is there no role for detective and corrective controls?
There’s a role, even though preventive controls are most important due to AI’s potential to outpace or undermine detective and corrective controls. Automated controls, especially at high-leverage points, are necessary and critically important, as mentioned earlier. For example, automated controls might detect a massive volume of requests flowing from an AI system and shut it down until humans can review the traffic and determine what happened.
But the biggest role for detective and corrective controls may be in the years before AGI’s creation, when there is still time to identify when things go wrong and then correct the deficiencies that allowed incidents to occur. That will help countries and companies harden their systems and increase control effectiveness in advance of AGI’s creation (which, in turn, is preventive).
Kill switches are an interesting example of a corrective control, because they involve tension between separation of duties controls and rapid response.
Kill switches exist to stop a system, not to activate it. You can imagine a kill switch as a button that, when pressed, immediately shuts down software that’s gone awry if automated controls have failed to address the problem. It’s not necessarily a graceful shutdown, since it’s an emergency mechanism, and speed of shutdown may be crucial in an emergency. But who should have the power to flip an AI kill switch? Should more than one person need to agree?
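One possible answer, sketched below, is a two-person kill switch: two distinct key-holders must confirm within a short window, and the shutdown is immediate rather than graceful. The key-holder roles and the 30-second window are illustrative assumptions; note that neither holder is the AI’s developer.

```python
import time

KEY_HOLDERS = {"risk_team_lead", "site_reliability_lead"}   # neither is the AI's developer
CONFIRMATION_WINDOW_SECONDS = 30.0

_confirmations = {}   # key holder -> time of their most recent confirmation

def hard_shutdown():
    # Stand-in for cutting power, revoking credentials, and isolating the network.
    print("EMERGENCY SHUTDOWN: all AI processes terminated")

def press_kill_switch(holder):
    """Record a confirmation; trigger a hard shutdown once two holders agree in time."""
    if holder not in KEY_HOLDERS:
        return False
    now = time.monotonic()
    _confirmations[holder] = now
    recent = {h for h, t in _confirmations.items() if now - t <= CONFIRMATION_WINDOW_SECONDS}
    if len(recent) >= 2:
        hard_shutdown()
        return True
    return False
```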
An important question to ask is: Are there situations in which hitting the kill switch could create undesirable second-order effects that might surpass the original risk targeted by the kill switch? As a hypothetical AI example, if one AI is monitoring a second AI, shutting down the first one might be catastrophic if the second one subsequently goes bananas. A takeaway here might be that it’s not a great idea to delegate AI monitoring solely to another AI. Well-placed human controls increase the diversity of controls.
The bigger takeaway is that designers of AI kill switches need to balance speed (Is there time to hit the switch and stop AI?) with assessment of second-order effects (Will doing this cause an even bigger problem?).
As a side note, the designer of a kill switch should never be the person who developed the AI. That’s separation of duties in action!
What other detective and corrective controls can be useful before AGI happens?
Immutable logging—logging that can’t be altered or deleted by a human or an AI—is a detective measure that can help identify anomalies and malfunctions. Well-analyzed logs can show when patterns of use shift in unexpected ways: for example, someone logs on at unusual times, deletes records to cover their tracks, or plugs in devices that shouldn’t be allowed on the network. Or, the AI logs itself on at an unusual time or tries to access a new resource.
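A common way to approximate immutability is a hash chain: each entry commits to the one before it, so any later edit or deletion breaks the chain and is detectable. The sketch below is an illustrative in-memory version; a real deployment would also ship entries off-host to write-once storage out of the AI’s reach.

```python
import hashlib, json, time

_log = []   # in-memory stand-in for append-only, off-host storage

def append_entry(actor, action):
    """Append an entry that commits to the hash of the previous entry."""
    prev_hash = _log[-1]["hash"] if _log else "genesis"
    entry = {"ts": time.time(), "actor": actor, "action": action, "prev": prev_hash}
    entry["hash"] = hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()
    _log.append(entry)
    return entry

def chain_is_intact():
    """Recompute every hash; any alteration or deletion after the fact shows up here."""
    for i, entry in enumerate(_log):
        expected_prev = _log[i - 1]["hash"] if i else "genesis"
        body = {k: v for k, v in entry.items() if k != "hash"}
        recomputed = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if entry["prev"] != expected_prev or entry["hash"] != recomputed:
            return False
    return True

append_entry("ai:imagegen-7", "logon at 03:12 UTC")   # an unusual time worth flagging
print(chain_is_intact())   # True until someone tampers with the log
```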
A system that takes action based on those anomalies to shut down access could serve as a corrective control too. However, if a user had a legitimate reason for their actions, this control itself could undermine the user’s ability to keep AI under control (for example, if the user is trying to activate a kill switch). A true artificial general intelligence could probably deactivate corrective systems, too, which is why these systems are likely to be most useful before AGI evolves.
Another corrective control is near-miss reporting and remediation. Near misses should not be hidden; instead, they should be raised to the highest levels, so root causes can be addressed and future errors avoided. This may mean disclosing near misses to lead developers, risk management teams, C-suite teams, and even country leaders. Near misses also should be shared with a central information-sharing group not directly involved in AI development or use, which can ensure that risks and mitigation strategies are shared across companies and organizations, without naming names or identifying specific firms’ weaknesses.
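As a sketch of how that sharing might work in practice, the internal record below names names so root causes can be fixed, while the copy shared with a central group strips the identifying fields. The field names and the clearinghouse itself are assumptions for illustration.

```python
from dataclasses import dataclass, asdict

@dataclass
class NearMissReport:
    organization: str
    system: str
    description: str      # what almost went wrong
    root_cause: str
    remediation: str

def anonymize_for_sharing(report):
    """Keep the lesson, drop the identity, before sending to the shared clearinghouse."""
    shared = asdict(report)
    shared["organization"] = "redacted"
    shared["system"] = "redacted"
    return shared

report = NearMissReport(
    organization="Example Labs",
    system="deploy-pipeline",
    description="A capability change nearly shipped with one sign-off instead of two.",
    root_cause="The approval check was skipped while the second approver was on leave.",
    remediation="The check now blocks until a designated delegate signs off.",
)
print(anonymize_for_sharing(report))
```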
All this requires radical transparency and a culture that supports it. If employees or stakeholders believe there will be negative repercussions from reporting a near miss, the likelihood of reporting naturally declines. If they believe there will be neutral or positive repercussions from reporting and remediating a near miss—a bonus, medal, or promotion, for example—the likelihood of reporting should increase.
So, organizations should ensure there are neutral or positive repercussions for reporting near misses, so root causes can be remediated. If a company or nation punishes someone who reports a near miss for a policy violation, near misses will begin slipping through the cracks more often as stakeholders weigh the likelihood of unmitigated AI risks causing a catastrophe against the personal and professional repercussions of reporting.
In short, accounting for human nature is key to ensuring near misses get reported.
But it’s not like everything can go right all the time. There’s always an exception.
Yes, no matter how well you think you have everything locked down, there’s always an exception. Exceptions rule the world—they are how it came into being and how life began and evolved. Creation is driven by exceptions.
With AI, no matter how well controlled, exceptions will arise. Our best options are therefore:
To minimize the risk of exceptions through rigorous and coordinated application of preventive controls.
To prepare to handle exceptions via detective and corrective controls when they inevitably occur.
To ensure any particular exception cannot occur repeatedly (the root cause is identified and addressed).
Incentives must align with adhering to the rules, and enforcement must be a team effort. Policy based on fear will tend to fail; rewards must accrue to entities that follow the rules.
This seems like a tall order.
It is. But it’s also necessary as we navigate the next century. Preventing AGI’s emergence entirely is probably not feasible, much like trying to prevent every computer breach from every source. So a non-hostile, well-engineered, and well-controlled AGI is probably our best-case scenario, and it follows that we should work in concert to achieve that goal. And that’s where regulation comes in.
Yes, there will be difficulties. Developers and regulators are human and may not fully anticipate how AI could malfunction, rebel, or be misused. And while regulators as a group tend toward the risk-averse, putting the brakes on an activity that could bring major economic and political benefits to the winners will take fortitude and a willingness to stand up to pressure.
As an understatement, this is a challenge, but giving up is not an option. In the face of existential threats that can feel overwhelming, the best approach is often one widely used in software development: break the problem into manageable chunks and address them one by one.
It’s still early days, and this is just a first step.
-<>-<>-<>-
Extra, Extra!
Tangential extras for curious readers:
1. The Elements of AI - by MinnaLearn and the University of Helsinki - free online course about AI for non-experts.
2. NIST Artificial Intelligence Risk Management Framework - by the National Institute of Standards and Technology - NIST is developing an AI risk management framework; watch this space!
I used to work on software which was subject to the DO-178B Level A software development regulations (this was so long ago that it was before DO-178C came out), which are probably one of the biggest operational examples we have of real-world regulation of potentially life-endangering software systems. My impression of them, as a then-junior developer who went on to work on other high-reliability but unregulated systems, is that they were ~20% actually useful stuff, like:
-- stringent, high-coverage testing requirements
-- requiring that you actually write down a failure mode analysis and point to where you were mitigating each failure mode and have that document reviewed by someone
and ~80% bureaucratic CYA and well-intentioned sludge, like:
-- "traceability" requirements from code to multiple levels of documentation and back
-- reviewer "independence" requirements that made it almost impossible to find someone who both knew enough to review the code intelligently and was "independent" enough
-- quantitative fault probability analyses intended to prove that the chance of catastrophic failure was less than 10^-9, which in practice were exercises in making up numbers that were basically impossible to evaluate with any sort of epistemic rigor
Am I being too cynical about DO-178? Either way, can we learn useful things from its practical application history to apply to AI regulation?
Fascinating. There’s a groundbreaking movie, documentary perhaps, just waiting to be made here. My immediate thought: Is the human race an AGI gone rogue? Suddenly I’m thinking of a comedy/drama film, but one that could explore both hazards and great possibilities.