If you’re the US Food and Drug Administration (FDA), you have a lot on your plate. But the train that is barreling right at you – the behemoth that is qualitatively different from everything you’ve encountered to date – is artificial intelligence in medicine.
How on earth are you going to regulate software that purports to be just as smart as a physician?
It’s much different from traditional FDA tasks such as regulating an x-ray machine, which, after all, only has to record an image safely and accurately, something engineers can verify on their own.
However, an A.I. “decision support” system for x-rays can fail in thousands of different ways when it tries to detect abnormalities or suggest diagnoses. The standard medical textbook for interpreting chest x-rays is 870 pages long – how can any software developer get all of that right? How can the FDA test all the possible failure modes?
The answer is: they can’t. Fortunately, they don’t have to. Instead, the linchpin of regulating medical A.I. systems can be two simple requirements: (1) every time a system provides a suggestion to a physician, the system asks the physician to rate the correctness or appropriateness of the suggestion, and (2) every rating is sent to the FDA, where it is tallied and made public on the agency’s web site.
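To make the mechanism concrete, here is a minimal sketch of what such a rating record and public tally might look like. Everything in it (the field names, the 1-to-5 scale, the in-memory registry standing in for an FDA reporting endpoint) is a hypothetical illustration, not an actual specification.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from collections import defaultdict

# Hypothetical sketch of the two proposed requirements:
# (1) every A.I. suggestion is rated by the physician, and
# (2) every rating is reported centrally and tallied publicly.

@dataclass
class RatingRecord:
    system_id: str        # which A.I. product produced the suggestion
    suggestion_id: str    # which individual suggestion was rated
    physician_id: str     # who rated it (pseudonymized in practice)
    rating: int           # e.g., 1 (clearly wrong) .. 5 (clearly appropriate)
    rated_at: datetime

# In a real deployment this would be a call to an FDA reporting
# endpoint; here it is just an in-memory list.
_registry: list[RatingRecord] = []

def submit_rating(record: RatingRecord) -> None:
    """Requirement (2): forward each rating to the central registry."""
    _registry.append(record)

def public_tally() -> dict[str, dict[str, float]]:
    """Aggregate ratings per system, as the FDA might publish them."""
    by_system: dict[str, list[int]] = defaultdict(list)
    for rec in _registry:
        by_system[rec.system_id].append(rec.rating)
    return {
        system: {
            "n_ratings": len(ratings),
            "mean_rating": sum(ratings) / len(ratings),
        }
        for system, ratings in by_system.items()
    }

# Example: a physician rates one suggestion from a hypothetical product.
submit_rating(RatingRecord(
    system_id="chest-xray-assistant-2.1",
    suggestion_id="sugg-0001",
    physician_id="dr-anon-42",
    rating=4,
    rated_at=datetime.now(timezone.utc),
))
print(public_tally())
```

A real system would authenticate submissions and pseudonymize physicians, but even this skeleton captures the essential loop: every suggestion yields a rating, and every rating feeds a public score.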
In this way, A.I. systems are rated just as physicians assess each other: continuously, and in every interaction. This is obvious medical sociology. If Dr. Jones in some hospital starts to make squirrelly diagnoses or boneheaded readings, she will soon find her business drying up because her physician colleagues won’t trust her.
The same approach will work when “Dr. Jones” is an A.I. software package.
Thus, the FDA should set some minimum level of competence that allows reasonable A.I. products to get on the market, just as it does with the non-A.I. devices it regulates. After that, public ratings should be required, so that each product lives or dies by its performance in the marketplace, not by marketing budgets or slick advertisements.
As always, sunlight is the best disinfectant, and it’s exactly what you’d want shining on an A.I. system that was working on you.
Let’s examine some ramifications of this regulatory approach.
First, it ensures that manufacturers keep their systems up to date. If a new disease such as Zika virus emerges, one the A.I. system was never trained on, the system will rapidly (and publicly) accumulate errors. Manufacturers will therefore rush to update it.
Second, and similarly, it drives companies to widen their products’ coverage. An A.I. system whose training lacked moyamoya disease may sail through pre-market testing on a few hundred patients, but once it is applied to 200,000 patients around the world, its systematic errors on moyamoya patients are far more likely to surface and stimulate a new round of training that includes the disease.
Third, it fosters innovation. The FDA can relax the minimum level of competence an A.I. product must demonstrate before entering the market once the agency is assured that marketplace competition will perform the same policing function as regulation. Make no mistake, American businesses will have many challengers in the medical A.I. world, and this factor will let our companies get to market faster.
Fourth, it will put physicians in the frame of mind to question the A.I. system’s advice at every turn. This is essential, because no manufacturer is ever going to claim its system replaces physician judgment. Instead, manufacturers will say their system provides information that a physician considers when making the official decision. Thus, always asking the physician to rate the A.I. advice will remind the physician that s/he is ultimately in control.
Fifth, this is a capability that every responsible software manufacturer should want in its systems. Responsible manufacturers want to fix bugs and shortcomings, and so they want to discover them. Manufacturers may even appreciate publicly demonstrated imperfection, as it could enable them to defeat lawsuits arising from bad advice by showing that the public had no reason to believe their system was perfect.
Of course, several details need addressing. What rating questions should be posed to physicians? (They must be robust and quick to answer.) How do we encourage physicians to revise their ratings a month later, when the true diagnosis finally emerges? How do we prevent gaming of the ratings?
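As one illustration of how the revision question might be handled, the sketch below extends the earlier record with a revision history, so a physician can amend a rating once the true diagnosis is known; every name in it is again a hypothetical illustration, not a proposed standard.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class RevisableRating:
    suggestion_id: str
    rating: int                                    # current rating, 1..5
    history: list[tuple[datetime, int]] = field(default_factory=list)

    def revise(self, new_rating: int) -> None:
        """Log the superseded value before updating, so the public
        tally can also show how often first impressions were later
        corrected."""
        self.history.append((datetime.now(timezone.utc), self.rating))
        self.rating = new_rating

# Example: a physician initially rates a suggestion as appropriate,
# then downgrades it a month later when the true diagnosis emerges.
r = RevisableRating(suggestion_id="sugg-0001", rating=4)
r.revise(new_rating=2)
print(r.rating)   # 2
print(r.history)  # [(<timestamp of the revision>, 4)]
```

Keeping the full history, rather than overwriting, would also help with the gaming question: sudden bursts of identical revisions from one source become visible in the audit trail.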
Rating systems on the Internet are already ubiquitous and have proven their value across many application areas. We are fortunate that such simple software can help us control the most complex software.