September 25, 2016 | John Sotos

How the health of presidential candidates should be evaluated

First published on WSJ.com on Sept. 25, 2016

Let us put aside for a moment specific questions about the health of this year’s two major presidential candidates, and ask how candidate health should be treated in the next election cycle. The difficulties attending the current candidates’ health disclosures are by no means unique to these individuals, and so will surely arise again.

There can be little doubt that formal guidelines, or perhaps laws, are needed. This year’s experience shows that the electorate is deeply interested in candidate health, and that lurching from one grudging, partial information disclosure to another serves no one’s interests, including the candidates’.

The traditional approach used in recent election cycles – relying on a candidate’s private physician to decide what information should be disclosed – is obviously maladaptive. It would be the exceptional candidate who could refrain from trying to influence the physician: “After all, doctor, I think I know better the demands of the presidency than you do… with all due respect.” And it would be the exceptional physician who could shed the traditional role of advocating for his or her patients and instead act purely in the nation’s best interests.

But remember that there is an American healthcare system in which physicians explicitly do put national interests before individual interests: the military. Furthermore, this system focuses largely on determining when someone is medically fit to perform a specific job: pilot, cook, parachutist, whatever.

For 31 years I have been part of this medical system, and believe its processes can be a model for fair, clear, and transparent methods to determine what goes into a disclosure about a presidential candidate’s health. Let me explain how, by first explaining how the system works.

Consider an airplane mechanic who wants to become a pilot in the Air Force. Air Force regulations dictate that a local military physician give the mechanic a “class I flight physical examination” – its details are also spelled out in regulations – and that the results of this examination be submitted to a higher headquarters for review. The exam includes a comprehensive medical history.

At the higher headquarters, military physicians compare the mechanic’s medical data with the medical standards for the class I exam, then approve or disapprove the exam as warranted. For example, if the mechanic has a heart murmur, the exam will be disapproved because the class I standards forbid it. Air Force standards are extensive, addressing almost any imaginable medical condition, for every job, and have evolved over decades, so they now tend to change little.

Importantly, our disqualified mechanic still has hope – he can apply for a waiver, because murmurs are not always a sign of disease. His local military physician will send him to a cardiologist, who will listen to the murmur and order an echocardiogram. The results of the echo, plus the cardiologist’s opinion, will be sent to higher headquarters. The headquarters physicians will consider these new, more detailed data, and approve the mechanic’s exam if the new data show nothing structurally wrong with the mechanic’s heart and that this is an “innocent murmur.”

For presidential candidates, the voters are the higher headquarters. They set the medical standards, and they are the only ones who decide if a condition is waiverable or not. But to do this, they must know about the condition.

So, our guidelines (or law) would tell physicians that any medical condition in a presidential candidate that is disqualifying for, say, air traffic controller duties in the Air Force, should be (must be) disclosed to the voters. Everything else could be kept private. Happily, aside from being fanatical about eyes, Air Force standards generally disqualify only significant conditions. Syphilis in the past is not disqualifying, so long as no lasting complications exist. But cancer now or in the past is disqualifying – unless waived.

Next, when a disqualifying condition is present, how much information about it should be publicly disclosed? A sensible initial answer is: as much information as the Air Force requires in waiver applications for the condition. This is sensible because the voters are essentially deciding whether to grant the candidate a waiver. The Air Force has policies on waiver data requirements.
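
For readers who think in code, the disclosure rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the set of disqualifying conditions and the waiver data items are hypothetical placeholders, not actual Air Force standards, and the names `DISQUALIFYING`, `Condition`, and `disclosure` are invented here.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a published medical standard.
# The text gives cancer (now or in the past) as a disqualifying example.
DISQUALIFYING = {"cancer"}

@dataclass(frozen=True)
class Condition:
    name: str
    # Items a waiver application would require for this condition (made up here).
    waiver_data: tuple = ()

def disclosure(conditions):
    """Return {condition name: waiver data} for disqualifying conditions only.

    Everything that is not disqualifying stays private.
    """
    return {
        c.name: list(c.waiver_data)
        for c in conditions
        if c.name in DISQUALIFYING
    }

history = [
    Condition("cancer", ("pathology report", "treatment summary")),
    Condition("past syphilis, no complications"),
]
public = disclosure(history)
```

In this sketch only the cancer entry, with its waiver-level detail, would reach the voters; the rest of the history is never published.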

Crucially, no matter what standard is chosen (experts can debate air traffic controller vs. another), military medical standards are time-tested across a very large population of active individuals of both sexes, and are as immune to external influence as any human endeavor could reasonably be. To reduce the chances of one physician secretly warping data, we could require that two physicians, from different states, sign disclosures – akin to peer review.

The most important feature, however, is that the standard will be clear and consistent. The old political truism about a candidate for office applies to whatever standard is chosen: It doesn’t have to be better than the Almighty, just better than the alternative – which today is chaos.

(Disclaimer: The views here are my own, and are not necessarily those of the Department of Defense or the California Military Department.)

July 19, 2016 | John Sotos

Adaptive Clinical Trials

Rejected by the New England Journal of Medicine in July 2016.

To the Editor:

Three different publications in the July 7, 2016 issue of the Journal introduce the concept of adaptive clinical trials and state that, compared to classical “frequentist” trials, adaptive trials may require smaller groups of subjects and be completed more quickly (1) (2) (3). Yet, neither of the adaptive trials reported in the same issue (4) (5) described how long it took to recruit their subjects, nor estimated how long an equivalent classical trial might have required. Such comparisons would be of interest, as small reductions in time might not be worth the additional complexity that attends adaptive trials.

[1] Bhatt DL, Mehta C. The changing face of clinical trials: adaptive designs for clinical trials. N Engl J Med. 2016; 375: 65-74.

[2] Harrington D, Parmigiani G. Statistics in medicine: I-SPY 2 -- a glimpse of the future of phase 2 drug development? N Engl J Med. 2016; 375: 7-9.

[3] Carey LA, Winer EP. I-SPY 2 -- toward more rapid progress in breast cancer treatment. N Engl J Med. 2016; 375: 83-84.

[4] Park JW, et al. Adaptive randomization of neratinib in early breast cancer. N Engl J Med. 2016; 375: 11-22.

[5] Rugo HS, et al. Adaptive randomization of veliparib–carboplatin treatment in breast cancer. N Engl J Med. 2016; 375: 23-34.

June 29, 2016 | John Sotos

It's time to radically change how the FDA approves drugs

First published on WSJ.com on June 29, 2016

I’m a big fan of the US Food and Drug Administration (FDA) and the vital mission it’s been performing since 1962: ensuring that all medications sold in the United States are both safe and effective. Everyone should want the FDA to succeed – now and in the future – because, without a strong FDA, being sick would be massively more horrible than it already is.

But, although a fan, I think the FDA should change course. Specifically, the FDA should adjudicate new drug applications with a Consumer Reports approach, not its current approach, which copies Roman emperors who signaled a gladiator’s fate with either a thumbs-up or thumbs-down, and no other choice.

In other words, the FDA would continue to review new drug submissions in its careful and scientifically demanding way, just as it does now, but instead of making an approve/don’t approve decision, it would issue a rating of the drug’s safety, efficacy, and the degree of evidence supporting safety and efficacy. Physicians and patients would use these ratings as a starting point – and only as a starting point – in deciding which medication to use.

So, for example, an experimental chemotherapy drug might rate a B+ for efficacy against stage IV-A squamous cell lung cancer, an A for safety, and a 2 out of 10 for degree of evidence (more on that later). After additional study results are submitted to the FDA, the degree of evidence might rise to 4 out of 10, perhaps with changes in the safety and efficacy scores. The FDA’s expert panels would define the efficacy scale for each disease. The safety and degree-of-evidence scales would hopefully be more universal.

This graded approach would have multiple major benefits.

First, it would allow drugs to reach the market faster, that is, before all the normally required clinical trials for a drug are completed. For patients with life-threatening diseases and no other options, drugs having low degree-of-evidence scores – lower than normally needed to gain FDA approval – would be a welcome option. And pharmaceutical companies would welcome the chance to monetize their research investment at an earlier stage, although they would have to commit to continued manufacture of the drug.

Second, and probably more importantly, it would drive drug makers to compete on the degree of evidence supporting their drugs, not just their safety and efficacy. Thus, a pharmaceutical company could advance at least one component of a drug rating by performing more studies on the drug. This solves the huge problem of scant post-marketing drug studies, that is, studies performed after the FDA approves the drug. Generic drug-makers would feel this pressure, too. Anything that helps physicians better understand which drug is right for which patient will pay large dividends in clinical outcomes and cost.

Third, this graded approach is well-suited to the revolution in drug development that has already begun. Certainly in oncology, we can expect that future drugs will be targeted to precisely defined small groups of patients, not to large patient groups. This mirrors the evolution of the software market from the past’s three big “blockbuster” programs – Word, Excel, and PowerPoint – to today’s multiplicity of very specialized programs in the app store, where ratings are a key element of an app’s market success.

Fourth, it will help physicians make better decisions. If the FDA used a standard vocabulary to define the patients for whom a given rating applied, electronic medical records could automatically match a given patient to the corresponding FDA ratings and display the latest ratings, allowing the physician to stay current with much less effort.
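
A minimal sketch of that matching step, in Python, might look like the following. Everything in it is an assumption made for illustration: the condition codes stand in for whatever shared vocabulary the FDA might adopt, the drug names and dates are invented, and `Rating` and `latest_ratings` are hypothetical names rather than part of any real EMR or FDA system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rating:
    drug: str
    condition_code: str  # code from the assumed shared vocabulary
    efficacy: str        # letter grade, e.g. "B+"
    safety: str          # letter grade, e.g. "A"
    evidence: int        # degree of evidence, 0 to 10
    issued: str          # ISO date, so string comparison orders by time

def latest_ratings(patient_codes, ratings):
    """Keep the most recent rating per drug for the patient's coded conditions."""
    latest = {}
    for r in ratings:
        if r.condition_code not in patient_codes:
            continue  # rating applies to a different patient group
        key = (r.drug, r.condition_code)
        if key not in latest or r.issued > latest[key].issued:
            latest[key] = r
    return sorted(latest.values(), key=lambda r: (r.condition_code, r.drug))

# Invented sample data echoing the earlier example (evidence rising from 2 to 4).
ratings = [
    Rating("drug-x", "LUNG-SCC-IVA", "B+", "A", 2, "2016-01-15"),
    Rating("drug-x", "LUNG-SCC-IVA", "B+", "A", 4, "2016-06-01"),
    Rating("drug-y", "OTHER-COND", "A", "B", 7, "2016-03-01"),
]
matched = latest_ratings({"LUNG-SCC-IVA"}, ratings)
```

Here the record would show the physician only the newer drug-x rating; the superseded rating and the rating for an unrelated condition are filtered out.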

Finally, patients will better understand their physician’s prescribing decisions, and be able to better participate in shared decision-making. In the many cases where ratings do not ultimately drive which drug to prescribe, patients will appreciate hearing their physician’s evidence-based reasons for choosing differently.

Re-working drug regulation to this graded approach will require significant effort, especially defining robust and informative rating scales. The degree-of-evidence scale, for example, must incorporate not only the number of patients tested, but also the study design, so a multi-dimensional scale will likely work best (unlike the uni-dimensional scale in the earlier example). Because there will be enormous pressure to exploit holes in the scales to gain unfair advantage, the scales will have to be “unhackable.”

Although the FDA would seemingly change from a regulatory agency to an information referee, all of its power to keep drugs off the market should be preserved, to be used in exceptional circumstances. Confidence in the new approach would be increased if it went live before phasing out the thumbs up/down approach.

Information sharing today is utterly different from that in 1962. The FDA is in an enviable and trusted position to share much more data for each drug than it has in the past, cement the role of evidence in drug prescribing, and increase medication choices. All of which will not only create new fans, but keep them.

September 29, 2015 | John Sotos

How electronic medical records are like airplane crashes (Part 2)

First published on WSJ.com on Sept. 29, 2015

Yesterday’s blog post showed how electronic medical record systems can divert, or “channelize,” a nurse’s attention away from the patient, causing harm to the patient – just as channelizing a pilot’s attention can cause an airplane crash.

Continuing the theme that information without attention is worthless, today’s blog shows two other ways that EMRs can crash patients like airplanes, based on recent experiences of a friend, named Alex.

Alex returned home from the hospital elated but exhausted. After a few days, with the exhaustion worsening and her usual internist unavailable, she was directed to see an internist new to her. Fearing lung blood clots, this internist ordered a CT scan of the chest. It showed no clots, but – completely unexpectedly – showed a large mass in the chest that the radiologist felt was cancer. The internist compassionately broke the shattering news, but Alex responded as most people would to a cancer diagnosis.

Two days later, a physician friend who had limited experience, but detailed memories of Alex’s medical history, read the radiologist’s report and offered a different diagnosis: the mass in the chest was actually a greatly enlarged vein that had grown up following an unusual blood clot Alex had suffered years ago.

An agonizing week later, super-specialists at the university hospital reviewed the CT and confirmed the vein diagnosis, telling Alex she was cancer-free and needed no tests to prove it. Alex was overjoyed.

Besides bad doctoring, what happened here? And why must the electronic medical record (EMR) share responsibility for Alex’s psychological brutalization? The answer lies in two characteristics of the EMR.

First, EMRs reduce the mouth-to-ear conversations physicians have with each other, exactly as email reduces conversations in business offices. While this may increase efficiency with routine medical care, when a case has unusual elements, conversations become essential. In Alex’s case, the internist and radiologist should have talked to each other the day of the CT. The internist should have challenged the radiologist’s interpretation, and the radiologist should have pressed the internist for more information about Alex’s history. Between them, they would have had a chance to figure things out. Both, however, probably thought they had all relevant information from the EMR, making such an exchange “unnecessary.”

Second, the EMR mixed informational wheat with chaff. Alex’s extensive EMR record did indeed mention the old blood clot that led to the correct diagnosis, but both the internist and radiologist failed to find this historical nugget, not believing it worth their while to study the record, even for an unusual case. In other words, the friction of seeking and absorbing information in the EMR inhibited their search.

Reinforcing these points, a week later Alex visited a hospital clinic to discuss her medications with a pair of doctors. Reviewing her EMR record, the pair re-shattered Alex by declaring that cancer was still possible, and by recommending two tests. Fortunately, Alex resisted, saying that information in the EMR obviated these tests, and insisting that these doctors actually talk to the super-specialists. When this was done (three nerve-wracking days later) the doctor-pair relented, and pronounced Alex cancer-free. For the second time in two weeks, Alex felt the joy of a cancer cure.

These EMR problems have strong aviation parallels.

Regarding communication, all aircrew members receive extensive training in cockpit communication, to overcome cultural and other influences that tend to silence the exchange of observations and questions. In medicine, the EMR is inhibiting important mouth-to-ear conversations.

For wheat vs. chaff, in every aircraft’s thick technical manual, emergency procedures are printed in boldface font, and no one is allowed to fly the aircraft until they prove they have memorized the boldface. In medicine, each human’s “boldface” is unique, but EMRs contain so much chaff – courtesy of simple cutting and pasting, plus reimbursement-motivated templating – that doctors cannot reasonably find their patients’ boldface, as four different doctors proved to Alex.

Bad as these problems are, another exceeds them. After an airplane crash, aggressive and intrusive external investigators pinpoint causes and assign blame, aiming to propose system fixes that will prevent recurrences. Medicine, with its less fiery accidents, rarely conducts such investigations; certainly, none of the errors that befell Alex were officially recognized as such. And so, defective systems – whether software, organizational, or human – remain defective, and kill and maim.

Medicine’s need for a true aviation-style safety culture has long been recognized. But as the amount of physician-computer interaction increases, it becomes ever more vital to implement.

September 28, 2015 | John Sotos

How electronic medical records are like airplane crashes (Part 1)

First published on WSJ.com on Sept. 28, 2015

Without doubt, electronic medical records are killing and injuring people… for some of the same reasons that airplanes crash.

It’s not surprising. Aviation engineers have long known that the interface between humans and technology can contribute to accidents. But because doctors and patients are less aware, I’d like to share two true tales – one today and one tomorrow – in which a friend I’ll call Alex was harmed by the electronic medical record system at a world-famous university hospital.

In the first tale, we find the hospital staff botching the brainlessly simple orders for Alex’s intravenous fluids, not once, but three times in three days. On the first day, when she could not eat or drink, the staff failed to order the fluids that would substitute. On the second day, they gave her three times the I.V. fluid she needed. On the third day, when she again could not drink, they again omitted I.V. fluids.

Although not catastrophic, these errors did have consequences. On day two, Alex had venous catheters in both legs, forcing her to use a bedpan – which she did every 20 minutes, thanks to the excess fluids. Naturally, these gymnastics triggered brisk bleeding around the catheter sites (she was on blood thinners), which soaked a blanket, which brought a physician from home at 1 a.m. to assess and re-bandage her.

When a knowledgeable staff commits stupid errors like these, there is only one explanation: inadequate attention.

Physician inattention to routine fluid orders is inexcusable, but hardly news: nurses are forever reminding physicians to write such orders because they, the nurses, are far more hands-on with fluid issues. They tell the patient “No eating or drinking,” they monitor I.V. fluid bags emptying out, and they independently review every fluid order that physicians write.

But, when three different nurses, on three different wards, on three different days miss obvious fluid mismanagement, something deeper is awry. I cannot imagine the sharp-eyed university nurses of 20 years ago letting me make these errors… and that is where the electronic medical record (“EMR”) comes in.

Today, nurses at Alex’s hospital are, almost literally, chained to a wheelable computer station that runs the EMR and goes with them from patient-room to patient-room. A basic nursing task, such as documenting a patient’s urination, requires the nurse to walk to the computer, sign on to the EMR (itself a chore), grasp the mouse, select the patient, click a “urination” tab (eventually), move hands to keyboard, type the volume of urine, then click “Save.” Any new data, alerts, or orders on the screen will distract the nurse from thinking about the significance of the urine volume just produced.

Before EMRs existed, a nurse would lift the clipboard hanging at the foot of the bed, grasp a pen hanging around his or her neck, write the time and the volume of urine produced, then re-hang the clipboard. Time elapsed: less than 4 seconds – 6 if the nurse reviewed earlier urination values recorded on the clipboard.

Obviously, the EMR demands more attention from the nurse than the clipboard. In aviation-speak, the EMR “channelizes” attention to itself. Alex’s nurses devoted less attention to Alex and her fluid status because their attention was channelized to the EMR system.

Channelized attention is a major human-factors cause of airplane accidents. A pilot fiddling with a broken intercom is paying less attention to flying the plane. Usually, he or she will get away with it… but not always.

EMR proponents argue that presenting timely, complete information to the nurse at the bedside more than offsets any cost in attention. I doubt it. Detecting Alex’s fluid mismanagement required only a glance at her, a glance at the I.V. pole next to her, and a modicum of thought. Obviously, the glances and thought did not happen… three times.

EMR vendors must realize that the human-computer interface in their systems is more than a marketing differentiator. It is instead, like cockpit controls, a critical component in a critical system that must be designed to be undemanding of attention and cognition. Anything less will create new cemetery plots, as surely as poor cockpit controls create smoking holes.

This blog post has two parts. Part 2 was published the following day, Sept. 29, 2015.