When 24×7 editor Julie Kirst asked me to come up with a column covering “de-identified case discussions on specific issues and some failures to make it real,” I saw it as a challenge I’d brought upon myself. I had closed an earlier article on the applicability of the writings of Duke civil engineering professor Henry Petroski, PhD, who studies engineering failures, with the suggestion that we get more stories about failures into the literature.1 For those of you unfamiliar with his work, Petroski’s teaching can perhaps best be summed up in his own words:
“Given the faults of human nature, coupled with the complexity of the design of everything, from lectures to bridges, it behooves us to beware of the lure of success and listen to the lessons of failure.” 2
So how to begin? I wondered where I would get the stories. After all, one of the reasons I wrote the Petroski article in the first place was to argue that we should be more forthcoming about failures and their consequences. It is not something we are terribly comfortable with, given our industry’s regulatory makeup and history of medicolegal concerns. I decided to give it a shot by surveying some of my “usual suspects”—colleagues from my 30 years in the business, and the community of clinical engineers and biomedical equipment technicians who communicate via Biomedtalk. The handful of replies received can perhaps best be termed a glimpse into our rapidly changing world.
One respondent told of a recent installation of a system intended to assist nurses in assuring that the right medications were given in the right doses to the right patient at the right time: “I have never had so many problems with one handheld machine coming down the pike for one repair after another. The number one issue that I bet you can guess is ‘network failure … unable to connect to server.’ This frustrates the already hardworking nurses to no end. The device freezes from one application to the next and is always giving trouble.”
Could a message be any less helpful? “Network failure, unable to connect to server” on a display is on a par with “Broken” scrawled on adhesive tape. The latter, however, generally comes from clinical staff who are not expected to be proficient in human factors issues, while the former comes from a manufacturer that is. This brings to mind the question of when a network becomes a medical device, and suggests a way to reframe it: instead of addressing the issue from the perspective of the role of the network in a device system, it might be more illuminating to consider the potential consequences of network-related failures in point-of-care workflows.
The phenomenon of networked devices freezing up was noted by others as well. A couple of respondents described “network storms” where clock-related communications traffic overload caused bedside monitors to essentially stop monitoring.
One reported his hospital was considering going on divert before the immediate problem was resolved.
Both reported definitively addressing the source problem with the manufacturer. However, the actions that led up to the failure (setting the time incorrectly on central stations) were arguably ones the manufacturer could have anticipated in a risk-management process like that of ISO 14971, Application of Risk Management to Medical Devices, which calls for consideration not only of “intended use” but also of “reasonably foreseeable misuse.” That the failure propagated across the patient monitoring network to affect other communicating systems also brings to mind warnings about the increased potential for “systems accidents,” in which failures that would otherwise be local to one device or system propagate via elements shared with other devices or systems.
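A designer anticipating this misuse might guard the clock-setting path itself. The sketch below is purely hypothetical (no vendor’s actual interface): it refuses to broadcast a time change large enough to trigger a flood of resynchronization traffic across the monitoring network, holding it for manual confirmation instead.

```python
from datetime import datetime, timedelta

def safe_set_time(requested, current, broadcast, max_step=timedelta(minutes=5)):
    """Apply a clock change only if it is within a plausible step size.

    Gross jumps (e.g., a central station set to the wrong year) are the kind
    of "reasonably foreseeable misuse" that can flood a monitoring network
    with clock-resync traffic, so they are held for manual confirmation.
    """
    if abs(requested - current) > max_step:
        return False  # held for confirmation; nothing is broadcast
    broadcast(requested)  # propagate the small correction to bedside devices
    return True

# Example: a small drift correction goes through; a gross error does not.
sent = []
now = datetime(2008, 3, 1, 12, 0, 0)
ok_small = safe_set_time(now + timedelta(minutes=2), now, sent.append)
ok_large = safe_set_time(now + timedelta(days=365), now, sent.append)
```

The point is not the threshold value, which would be a clinical engineering decision, but that the foreseeable-misuse analysis happens at design time rather than after an incident.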
And this is not the only example reported where the failure of a single component disrupted a system: “… we have a [telemetry system] with 170 patient monitoring capabilities. Our network went down … one night due to a breaker trip … the UPS lasted for 30 minutes and then died. Unfortunately, we were not monitoring the UPS incoming power. Our on-call tech was not far away and arrived quickly. He found the breaker and reset it. System downtime was about 45 minutes. Not monitoring patients for 45 minutes is not good. Luckily, we do have contingency procedures in place for medical telemetry. All units have a built-in display of patient data and clinical staff around to observe this data. They also connect high-risk patients to hard-wired devices when available. Needless to say, we tied the AC power in that critical room to our building automation system. Now if AC power drops out, we get beeped.”
I don’t think the respondent gave his organization enough credit, because it was not luck but good practice that provided for the contingency plans that enabled the continuance of patient care through an outage. But note how a system can be brought to its knees by the failure of a component that might have been considered outside the boundaries of the system during analysis and design—in this case the AC power. Ironically, a breaker trip during maintenance here in Boston brought our entire subway system down the week before I wrote this.
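The respondent’s fix, tying the AC feed into the building automation system so that a power drop triggers a page, is essentially edge-triggered alerting on a signal that had been outside the monitored system boundary. A minimal sketch of that logic (all names hypothetical; a real installation would poll through the building automation vendor’s interface):

```python
def page_on_power_loss(readings):
    """Given successive AC-present readings (True/False) from a poll loop,
    return the indexes at which a page should go out: one page per
    transition from power-present to power-lost, not one per poll."""
    pages = []
    ac_was_present = True  # assume the feed was up before monitoring began
    for i, ac_present in enumerate(readings):
        if ac_was_present and not ac_present:
            pages.append(i)
        ac_was_present = ac_present
    return pages

# A breaker trips at poll 2, power is restored at poll 4, and it trips
# again at poll 5: exactly two pages go out.
pages = page_on_power_loss([True, True, False, False, True, False])  # → [2, 5]
```

Alerting on the transition rather than the state is what lets the on-call tech act during the 30 minutes of UPS runtime instead of after it.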
Another respondent reported on experiences with remote real-time video and EEG (VEEG) for neuro ICU and intraoperative monitoring (IOM) purposes. He notes, “… real-time system integrity can be critical. A momentary loss of connection of a few seconds is barely noticeable when using a Web server to surf the Internet. However, both VEEG and IOM services use vendor software that sends continuous waveform data from the recording part of the program to the observing part of the program running at the remote site. [In some cases,] the network link and programs have to be restarted. The more devices in the path, such as network routers and switches, the less reliable in a real-time sense is the connection. If one of these restarts happens at a critical part during an IOM surgery, then an important event might be lost, possibly leading to nerve damage.”
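One defensive pattern on the receiving side is to sequence-number the waveform stream so that a dropped link produces an explicitly flagged gap in the record rather than a seamless-looking but incomplete trace. The sketch below illustrates the idea only (framing and names are assumed, not drawn from any vendor’s protocol):

```python
def assemble_stream(chunks):
    """chunks: (sequence_number, samples) pairs as delivered over a link
    that may drop out and restart. Returns (record, gaps), where gaps lists
    the missing sequence ranges so observers can see exactly what was lost."""
    record = {}
    gaps = []
    expected = 0
    for seq, samples in chunks:
        if seq > expected:  # the link restarted and chunks were dropped
            gaps.append((expected, seq - 1))
        record[seq] = samples
        expected = seq + 1
    return record, gaps

# Chunks 2 and 3 were lost during a momentary disconnect:
record, gaps = assemble_stream([(0, "eeg0"), (1, "eeg1"), (4, "eeg4")])
```

A flagged gap does not recover the lost event, but it tells the monitoring team that an interval is missing, which is materially different from not knowing.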
Three different network-dependent applications—medication administration management, physiological monitoring, and neurological monitoring—have been described, along with some of their failure modes and consequences. While it would be instructive and interesting to know whether this small sample is representative of others’ experiences, these stories alone point to the need to examine our industry’s engineering, technical, and regulatory practices in light of the changes network-based medical device systems bring to the point of care.
Rick Schrenker is a systems engineering manager in the department of biomedical engineering, Massachusetts General Hospital, Boston. For more information, contact .
1. Schrenker R. Learning from Failure: The Teachings of Petroski. BI&T. 2007;41(5):395-398.
2. Petroski H. Success Through Failure: The Paradox of Design. Princeton, NJ: Princeton University Press; 2006.