Mining Electronic Records for Revealing Health Data

This is an excellent article about utilising the data residing in electronic records for research purposes. The article also hints at a problem inherent in using these valuable data when they are spread across different electronic record systems: the lack of interoperability.

Enjoy the article from The New York Times.

Over the past decade, nudged by new federal regulations, hospitals and medical offices around the country have been converting scribbled doctors’ notes to electronic records. Although the chief goal has been to improve efficiency and cut costs, a disappointing report published last week by the RAND Corp. found that electronic health records actually may be raising the nation’s medical bills.

But the report neglected one powerful incentive for the switch to electronic records: the resulting databases of clinical information are gold mines for medical research. The monitoring and analysis of electronic medical records, some scientists say, have the potential to make every patient a participant in a vast, ongoing clinical trial, pinpointing treatments and side effects that would be hard to discern from anecdotal case reports or expensive clinical trials.

“Medical discoveries have always been based on hunches,” said Dr. Russ B. Altman, a physician and professor of bioengineering and genetics at Stanford. “Unfortunately, we have been missing discoveries all along because we didn’t have the ability to see if a hunch has statistical merit. This infrastructure makes it possible to follow up those hunches.”

The use of electronic records also may help scientists sidestep the rising costs of medical research. “In the past, you had to set up incredibly expensive and time-consuming clinical trials to test a hypothesis,” said Nicholas Tatonetti, assistant professor of biomedical informatics at Columbia. “Now we can look at data already collected in electronic medical records and begin to tease out information.”

Recent work by Dr. Altman and Dr. Tatonetti, published in 2011, offers a compelling case study. As a graduate student at Stanford, Dr. Tatonetti devised an algorithm to look for pairs of drugs that, taken together, cause a side effect not associated with either drug alone. One pairing popped up when he used his new software to search the Food and Drug Administration’s database of adverse drug reports: Paxil, a widely used antidepressant, and Pravastatin, a cholesterol-lowering drug.

Neither was known to raise blood sugar, but Dr. Tatonetti’s results suggested they might when taken together.
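The article does not describe how Dr. Tatonetti's algorithm works internally, so the following is only a rough sketch of the general idea: flag drug pairs whose combined reports mention a side effect far more often than reports for either drug alone. The report layout, the function name `pair_signals`, and the thresholds `min_reports` and `min_ratio` are all assumptions made for illustration, and the drug names in any input would be placeholders.

```python
from collections import defaultdict
from itertools import combinations


def pair_signals(reports, event, min_reports=3, min_ratio=2.0):
    """Flag drug pairs whose event rate exceeds both single-drug rates.

    reports: iterable of (set_of_drugs, set_of_events), one per adverse
             event report (a hypothetical, simplified layout).
    Returns a list of (pair, pair_rate, best_single_drug_rate).
    """
    solo_total = defaultdict(int)   # reports listing exactly one drug
    solo_event = defaultdict(int)   # ...of which mention the event
    pair_total = defaultdict(int)   # reports listing both drugs of a pair
    pair_event = defaultdict(int)   # ...of which mention the event

    for drugs, events in reports:
        hit = event in events
        if len(drugs) == 1:
            (drug,) = drugs
            solo_total[drug] += 1
            solo_event[drug] += hit
        for pair in combinations(sorted(drugs), 2):
            pair_total[pair] += 1
            pair_event[pair] += hit

    signals = []
    for pair, n in pair_total.items():
        if n < min_reports:
            continue  # too few joint reports to say anything
        pair_rate = pair_event[pair] / n
        solo_rates = [
            solo_event[d] / solo_total[d] if solo_total[d] else 0.0
            for d in pair
        ]
        baseline = max(solo_rates)
        # Assumed rule: the pair's rate must be at least min_ratio times
        # the higher of the two single-drug rates.
        if pair_rate > 0 and pair_rate >= min_ratio * baseline:
            signals.append((pair, pair_rate, baseline))
    return signals
```

A pair flagged this way is only a hunch worth checking; spontaneous reports are noisy, which is why the researchers went on to look for confirmation in medical records.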

For confirmation, he and Dr. Altman turned to Stanford University Medical Center’s electronic medical records. The scientists needed to find patients who were prescribed either Paxil or Pravastatin, had a blood sugar test, were then prescribed the second medication, and had another blood sugar test — all within a period of a few months.
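The search criteria described above can be sketched as a filter over one patient's timeline. This is a minimal illustration, not the researchers' actual query: the event layout of `(date, kind, value)` tuples, the function name `find_before_after`, and the 120-day default for "a few months" are assumptions.

```python
from datetime import date, timedelta


def find_before_after(events, drug_x, drug_y, window_days=120):
    """Return (glucose_before, glucose_after) for one patient, or None.

    events: list of (date, kind, value) where kind is "rx" (value is a
            drug name) or "glucose" (value is a lab result).
    A match needs, in order: first drug prescribed, a glucose test,
    second drug prescribed, another glucose test, all inside the window.
    """
    events = sorted(events, key=lambda e: e[0])
    window = timedelta(days=window_days)

    # Either drug may come first, so try both orders.
    for first, second in ((drug_x, drug_y), (drug_y, drug_x)):
        first_starts = [d for d, kind, v in events if kind == "rx" and v == first]
        second_starts = [d for d, kind, v in events if kind == "rx" and v == second]
        if not first_starts or not second_starts:
            continue
        t1, t2 = first_starts[0], second_starts[0]
        if not t1 < t2:
            continue
        # Tests taken on the first drug only, then after the second was added.
        before = [v for d, kind, v in events
                  if kind == "glucose" and t1 <= d < t2]
        after = [v for d, kind, v in events
                 if kind == "glucose" and t2 < d <= t1 + window]
        if before and after:
            # Last reading before the second drug, first reading after it.
            return before[-1], after[0]
    return None
```

Running such a filter over every patient and keeping the non-`None` results would give the small set of before-and-after readings the article describes. The strictness of the sequence is what makes matching patients so rare.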

Finding such patients was a tall order, but the medical center’s database was large enough that eight cases surfaced. In most, patients had experienced a significant increase in blood sugar. The researchers expanded their search to databases at Harvard and Vanderbilt. They found about 130 cases that fit the improbable criteria — and more evidence that patients given both drugs showed a rise in blood sugar.

The F.D.A. is currently evaluating the data to see if they warrant new information on the drugs’ labels. “I underestimated the abilities of a clever informatician to figure out algorithms for data mining,” said Dr. Altman, once a critic of this sort of “data mining.”

“We didn’t need to set up a clinical trial,” he said. “We didn’t need to enroll a single research subject.”

Kaiser Permanente, which documented the connection between Vioxx and heart trouble nearly a decade ago by reviewing internal medical records, is now testing preliminary evidence that men taking statin drugs for cholesterol have a lower risk of a recurrence of prostate cancer. The organization is also evaluating diabetes protocols, using a database of more than 25,000 people over age 80 with diabetes — a difficult population to study in clinical trials.

“That’s a remarkably rare opportunity to look at a population that has many other health issues going on,” said Elizabeth A. McGlynn, director of the Kaiser Permanente Center for Effectiveness and Safety Research. “The sheer volume and the richness of the data will enable us to have insights that are beyond anything we could have had any other way.”

But the challenges posed by this sort of research are significant. The information entered into a medical record may be wrong, and diagnostic codes are notoriously unreliable, according to Dr. Tatonetti, partly because they are also used for billing. And doctors don’t think like researchers.

“If a patient gets well after treatment, a physician may not feel the need to follow up with a lab test because it doesn’t have any clinical usefulness,” Dr. Altman said. “But that’s exactly the kind of data a researcher looks for.”

Perhaps the most pressing issue is patient privacy. Electronic health records must be “de-identified” before they can be used for research. That requires more than simply removing a name. Any information that might identify the patient must be excised. At the same time, researchers have to be able to tell when they’re looking at records from the same patient, which may be stored in several databases.
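One common way to reconcile these two needs is to replace the patient identifier with a keyed hash, so the same patient gets the same pseudonym in every database while the original identifier cannot be read back without the secret key. The sketch below is an illustration, not a method the article attributes to any institution. It assumes each database shares a common identifier field (here a hypothetical `mrn`), and the field names in `DIRECT_IDENTIFIERS` are invented; removing them covers only the most obvious identifying fields.

```python
import hashlib
import hmac

# Hypothetical field names for directly identifying data.
DIRECT_IDENTIFIERS = {"name", "mrn", "address", "phone"}


def pseudonym(patient_id, secret_key):
    """Derive a stable pseudonym from an identifier with HMAC-SHA256."""
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def deidentify(record, secret_key, id_field="mrn"):
    """Drop direct identifiers and attach a linkable pseudonym."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["pid"] = pseudonym(record[id_field], secret_key)
    return cleaned
```

Records cleaned with the same key can be joined on `pid` across databases. A different key produces unrelated pseudonyms, so whoever holds the key controls who can link the records.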

“One patient may be in as many as 20 different databases,” said Dr. William S. Dalton, founding director of the Personalized Medicine Institute at Moffitt Cancer Center, which is currently tracking more than 90,000 patients at 18 different sites around the country. Moffitt combines information from the electronic medical record with data from X-rays and other imaging studies, tumor tissue cultures and even genetic profiles.

“There’s an immense amount of information and different databases, all using different data dictionaries,” Dr. Dalton said. “And they don’t all agree.”

Kaiser Permanente, which has a database of 9 million patients, stores about 30 petabytes of data — more than three times the digitized storage of the Library of Congress. The organization adds about two terabytes of data a day. (A petabyte is a thousand terabytes, and a terabyte is a thousand gigabytes.)

Despite the challenges, a growing number of academic medical centers and health organizations are de-identifying patient records to make them available to researchers.

The University of California’s medical centers and hospitals recently merged systems to create a database with more than 11 million patient records. The F.D.A. has launched a rapid-response surveillance system, called Mini-Sentinel, that combines data from the medical records of more than 100 million individuals. In another collaboration, called eMerge, institutions around the country are combining genetic information with electronic medical record data.

The gold standard of medical research, the randomized, controlled trial, or R.C.T., isn’t going to go away, experts say. But evidence culled from electronic medical records promises to broaden knowledge beyond what can be learned in a carefully structured study.

“R.C.T.’s have their own sorts of biases,” said Dr. Jonathan Darer, in the division of clinical innovation at Geisinger Health System. “Frequently they exclude the most important populations. It’s great to do guidelines for patients who only have diabetes, for example. Unfortunately those aren’t the patients I see in my clinic, where they also have osteoporosis or hypertension or dementia and other health problems. What I really need is help with those complex patients.”

Electronic records are “way less controlled, way less scientifically designed” than the information-gathering techniques used in huge trials, Dr. Altman said. But they offer researchers far more data to work with.

“There’s a growing sense in the field of informatics that we’ll take lots of data in exchange for perfectly controlled data,” he said. “You can deal with the noise if the signal is strong enough.”
