Mining Electronic Records for Revealing Health Data

This is an excellent article about utilising the data residing in electronic records for research purposes. The article also hints at the inherent issue with using these valuable data spread across different electronic records – the lack of interoperability.

Enjoy the article from The New York Times.

Over the past decade, nudged by new federal regulations, hospitals and medical offices around the country have been converting scribbled doctors’ notes to electronic records. Although the chief goal has been to improve efficiency and cut costs, a disappointing report published last week by the RAND Corp. found that electronic health records actually may be raising the nation’s medical bills.

But the report neglected one powerful incentive for the switch to electronic records: the resulting databases of clinical information are gold mines for medical research. The monitoring and analysis of electronic medical records, some scientists say, have the potential to make every patient a participant in a vast, ongoing clinical trial, pinpointing treatments and side effects that would be hard to discern from anecdotal case reports or expensive clinical trials.

“Medical discoveries have always been based on hunches,” said Dr. Russ B. Altman, a physician and professor of bioengineering and genetics at Stanford. “Unfortunately, we have been missing discoveries all along because we didn’t have the ability to see if a hunch has statistical merit. This infrastructure makes it possible to follow up those hunches.”

The use of electronic records also may help scientists sidestep the rising costs of medical research. “In the past, you had to set up incredibly expensive and time-consuming clinical trials to test a hypothesis,” said Nicholas Tatonetti, assistant professor of biomedical informatics at Columbia. “Now we can look at data already collected in electronic medical records and begin to tease out information.”

Recent work by Dr. Altman and Dr. Tatonetti, published in 2011, offers a compelling case study. As a graduate student at Stanford, Dr. Tatonetti devised an algorithm to look for pairs of drugs that, taken together, cause a side effect not associated with either drug alone. One pairing popped up when he used his new software to search the Food and Drug Administration’s database of adverse drug reports: Paxil, a widely used antidepressant, and Pravastatin, a cholesterol-lowering drug.

Neither was known to raise blood sugar, but Dr. Tatonetti’s results suggested they might when taken together.
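The core idea behind Dr. Tatonetti's approach can be illustrated with a toy disproportionality check: flag drug pairs whose reports mention a reaction more often than reports for either drug alone. This is a minimal sketch, not his actual algorithm; the drug names, reaction, and report counts below are entirely made up.

```python
from itertools import combinations

# Toy adverse-event reports: (set of drugs taken, set of reactions observed).
# All names and counts are illustrative, not real FDA data.
reports = [
    ({"drugA", "drugB"}, {"hyperglycemia"}),
    ({"drugA", "drugB"}, {"hyperglycemia"}),
    ({"drugA"}, {"nausea"}),
    ({"drugB"}, {"headache"}),
    ({"drugC"}, {"hyperglycemia"}),
]

def reporting_rate(drugs, reaction):
    """Fraction of reports mentioning all of `drugs` that also mention `reaction`."""
    hits = [reacts for ds, reacts in reports if drugs <= ds]
    if not hits:
        return 0.0
    return sum(reaction in reacts for reacts in hits) / len(hits)

# Flag pairs whose combined reporting rate exceeds either drug alone.
all_drugs = {d for ds, _ in reports for d in ds}
for a, b in combinations(sorted(all_drugs), 2):
    pair = reporting_rate({a, b}, "hyperglycemia")
    alone = max(reporting_rate({a}, "hyperglycemia"),
                reporting_rate({b}, "hyperglycemia"))
    if pair > alone and pair > 0:
        print(a, b, round(pair, 2))  # drugA drugB 1.0
```

A real analysis would use proper disproportionality statistics and correct for confounders; the point here is only the shape of the signal being searched for.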

For confirmation, he and Dr. Altman turned to Stanford University Medical Center’s electronic medical records. The scientists needed to find patients who were prescribed either Paxil or Pravastatin, had a blood sugar test, were then prescribed the second medication, and had another blood sugar test — all within a period of a few months.

Finding such patients was a tall order, but the medical center’s database was large enough that eight cases surfaced. In most, patients had experienced a significant increase in blood sugar. The researchers expanded their search to databases at Harvard and Vanderbilt. They found about 130 cases that fit the improbable criteria — and more evidence that patients given both drugs showed a rise in blood sugar.
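The patient-selection criterion described above — first drug, blood sugar test, second drug, second blood sugar test, all within a few months — is essentially an ordered-sequence query over each patient's event timeline. The sketch below shows one way such a filter might look; the record layout, dates, and glucose values are invented for illustration and do not reflect the actual Stanford data.

```python
from datetime import date, timedelta

# Toy per-patient event timelines. Each event: (date, kind, detail),
# where kind is "rx" (prescription) or "glucose" (lab result).
timelines = {
    "p1": [
        (date(2010, 1, 5), "rx", "paroxetine"),
        (date(2010, 1, 20), "glucose", 95),
        (date(2010, 2, 10), "rx", "pravastatin"),
        (date(2010, 3, 1), "glucose", 128),
    ],
    "p2": [
        (date(2010, 1, 5), "rx", "paroxetine"),
        (date(2010, 1, 20), "glucose", 92),
    ],
}

def qualifies(events, drug_a, drug_b, window=timedelta(days=120)):
    """Match the sequence: either drug -> glucose test -> the other drug ->
    glucose test, in order, within `window`. Returns (before, after) or None."""
    events = sorted(events, key=lambda e: e[0])
    for i, (d0, k0, v0) in enumerate(events):
        if k0 != "rx" or v0 not in (drug_a, drug_b):
            continue
        other = drug_b if v0 == drug_a else drug_a
        stage, before = 0, None
        for d1, k1, v1 in events[i + 1:]:
            if d1 - d0 > window:
                break
            if stage == 0 and k1 == "glucose":
                before, stage = v1, 1
            elif stage == 1 and k1 == "rx" and v1 == other:
                stage = 2
            elif stage == 2 and k1 == "glucose":
                return before, v1
    return None

for pid, evs in timelines.items():
    hit = qualifies(evs, "paroxetine", "pravastatin")
    if hit:
        print(pid, "glucose", hit[0], "->", hit[1])  # p1 glucose 95 -> 128
```

With criteria this specific, most patients fall out at some stage of the sequence, which is why even a large database surfaced only eight cases.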

The F.D.A. is currently evaluating the data to see if they warrant new information on the drugs’ labels. “I underestimated the abilities of a clever informatician to figure out algorithms for data mining,” said Dr. Altman, once a critic of this sort of data mining.

“We didn’t need to set up a clinical trial,” he said. “We didn’t need to enroll a single research subject.”

Kaiser Permanente, which documented the connection between Vioxx and heart trouble nearly a decade ago by reviewing internal medical records, is now testing preliminary evidence that men taking statin drugs for cholesterol have a lower risk of a recurrence of prostate cancer. The organization is also evaluating diabetes protocols, using a database of more than 25,000 people over age 80 with diabetes — a difficult population to study in clinical trials.

“That’s a remarkably rare opportunity to look at a population that has many other health issues going on,” said Elizabeth A. McGlynn, director of the Kaiser Permanente Center for Effectiveness and Safety Research. “The sheer volume and the richness of the data will enable us to have insights that are beyond anything we could have had any other way.”

But the challenges posed by this sort of research are significant. The information entered into a medical record may be wrong, and diagnostic codes are notoriously unreliable, according to Dr. Tatonetti, partly because they are also used for billing. And doctors don’t think like researchers.

“If a patient gets well after treatment, a physician may not feel the need to follow up with a lab test because it doesn’t have any clinical usefulness,” Dr. Altman said. “But that’s exactly the kind of data a researcher looks for.”

Perhaps the most pressing issue is patient privacy. Electronic health records must be “de-identified” before they can be used for research. That requires more than simply removing a name. Any information that might identify the patient must be excised. At the same time, researchers have to be able to tell when they’re looking at records from the same patient, which may be stored in several databases.
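One common technique for squaring these two requirements is pseudonymization: direct identifiers are replaced with a keyed hash, so the same patient maps to the same opaque token in every database while the name itself never leaves the source system. This is a minimal sketch of the idea under stated assumptions — the key, record layout, and field names are all illustrative, and real de-identification regimes involve far more than hashing one field.

```python
import hashlib
import hmac

# Illustrative shared secret, e.g. held by a trusted "honest broker"
# that performs linkage on researchers' behalf.
SECRET_KEY = b"shared-linkage-key"

def pseudonym(patient_id: str) -> str:
    """Deterministic, non-reversible token for cross-database linkage."""
    return hmac.new(SECRET_KEY, patient_id.encode(), hashlib.sha256).hexdigest()[:16]

record = {"patient_id": "MRN-00123", "name": "Jane Doe", "a1c": 7.2}
deidentified = {
    "pid": pseudonym(record["patient_id"]),  # linkable, but not identifying
    "a1c": record["a1c"],                    # clinical value is retained
}
# The same MRN hashed at another site with the same key yields the same
# token, so records for one patient can be joined across databases.
assert deidentified["pid"] == pseudonym("MRN-00123")
```

The keyed (HMAC) construction matters: with a plain unsalted hash, anyone who can enumerate plausible identifiers could recover them by brute force.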

“One patient may be in as many as 20 different databases,” said Dr. William S. Dalton, founding director of the Personalized Medicine Institute at Moffitt Cancer Center, which is currently tracking more than 90,000 patients at 18 different sites around the country. Moffitt combines information from the electronic medical record with data from X-rays and other imaging studies, tumor tissue cultures and even genetic profiles.

“There’s an immense amount of information and different databases, all using different data dictionaries,” Dr. Dalton said. “And they don’t all agree.”

Kaiser Permanente, which has a database of 9 million patients, stores about 30 petabytes of data — more than three times the digitized storage of the Library of Congress. The organization adds about two terabytes of data a day. (A petabyte is a thousand terabytes, and a terabyte is a thousand gigabytes.)

Despite the challenges, a growing number of academic medical centers and health organizations are de-identifying patient records to make them available to researchers.

The University of California’s medical centers and hospitals recently merged systems to create a database with more than 11 million patient records. The F.D.A. has launched a rapid-response surveillance system, called Mini-Sentinel, that combines data from the medical records of more than 100 million individuals. In another collaboration, called eMERGE, institutions around the country are combining genetic information with electronic medical record data.

The gold standard of medical research, the randomized, controlled trial, or R.C.T., isn’t going to go away, experts say. But evidence culled from electronic medical records promises to broaden knowledge beyond what can be learned in a carefully structured study.

“R.C.T.’s have their own sorts of biases,” said Dr. Jonathan Darer, in the division of clinical innovation at Geisinger Health System. “Frequently they exclude the most important populations. It’s great to do guidelines for patients who only have diabetes, for example. Unfortunately those aren’t the patients I see in my clinic, where they also have osteoporosis or hypertension or dementia and other health problems. What I really need is help with those complex patients.”

Electronic records are “way less controlled, way less scientifically designed” than the information-gathering techniques used in huge trials, Dr. Altman said. But they offer researchers far more data to work with.

“There’s a growing sense in the field of informatics that we’ll take lots of data in exchange for perfectly controlled data,” he said. “You can deal with the noise if the signal is strong enough.”

New advanced computing institute to tackle big problems

Interesting news pertaining to Big Data from The Seattle Times, enjoy 🙂

The University of Washington and the Department of Energy’s Pacific Northwest National Laboratory are forming a new enterprise, the Northwest Institute for Advanced Computing, to tackle a wide range of the world’s most vexing issues – from the causes of disease to how climate change will impact the planet.

The institute is designed to find ways to mine the huge amounts of data generated every day by scientific instruments and household electronics, said Doug Ray, associate director of Richland-based PNNL, in a release.

Ray said researchers at the new institute will tackle ‘big data’ to help improve the quality of life, taking on the most pressing problems facing science and society.

For example, new computational techniques can help design a smart electric grid system, and better analysis of biological data can help determine the cause of diseases. Computer modeling can be used to explore climate change impacts. Cellphone data could even be analyzed and used to find ways to decrease idling traffic.

At the institute, UW and PNNL researchers will jointly explore advanced computer system designs, accelerate data-driven scientific discovery and improve computational modeling and simulation. It will also become a training ground for future researchers.

“The new center is fundamentally about methodology,” said the UW’s Ed Lazowska, the Bill & Melinda Gates Chair in Computer Science & Engineering and director of the UW eScience Institute.

“’Big data’ is transforming the process of discovery in all fields,” he said. “UW and PNNL have significant and complementary strengths; together we’ll be able to do amazing things.”

The institute, which will be headquartered in the UW’s Sieg Hall, will be led by UW electrical engineering chair and Applied Computational Engineering Lab director Vikram Jandhyala and PNNL fellow Moe Khaleel, who directs PNNL’s Computational Science and Mathematics research division.

Last year, the UW generated a buzz in the computer science world when it hired four computer-science superstars to join the faculty.

Tracking illness to stop an outbreak

I chanced upon this article, which I thought is a good reflection on utilising emerging technology in public health.

In what has been called the “Age of Big Data,” when corporations are finding new ways to mine information to boost profits, University of Iowa professor Alberto Segre and a team of colleagues are channeling their work in data science to achieve something greater.

Segre is among the leaders of an interdisciplinary group of UI experts known as CompEpi, short for computational epidemiology, which conducts sophisticated data-driven research that probes how diseases spread.

The team, which includes six core faculty members and additional contributing professors and students, has put together studies ranging from tracking flu outbreaks using Twitter to determining who should be vaccinated first among hospital staff to prevent the spread of an illness.

Studies show that the failure of health care workers to perform proper hand hygiene is one of the leading causes of health care-associated infection, so the research group has taken a particular interest in how technology can be used to monitor and improve hand-washing practices in hospitals.

“There are all these different and great computational problems — real data that have real impact in peoples’ lives,” said Segre, chairman of the Department of Computer Science. “In the case of hospitals, health care-associated infections are a huge source of mortality and increasing health care costs, not to mention suffering. If we can reduce that, I feel like as a computer scientist I’ve done more than shave an epsilon off of Walmart’s trucking schedule to save another million dollars. So there’s this really nice social dimension to this.”

Since CompEpi’s formation about five years ago, the group has conducted a series of studies inside UI Hospitals and Clinics, beginning with hiring graduate students to observe and record workers’ movements and interactions to determine how health care-associated infections such as MRSA are spread in a hospital.

From there, the researchers used medical record system log-ins — each instance a hospital worker signed into the computer system — to track the workers’ movements throughout the hospital. That project yielded more than 2 million pieces of data, which was a vast improvement over the 6,500 samples collected by the grad students in the observational study, and the researchers used the information to build a contact network model to show how an infection could spread from person to person.
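A contact network of this kind can be built by treating two workers as "in contact" when their log-ins place them on the same unit close together in time. The sketch below shows the basic construction; the worker names, units, timestamps, and 30-minute co-location window are all assumptions for illustration, not details from the UI study.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Toy log-in events: (worker, unit, timestamp). All values are invented.
logins = [
    ("nurse_a", "ICU", datetime(2013, 3, 1, 9, 0)),
    ("doc_b",   "ICU", datetime(2013, 3, 1, 9, 10)),
    ("nurse_a", "ER",  datetime(2013, 3, 1, 11, 0)),
    ("tech_c",  "ER",  datetime(2013, 3, 1, 13, 0)),
]

def contact_network(events, window=timedelta(minutes=30)):
    """Undirected edges between workers who logged in on the same unit
    within `window` of each other — a crude proxy for co-location."""
    by_unit = defaultdict(list)
    for worker, unit, t in events:
        by_unit[unit].append((t, worker))
    edges = set()
    for unit, stamps in by_unit.items():
        stamps.sort()
        for i, (t1, w1) in enumerate(stamps):
            for t2, w2 in stamps[i + 1:]:
                if t2 - t1 > window:
                    break  # stamps are sorted; later ones are farther apart
                if w1 != w2:
                    edges.add(tuple(sorted((w1, w2))))
    return edges

print(contact_network(logins))  # {('doc_b', 'nurse_a')}
```

Once the edges exist, standard epidemic models can be simulated over the graph to estimate how an infection introduced at one node would propagate.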

The scientists, wanting even more precise information on hospital workers’ movement, next teamed with UI engineers to build wearable computers — a re-purposed pager case that houses a processor and radio that broadcasts a worker’s location every 13 seconds. Additionally, they fixed instruments to hand soap dispenser pumps to measure the frequency with which workers washed.

Future studies will use even smaller tracking devices that will clip behind a doctor’s badge, as well as equip workers with wrist sensors that will measure their hand movements to gauge how effectively they’re washing.

“One of our overarching goals is to develop computational approaches to help understand why and in what situations health care workers do not practice appropriate hand hygiene and to use our findings to help model and understand other behaviors in order to make hospitals safer,” said Phil Polgreen, an associate professor in the Department of Internal Medicine and one of CompEpi’s founders, in an email.

Geb Thomas, an associate professor in the Department of Mechanical and Industrial Engineering, has overseen the development of the electronic hardware used in the hand washing studies.

“It’s been a lot of fun for the engineering students to bring their skills to bear on practical problems in the hospital,” Thomas said. “Almost all of the students who have worked for me have been excited about how important the problem is, and they’ve all gained a lot of practical experience in terms of how to actually make something that’s going to work reliably during the experimental time.”

Other research projects undertaken by CompEpi in recent years include:

• Researchers developed a program that uses the Twitter stream to monitor the spread of influenza by analyzing keywords such as “flu,” “sick,” “sniffles” and “fever,” then uses the geotagged locations of the tweets to map cases of illness. Although the Centers for Disease Control and Prevention collects data to measure flu activity, it typically takes about two weeks for the agency to compile results, Segre said. Using Twitter, however, measurements can be made in real time.

• The group developed a tool for the Iowa Department of Public Health that helps determine where to locate surveillance sites for influenza to ensure maximum coverage for the population.

• CompEpi members developed an app, which is available on the iTunes Store, designed for health care workers to manually track hand hygiene using iPads or iPods. It’s been downloaded thousands of times and is used in hospitals throughout the nation.
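The Twitter-based flu monitoring described in the first bullet reduces, at its simplest, to a keyword filter over geotagged tweets with counts aggregated per place. The sketch below is a bare-bones illustration of that pipeline; the tweet texts and places are made up, and a real system would handle stream ingestion, location normalization, and false positives like "flu shot" chatter.

```python
from collections import Counter

# Toy geotagged tweets; text and place values are invented.
tweets = [
    {"text": "Ugh, down with the flu again", "place": "Iowa City"},
    {"text": "sniffles and a fever, staying home", "place": "Des Moines"},
    {"text": "Great game last night!", "place": "Iowa City"},
    {"text": "Flu shot line was long", "place": "Iowa City"},
]

KEYWORDS = {"flu", "sick", "sniffles", "fever"}

def flu_signal(stream):
    """Count keyword-matching tweets per place — a crude real-time proxy
    for regional flu activity."""
    counts = Counter()
    for tw in stream:
        words = {w.strip(".,!?").lower() for w in tw["text"].split()}
        if words & KEYWORDS:
            counts[tw["place"]] += 1
    return counts

print(flu_signal(tweets))  # Counter({'Iowa City': 2, 'Des Moines': 1})
```

Note that the naive filter counts the "flu shot" tweet as a flu signal — exactly the sort of noise that makes social-media surveillance a complement to, not a replacement for, official case data.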

Ted Herman, a professor in the Department of Computer Science and one of CompEpi’s founding faculty members, said the collaboration with medical experts and those in other disciplines has allowed them to apply their respective skills in new contexts.

“We try to bring in people from different backgrounds and different departments because some of these problems are things which can’t be solved in just one way,” Herman said. “You need talents from different areas.”

Segre says projects like this are just the beginning of how data science can be applied to health-related research.

“These things are early attempts to understand how data science can impact quality of care and patient outcomes,” Segre said. “And I think the conversation is changing.”

Now, the idea of secondary use of data captured in EMRs is not new, but the application of Big Data analytics in real time will lead to some innovative new applications.

The next question is: how do we spread the knowledge so that the average Joe working in healthcare IT can apply this technology in an affordable manner?