Annotation, development and evaluation for clinical information extraction (2010 - 2014)

Much of the clinical information required for accurate clinical research, active decision support, and broad-coverage surveillance is locked in text files in an electronic medical record (EMR). The only feasible way to leverage this information for translational science is to extract and encode it using natural language processing (NLP). Over the last two decades, several research groups have developed NLP tools for clinical notes, but a major bottleneck to progress in clinical NLP is the lack of standard, annotated data sets for training and evaluating NLP applications. Without these standards, individual NLP applications proliferate, yet there is no way to train different algorithms on standard annotations, share and integrate NLP modules, or compare performance. We propose to develop standards and infrastructure that enable technology to extract scientific information from textual medical records, and to conduct this research as a collaborative effort involving NLP experts across the U.S.

To accomplish this goal, we will address three specific aims:

    Aim 1: Extend existing standards and develop new consensus standards for annotating clinical text in a way that is interoperable, extensible, and usable.
    Aim 2: Apply existing methods and tools, and develop new ones where necessary, to manually annotate a set of publicly available clinical texts efficiently and accurately.
    Aim 3: Develop a publicly available toolkit for automatically annotating clinical text and conduct a shared evaluation of the toolkit, using evaluation metrics that are multidimensional and flexible.

Selected Publications, Papers, and Presentations

  • Mowery DL, South BR, Velupillai S, Murtola LM, Salanterä S, Suominen H, Christensen L, Leng J, Martinez D, Elhadad N, Pradhan S, Savova G, Chapman WW. Normalizing acronyms and abbreviations to aid patient understanding of clinical texts: ShARe/CLEF eHealth 2013 Challenge Task 2. Journal of Biomedical Semantics. 2016.
  • Pradhan S, Elhadad N, Chapman W, Manandhar S, Savova G. SemEval-2014 Task 7: Analysis of Clinical Text. Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 54–62, Dublin, Ireland, August 23-24, 2014.
  • Meystre S, Boonsirisumpun N, Elhadad N, Savova G, Chapman W. 2014. Poster: Standards-based data model for clinical documents and information in the Shared Annotated Resources (ShARe) project. AMIA Summit on Clinical Research Informatics, San Francisco, CA.
  • Mowery DL, South BR, Murtola LM, Salanterä S, Suominen H, Martinez D, Elhadad N, Pradhan S, Savova G, Chapman WW. Task 2: ShARe/CLEF eHealth evaluation lab 2013. CLEF Proc. Valencia, Spain. 2013.
  • Mowery DL, South BR, Leng J, Murtola LM, Danielsson-Ojala R, Salanterä S, Chapman WW. Creating a reference standard of acronym/abbreviation annotations for the ShARe/CLEF eHealth challenge 2013. AMIA Symp Proc. Washington, DC. 2013.
  • Pradhan S, Elhadad N, South B, Martinez D, Christensen L, Vogel A, Suominen H, Chapman W, Savova G. Evaluating the state of the art in disorder recognition and normalization of the clinical narrative. J Am Med Inform Assoc. 2014.
  • Pradhan S, Elhadad N, South B, Martinez D, Christensen L, Vogel A, Suominen H, Chapman W, Savova G. 2013. Task 1: ShARe/CLEF eHealth Evaluation Lab 2013. Proceedings of the ShARe/CLEF Evaluation Lab 2013.
  • Savova G, Chapman W, Elhadad N. 2012. Shared Annotated Resources for the Clinical Domain. Invited presentation at the Natural Language Processing (NLP) Annotation workshop collocated with the 2nd annual IEEE International Conference on Healthcare Informatics, Imaging and Systems Biology, Sept. 2012. San Diego, CA.
  • Chapman WW, Nadkarni PM, Hirschman L, D’Avolio LW, Savova G, Uzuner O. Overcoming barriers to NLP for clinical text: the role of shared tasks and the need for additional creative solutions. J Am Med Inform Assoc. 2011 Sep-Oct;18(5):540-543.
  • Suominen H, Salanterä S, Velupillai S, Chapman WW, Savova G, Elhadad N, Pradhan S, South BR, Mowery DL, Leveling J, Kelly L, Goeuriot L, Martinez D, Zuccon G. Overview of the ShARe/CLEF eHealth evaluation lab 2013. Springer LNCS.
  • Velupillai S, Mowery DL, Christensen L, Elhadad N, Pradhan S, Savova G, Chapman WW. Disease/Disorder Semantic Template Filling - Information Extraction Challenge in the ShARe/CLEF eHealth Evaluation Lab 2014. AMIA Symp Proc. 2014.
  • Kelly L, Goeuriot L, Suominen H, Mowery DL, Velupillai S, Chapman WW, Zuccon G, Palotti J. Overview of the ShARe/CLEF eHealth Evaluation Lab 2014. CLEF Proc. Sheffield, United Kingdom. 2014.
  • Mowery DL, Velupillai S, South BR, Christensen L, Martinez D, Elhadad N, Pradhan S, Savova G, Chapman WW. Task 2: ShARe/CLEF eHealth Evaluation Lab 2014. CLEF Proc. Sheffield, United Kingdom. 2014.
  • Suominen H, Schreck T, Leroy G, Hochheiser HS, Nualart J, Goeuriot L, Kelly L, Mowery DL, Ferraro G, Keim D, Chapman WW, Hensen P. Task 1 of the CLEF eHealth Evaluation Lab 2014: Visual-Interactive Search and Exploration of eHealth Data. CLEF Proc. Sheffield, United Kingdom. 2014.
  • Dublin S, Baldwin E, Walker RL, Christensen LM, Haug PJ, Jackson ML, Nelson JC, Ferraro J, Carrell D, Chapman WW. Natural language processing to identify pneumonia from radiology reports. Pharmacoepidemiology and Drug Safety. 2013 Aug;22(8):834-41.
  • South BR, Mowery DL, Leng J, Meystre SM, Chapman WW. A system usability study assessing a machine-assisted interactive interface to support annotation of protected health information in clinical texts. AMIA Symp Proc. Washington DC. 2014.
  • South BR, Mowery DL, Suo Y, Ferrández O, Meystre SM, Chapman WW. Evaluating the effects of machine pre-annotation and an interactive annotation interface on manual de-identification of clinical text. J Biomed Inform. Medical Privacy. 2014.
  • South BR, Mowery DL, Suo Y, Levy E, Ashfaq S, Shevi E, Wang L, Zhang M, Meystre SM, Chapman WW. Ensuring adequate coverage to build a publicly available corpus of de-identified clinical documents. NLP Annotation Workshop. San Diego, CA. 2012.
  • South BR, Mowery DL, Ferrández O, Shen S, Suo Y, Zhang M, Chen A, Wang L, Meystre SM, Chapman WW. On the road towards developing a publicly available corpus of de-identified clinical texts. Annu AMIA Symp. Chicago, IL. 2012.
  • South BR, Shen S, Leng J, Forbush TB, DuVall SL, Chapman WW. A prototype set to support machine-assisted annotation. Proceedings of the 2012 Workshop on Biomedical Natural Language Processing (BioNLP 2012), pages 130–139.

PI:

    Wendy Chapman
    Noemie Elhadad
    Guergana Savova