UMLSmapper finds biomedical concepts in Spanish and English clinical texts. Specifically, it finds concepts included in the UMLS Metathesaurus, a large collection of reference biomedical terminologies such as SNOMED CT® and the Medical Subject Headings (MeSH). UMLSmapper is highly configurable: users can choose the types of concepts the tool must find, as well as the terminologies it must use. The individual components that make up the UMLSmapper pipeline can also be configured.
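The idea of restricting a search to chosen concept types and terminologies can be illustrated with a small sketch. It is not UMLSmapper's actual interface or matching method: the `Concept` class, the `find_concepts` function, the three-entry lexicon and the `CUI-000x` identifiers are all hypothetical, and plain substring matching stands in for whatever the real pipeline does.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Concept:
    cui: str            # placeholder identifier, not a real UMLS CUI
    term: str           # lower-cased surface form
    semantic_type: str  # concept type used for filtering
    source: str         # terminology the entry is taken from


# Hypothetical toy lexicon, for illustration only.
LEXICON = [
    Concept("CUI-0001", "diabetes mellitus", "Disease or Syndrome", "SNOMEDCT_US"),
    Concept("CUI-0002", "headache", "Sign or Symptom", "MSH"),
    Concept("CUI-0003", "aspirin", "Pharmacologic Substance", "MSH"),
]


def find_concepts(
    text: str,
    semantic_types: Optional[Set[str]] = None,
    sources: Optional[Set[str]] = None,
) -> List[Concept]:
    """Return lexicon entries found in `text`, honouring the two filters.

    A filter set to None means "no restriction".
    """
    lowered = text.lower()
    hits = []
    for concept in LEXICON:
        if semantic_types is not None and concept.semantic_type not in semantic_types:
            continue
        if sources is not None and concept.source not in sources:
            continue
        if concept.term in lowered:  # naive substring match (assumption)
            hits.append(concept)
    return hits


if __name__ == "__main__":
    note = "Patient with diabetes mellitus reports headache; took aspirin."
    for hit in find_concepts(note, sources={"MSH"}):
        print(hit.cui, hit.term)
```

Passing `semantic_types={"Sign or Symptom"}` instead would keep only the `headache` entry, which mirrors the kind of choice the configuration is described as offering.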

UMLSmapper has proven to be robust across text genres (i.e., physician notes, clinical cases, scientific articles) and to perform in line with long-established, robust tools for English, such as MetaMap. An evaluation of UMLSmapper against the Mantra Gold Standard Corpus yielded the following results:

Micro-averaged scores

Task            Language  Precision  Recall  F1-score
Recognition     English   0.72       0.73    0.73
Classification  English   0.68       0.68    0.68
Identification  English   0.67       0.68    0.67
Recognition     Spanish   0.70       0.69    0.70
Classification  Spanish   0.67       0.67    0.67
Identification  Spanish   0.63       0.62    0.63
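For readers unfamiliar with micro-averaging, the sketch below shows how such scores are generally computed: true positives, false positives and false negatives are pooled over all documents before precision, recall and F1 are derived. The `Counts` class, the `micro_average` function and the example counts are hypothetical; how the evaluation above decides what counts as a correct match for each task is not specified here.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Counts:
    tp: int  # annotations correctly produced by the system
    fp: int  # annotations produced by the system but absent from the gold standard
    fn: int  # gold-standard annotations the system missed


def micro_average(per_document: Iterable[Counts]) -> Tuple[float, float, float]:
    """Pool the counts over all documents, then compute precision, recall, F1."""
    docs = list(per_document)
    tp = sum(c.tp for c in docs)
    fp = sum(c.fp for c in docs)
    fn = sum(c.fn for c in docs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up counts for two documents, for illustration only.
    p, r, f1 = micro_average([Counts(tp=8, fp=2, fn=1), Counts(tp=1, fp=3, fn=4)])
    print(f"P={p:.2f} R={r:.2f} F1={f1:.2f}")
```

Because counts are pooled, documents with many annotations weigh more than short ones, unlike a macro-average, which would average per-document scores.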

UMLSmapper is developed at Vicomtech in collaboration with the IXA research group of the University of the Basque Country (UPV/EHU). The work has been partially funded by the Department of Economic Development and Infrastructure of the Basque Government under the project BERBAOLA (KK-2017/00043), and by the Spanish Ministry of Economy and Competitiveness (MINECO/FEDER, UE) under the projects CROSSTEXT (TIN2015-72646-EXP) and TUNER (TIN2015-65308-C5-1-R).


Related publications

N. Perez, P. Accuosto, À. Bravo, M. Cuadros, E. Martínez-Garcia, H. Saggion, G. Rigau, Cross-lingual semantic annotation of biomedical literature: experiments in Spanish and English, Bioinformatics, 2019.

N. Perez, M. Cuadros, G. Rigau, Biomedical term normalization of EHRs with UMLS, in: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), 2018, pp. 2045-2051.

M. Cuadros, N. Perez, I. Montoya, A. García Pablos, Vicomtech at BARR2: Detecting Biomedical Abbreviations with ML Methods and Dictionary-based Heuristics, in: Proceedings of the Third Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018), co-located with the 34th Conference of the Spanish Society for Natural Language Processing (SEPLN 2018), 2018, pp. 322-328.

N. Perez, Mapping of Electronic Health Records in Spanish to the Unified Medical Language System Metathesaurus, Master's thesis, University of the Basque Country (UPV/EHU), 2017.