Data Files
The files below are the data files used throughout the Jupyter notebooks for “Methods in Medical Informatics”. All data files are organized by chapter, so a file used in more than one chapter is listed more than once.
- sample.txt – this file contains the article “A machine learning algorithm to increase COVID-19 inpatient diagnostic capacity” represented in XML (Source)
- Test_directory – a directory containing two separate text files
- mim2gene.txt – this is a text file that details the links between the genes in OMIM and other gene identifiers
- d2020.bin – a binary file which lists the Medical Subject Headings (MeSH) current as of 2020
- sample.bin – a binary file which contains a single example string
- us.gif – a GIF file which contains an image of the United States
- neo1.jpg – a JPEG file displaying a diagram of different neoplasm subtypes
- loc_states.txt – a text file which contains the longitude and latitude for the geographic centers of all 50 states
- d2020.bin – a binary file which contains tens of thousands of MeSH terms
- stop.txt – a text file with a list of stopwords
- titles.txt – a text file that contains a list of 100 titles of journal articles
- cancer_gene_titles.txt – a text file which contains a list of cancer-related journal article titles
- text.txt – a text file that contains a sample of a journal article
- paradise.txt – the epic poem Paradise Lost in text format
- treasure.txt – the novel Treasure Island in text format
- d2020.bin – a binary file which contains tens of thousands of MeSH terms
- each10.txt – this is a text file which contains an electronic version of the ICD
- icdo3.txt – this is a text file which contains an electronic version of the ICD-O
- cancer_citations.txt – a text file that lists cancer-related journal articles
- cancer_gene_titles.txt – a text file which contains a list of cancer-related journal article titles
- cancer_gene_titles.txt – a text file which contains a list of cancer-related journal article titles
- neocl.xml – the Neoplasm Classification nomenclature formatted using XML
- neocl.lst – this file contains a list of candidate neoplasm classification terms
- neocl.xml – the Neoplasm Classification nomenclature formatted using XML
- doublets.txt – this is a text file containing numerous medical term doublets
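
The list above mixes text files (.txt, .xml, .lst) and binary files (.bin, .gif, .jpg), and the two kinds are opened differently in Python. The following is a minimal sketch of how one might peek at a file before using it in a notebook. The `DATA_DIR` location and the helper names `first_lines` and `first_bytes` are assumptions made for illustration, not part of the notebooks themselves.

```python
from itertools import islice
from pathlib import Path

# Assumed location of the downloaded data files; adjust to your own layout.
DATA_DIR = Path("data")


def first_lines(path, n=5):
    """Return the first n lines of a text file, without trailing newlines."""
    # errors="replace" avoids a crash if the file is not valid UTF-8.
    with open(path, encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in islice(handle, n)]


def first_bytes(path, n=16):
    """Return the first n raw bytes of a file opened in binary mode."""
    with open(path, "rb") as handle:
        return handle.read(n)


if __name__ == "__main__":
    # File names are taken from the list above; each is skipped if absent.
    text_file = DATA_DIR / "stop.txt"
    if text_file.exists():
        print(first_lines(text_file))

    image_file = DATA_DIR / "us.gif"
    if image_file.exists():
        print(first_bytes(image_file, 6))
```

Reading only the first few lines or bytes keeps the check cheap even for the larger files in the list.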