ENABLE TRAINING MODULES

Online Module Form

The Carolina Health Informatics Program (CHIP) has developed a series of online training modules, titled “An Introduction to Data Science through a health care lens,” to introduce learners to the field of data science. The modules are open to anyone who is interested and require no prior training or knowledge in data science. If you complete the entire set of modules (the full “short course”) and pass a simple final assessment, you will receive a certificate of completion.

Introduction to Data Science Curriculum

Text Mining

Data Mining

Module 1: Text Preprocessing - adgriff

Text preprocessing is an important step in natural language processing (NLP). It transforms raw text into a cleaner, more consistent form so that machine learning algorithms can perform better. This module will teach various text preprocessing techniques.

Module 1: Text Preprocessing - dfwhite

Text preprocessing is an important step in natural language processing (NLP). It transforms raw text into a cleaner, more consistent form so that machine learning algorithms can perform better. This module will teach various text preprocessing techniques.
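To make these techniques concrete, here is a minimal Python sketch of a few common preprocessing steps: lowercasing, removing punctuation and digits, whitespace tokenization, and stop-word removal. The sample note and the small `STOP_WORDS` set are hypothetical, and the module itself may use different steps or libraries.

```python
import re

# Hypothetical, deliberately tiny stop-word list; real projects usually
# rely on a larger curated list.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "was", "is", "with", "for", "in"}


def preprocess(text: str) -> list[str]:
    """Lowercase, replace non-letters with spaces, tokenize, and drop stop words."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # removes punctuation and digits
    tokens = text.split()                  # simple whitespace tokenization
    return [tok for tok in tokens if tok not in STOP_WORDS]


note = "Patient was admitted with chest pain; BP 140/90. Denies shortness of breath."
print(preprocess(note))
# ['patient', 'admitted', 'chest', 'pain', 'bp', 'denies', 'shortness', 'breath']
```

Removing digits here also discards the blood pressure reading, a reminder that each preprocessing step trades some information for consistency.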

Module 2: Exploratory Analysis of Text Data - adgriff

Exploratory analysis is an initial approach to analyzing datasets. It commonly involves summarizing a dataset's main characteristics, often with data visualizations. This module will teach you how to perform exploratory analysis of text data.

Note: If you encounter an error in the optional section, copy and paste the code below into the code cell that produced the error.

Module 2: Exploratory Analysis of Text Data - dfwhite

Exploratory analysis is an initial approach to analyzing datasets. It commonly involves summarizing a dataset's main characteristics, often with data visualizations. This module will teach you how to perform exploratory analysis of text data.
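A minimal sketch of exploratory analysis for text, assuming a tiny hypothetical corpus of clinical-style notes: it counts documents, measures average document length, computes vocabulary size and word frequencies, and prints a crude text bar chart in place of a plotted visualization. Only the Python standard library is used, and the function name `summarize_corpus` is illustrative.

```python
import statistics
from collections import Counter

# Hypothetical mini-corpus of short clinical notes.
docs = [
    "chest pain and fever",
    "fever and cough",
    "no chest pain",
]


def summarize_corpus(docs, top_n=3):
    """Return simple descriptive statistics for a list of documents."""
    tokenized = [doc.lower().split() for doc in docs]
    lengths = [len(tokens) for tokens in tokenized]
    freq = Counter(tok for tokens in tokenized for tok in tokens)
    return {
        "n_docs": len(docs),
        "mean_length": statistics.mean(lengths),  # average tokens per document
        "vocab_size": len(freq),                  # number of distinct words
        "top_words": freq.most_common(top_n),
        "freq": freq,
    }


summary = summarize_corpus(docs)
print("documents:", summary["n_docs"])
print("mean length (tokens):", round(summary["mean_length"], 2))
print("vocabulary size:", summary["vocab_size"])

# Crude text-based bar chart of word frequencies (a stand-in for a plotted chart).
for word, count in summary["freq"].most_common():
    print(f"{word:<8}{'#' * count}")
```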

Note: If you encounter an error in the optional section, copy and paste the code below into the code cell that produced the error.

Module 3: Information Extraction - adgriff

Text data is often rich in both information and meaning. However, text data is also often complex, which can make analysis difficult. This module will introduce you to part-of-speech tagging, named entity recognition, and relation extraction, which will help you both understand the structure of your textual data and derive meaning from it.

Module 3: Information Extraction - dfwhite

Text data is often rich in both information and meaning. However, text data is also often complex, which can make analysis difficult. This module will introduce you to part-of-speech tagging, named entity recognition, and relation extraction, which will help you both understand the structure of your textual data and derive meaning from it.
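Part-of-speech tagging and most named entity recognition rely on trained statistical models (for example, those provided by NLTK or spaCy), which are beyond a short sketch. The standard-library Python example below instead uses a hand-made dictionary (a gazetteer) to tag drug and condition names, plus a single hand-written rule to extract "drug TREATS condition" relations. The gazetteer, the labels, and the "X for Y" rule are illustrative assumptions, not the module's method.

```python
import re

# Hypothetical gazetteer mapping known terms to entity labels.
GAZETTEER = {
    "metformin": "DRUG",
    "lisinopril": "DRUG",
    "diabetes": "CONDITION",
    "hypertension": "CONDITION",
}


def find_entities(text):
    """Return (surface_text, label, start_offset) for each gazetteer term in text."""
    entities = []
    for term, label in GAZETTEER.items():
        pattern = r"\b" + re.escape(term) + r"\b"  # whole-word match only
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            entities.append((match.group(0), label, match.start()))
    return sorted(entities, key=lambda ent: ent[2])


def extract_treats(text):
    """Rule: within one sentence, '<DRUG> ... for <CONDITION>' yields (drug, 'TREATS', condition)."""
    relations = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):  # naive sentence splitting
        entities = find_entities(sentence)
        drugs = [e for e in entities if e[1] == "DRUG"]
        conditions = [e for e in entities if e[1] == "CONDITION"]
        for drug, _, d_start in drugs:
            for cond, _, c_start in conditions:
                span_between = sentence[d_start:c_start].lower()
                if c_start > d_start and " for " in span_between:
                    relations.append((drug, "TREATS", cond))
    return relations


text = "Started metformin for diabetes. Continue lisinopril for hypertension."
print(find_entities(text))
print(extract_treats(text))
```

Rule-based extraction like this is transparent but brittle; it misses any phrasing the rule does not anticipate, which is one reason trained models are usually preferred.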

Module 4: Feature Representation for Text - adgriff

Feature representation is a way to encode your data in a form, typically numeric vectors, that a computer can analyze. This module will investigate feature representation for text data. You will also generate different types of feature representations and compare how well they perform.

Module 4: Feature Representation for Text - dfwhite

Feature representation is a way to encode your data in a form, typically numeric vectors, that a computer can analyze. This module will investigate feature representation for text data. You will also generate different types of feature representations and compare how well they perform.
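Two common text feature representations are bag-of-words counts and TF-IDF weights. The sketch below builds both from scratch in Python for three hypothetical, already-preprocessed documents; it is an illustration, not necessarily the representation the module uses.

```python
import math
from collections import Counter

# Hypothetical, already-preprocessed documents.
docs = ["fever cough", "fever rash", "cough cough"]


def build_vocabulary(docs):
    """Sorted list of every distinct token in the corpus."""
    return sorted({tok for doc in docs for tok in doc.split()})


def bag_of_words(docs, vocab):
    """One row per document: raw count of each vocabulary term."""
    rows = []
    for doc in docs:
        counts = Counter(doc.split())
        rows.append([counts[term] for term in vocab])
    return rows


def tf_idf(docs, vocab):
    """One row per document: (term count / doc length) * log(N / document frequency)."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = {term: sum(term in tokens for tokens in tokenized) for term in vocab}
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero for empty documents
        rows.append([(counts[t] / total) * math.log(n_docs / doc_freq[t]) for t in vocab])
    return rows


vocab = build_vocabulary(docs)
print(vocab)                      # ['cough', 'fever', 'rash']
print(bag_of_words(docs, vocab))  # [[1, 1, 0], [0, 1, 1], [2, 0, 0]]
print(tf_idf(docs, vocab))
```

TF-IDF gives `rash`, which appears in only one document, more weight than `fever`, which appears in two. This sketch uses the unsmoothed IDF, log(N / df); scikit-learn's `TfidfVectorizer` applies a smoothed IDF by default, so its numbers will differ.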

Module 5: Predictive Analysis of Text Data - adgriff

One of the most powerful uses of data is making predictions. In this module, you will explore how to use text data to make predictions. Specifically, you will learn about two common machine learning algorithms: logistic regression and k-nearest neighbors.

Module 5: Predictive Analysis of Text Data - dfwhite

One of the most powerful uses of data is making predictions. In this module, you will explore how to use text data to make predictions. Specifically, you will learn about two common machine learning algorithms: logistic regression and k-nearest neighbors.
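To show what the two algorithms do, the Python sketch below implements both from scratch on a four-note toy dataset: k-nearest neighbors using cosine similarity between bag-of-words vectors, and logistic regression fit by batch gradient descent. The notes, labels, `k=3`, learning rate, and epoch count are all assumptions chosen for illustration; in practice you would use a tested library and far more data.

```python
import math
from collections import Counter

# Hypothetical labeled notes: 1 = urgent complaint, 0 = routine visit.
TRAIN = [
    ("severe chest pain", 1),
    ("chest pain radiating arm", 1),
    ("routine follow up", 0),
    ("follow up visit routine", 0),
]
VOCAB = sorted({tok for text, _ in TRAIN for tok in text.split()})


def vectorize(text):
    """Bag-of-words counts over VOCAB; words outside the vocabulary are ignored."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in VOCAB]


X = [vectorize(text) for text, _ in TRAIN]
y = [label for _, label in TRAIN]


# ---- k-nearest neighbors (cosine similarity, majority vote) ----
def cosine(a, b):
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm else 0.0


def knn_predict(text, k=3):
    query = vectorize(text)
    ranked = sorted(zip(X, y), key=lambda pair: cosine(query, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# ---- logistic regression trained with batch gradient descent ----
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=500):
    """Minimize the mean log-loss; returns (weights, bias)."""
    weights, bias, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for xi, yi in zip(X, y):
            error = sigmoid(sum(w * x for w, x in zip(weights, xi)) + bias) - yi
            grad_w = [g + error * x for g, x in zip(grad_w, xi)]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


WEIGHTS, BIAS = train_logistic(X, y)


def logistic_predict(text):
    z = sum(w * x for w, x in zip(WEIGHTS, vectorize(text))) + BIAS
    return 1 if sigmoid(z) >= 0.5 else 0


for note in ["chest pain", "routine follow up"]:
    print(note, "-> kNN:", knn_predict(note), "| logistic:", logistic_predict(note))
```

kNN keeps the training data and decides at prediction time, while logistic regression compresses the training data into one weight per word plus a bias, which also makes its decisions easier to inspect.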

Module 1: Preparing Data - adgriff

Preparing data is an important step in any data mining project. In this module you will learn how to upload a CSV file and how to deal with missing or improbable data.

Module 1: Preparing Data - dfwhite

Preparing data is an important step in any data mining project. In this module you will learn how to upload a CSV file and how to deal with missing or improbable data.
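The data mining modules appear to work in R (the bivariate analysis module applies its techniques in R). Purely as an illustration, this Python standard-library sketch reads CSV data, treats blank or implausible values as missing, and fills them with the column median. The CSV contents, column names, and `VALID_RANGES` bounds are hypothetical.

```python
import csv
import io
import statistics

# Hypothetical CSV contents; with a real uploaded file you would pass an
# open file object to csv.DictReader instead of this string.
RAW_CSV = """patient_id,age,heart_rate
1,67,88
2,,102
3,54,999
4,71,
5,-3,76
"""

# Assumed plausible ranges, chosen only for illustration (not clinical guidance).
VALID_RANGES = {"age": (0, 120), "heart_rate": (20, 250)}


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Treat blank or out-of-range values as missing, then impute each column's median."""
    for row in rows:
        for col, (low, high) in VALID_RANGES.items():
            raw = row[col].strip()
            value = float(raw) if raw else None
            row[col] = value if value is not None and low <= value <= high else None
    for col in VALID_RANGES:
        observed = [row[col] for row in rows if row[col] is not None]
        median = statistics.median(observed)
        for row in rows:
            if row[col] is None:
                row[col] = median
    return rows


for row in clean(load_rows(RAW_CSV)):
    print(row)
```

Median imputation is only one option; dropping incomplete rows or flagging imputed values are common alternatives, and the right choice depends on why the data are missing.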

Module 2: Univariate Analysis - adgriff

Univariate analysis allows you to deeply analyze a single variable. This module will teach you the skills to perform univariate analysis, including identifying variable types, computing summary statistics, and visualizing a single variable. Along the way, you’ll learn by analyzing specific variables from real patient data!

Module 2: Univariate Analysis - dfwhite

Univariate analysis allows you to deeply analyze a single variable. This module will teach you the skills to perform univariate analysis, including identifying variable types, computing summary statistics, and visualizing a single variable. Along the way, you’ll learn by analyzing specific variables from real patient data!
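As an illustration of univariate analysis, this Python sketch summarizes one hypothetical continuous variable (age) with summary statistics and a text histogram, and one hypothetical categorical variable (admission type) with a frequency table. It assumes Python 3.8 or later for `statistics.quantiles`; the module itself may use R and different tools.

```python
import statistics
from collections import Counter

# Hypothetical patient data: one continuous and one categorical variable.
ages = [67, 54, 71, 80, 45, 62, 77, 59]
admission_type = ["emergency", "elective", "emergency", "urgent",
                  "emergency", "elective", "emergency", "urgent"]


def summarize_numeric(values):
    """Five-number summary plus mean and sample standard deviation."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "min": min(values),
        "q1": q1,
        "median": statistics.median(values),
        "q3": q3,
        "max": max(values),
    }


def text_histogram(values, bin_width=10):
    """Print a sideways histogram, one '#' per observation in each bin."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    for start in sorted(bins):
        print(f"{start:>3}-{start + bin_width - 1:<3} {'#' * bins[start]}")


print(summarize_numeric(ages))
text_histogram(ages)
print(Counter(admission_type).most_common())  # frequency table for a categorical variable
```

Continuous and categorical variables call for different summaries: means, spreads, and histograms for the former, counts and proportions for the latter.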

Module 3: Bivariate Analysis - adgriff

Bivariate analysis is a statistical method that helps us see how two variables relate to each other. In this module, you’ll learn different bivariate analysis techniques and how to apply those techniques in R.

Module 3: Bivariate Analysis - dfwhite

Bivariate analysis is a statistical method that helps us see how two variables relate to each other. In this module, you’ll learn different bivariate analysis techniques and how to apply those techniques in R.
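The module applies these techniques in R. As a language-neutral illustration, this Python sketch shows one common bivariate summary for each pairing of variable types: Pearson correlation (continuous vs. continuous), a cross-tabulation (categorical vs. categorical), and group means (continuous vs. categorical). The patient values are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical paired observations for six patients.
age = [45, 52, 60, 66, 71, 80]
systolic_bp = [118, 125, 130, 138, 142, 150]
sex = ["F", "M", "F", "M", "F", "M"]
died = [0, 0, 0, 1, 0, 1]


def pearson_r(x, y):
    """Pearson correlation between two continuous variables."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def crosstab(a, b):
    """Counts for each combination of two categorical variables."""
    return Counter(zip(a, b))


def group_means(values, groups):
    """Mean of a continuous variable within each level of a categorical one."""
    buckets = defaultdict(list)
    for v, g in zip(values, groups):
        buckets[g].append(v)
    return {g: sum(vs) / len(vs) for g, vs in buckets.items()}


print(round(pearson_r(age, systolic_bp), 3))  # continuous vs continuous
print(crosstab(sex, died))                    # categorical vs categorical
print(group_means(age, died))                 # continuous vs categorical
```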

Module 4: Feature Selection - adgriff

Feature selection is the process of selecting a subset of variables to use in building a machine learning model. Reducing the number of features can improve model performance, make models easier to understand, and reduce the time required to run a model. In this module you will learn filter, wrapper, and embedded feature selection methods.

Module 4: Feature Selection - dfwhite

Feature selection is the process of selecting a subset of variables to use in building a machine learning model. Reducing the number of features can improve model performance, make models easier to understand, and reduce the time required to run a model. In this module you will learn filter, wrapper, and embedded feature selection methods.
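Of the three families, filter methods are the simplest to sketch: score each feature independently against the outcome and keep the best. The Python example below ranks hypothetical ICU-style features by their absolute Pearson correlation with a binary outcome (for a 0/1 outcome this equals the point-biserial correlation). The data and the `filter_select` helper are illustrative assumptions; wrapper and embedded methods are not shown.

```python
import math

# Hypothetical features (one list per feature) and a binary outcome.
features = {
    "lactate": [1.0, 1.2, 1.1, 4.0, 3.5, 5.0],
    "age": [60, 70, 65, 72, 58, 75],
    "noise": [1, 2, 3, 3, 2, 1],
}
died = [0, 0, 0, 1, 1, 1]


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def filter_select(features, target, k):
    """Filter method: rank features by |correlation with target| and keep the top k."""
    scores = {name: abs(pearson_r(values, target)) for name, values in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


selected, scores = filter_select(features, died, k=2)
print(selected)  # ['lactate', 'age']
print({name: round(score, 2) for name, score in scores.items()})
```

Here `noise` has zero correlation with the outcome and is dropped. Filter methods are fast, but because they score features one at a time they can miss features that are only useful in combination, which is part of the motivation for wrapper and embedded methods.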

Module 5: Predictive Analysis - adgriff

Predictive analysis is a powerful tool that allows us to make predictions from data. This module will pull together the previous four data mining modules to teach machine learning techniques such as logistic regression and decision trees. Along the way, you’ll learn by predicting mortality from real ICU patient data!

Module 5: Predictive Analysis - dfwhite

Predictive analysis is a powerful tool that allows us to make predictions from data. This module will pull together the previous four data mining modules to teach machine learning techniques such as logistic regression and decision trees. Along the way, you’ll learn by predicting mortality from real ICU patient data!
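Decision trees are built by repeatedly choosing the split that best separates the outcome. The Python sketch below shows a single such step (a one-level tree, or "decision stump") chosen by Gini impurity on hypothetical ICU records of lactate and age. The data, the features, and the use of observed values as candidate thresholds are assumptions; full tree implementations apply this search recursively to each branch and add stopping rules.

```python
# Hypothetical ICU records: [lactate (mmol/L), age (years)] with outcome 1 = died, 0 = survived.
FEATURES = ["lactate", "age"]
X = [[1.0, 60], [1.2, 70], [1.1, 65], [4.0, 72], [3.5, 58], [5.0, 75]]
y = [0, 0, 0, 1, 1, 1]


def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means perfectly pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def majority(labels):
    """Most common label; ties go to 0."""
    return 1 if 2 * sum(labels) > len(labels) else 0


def fit_stump(X, y):
    """Try every (feature, threshold) split; keep the one with lowest weighted Gini impurity."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [label for row, label in zip(X, y) if row[j] <= threshold]
            right = [label for row, label in zip(X, y) if row[j] > threshold]
            if not left or not right:
                continue
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or impurity < best["impurity"]:
                best = {"feature": j, "threshold": threshold, "impurity": impurity,
                        "left": majority(left), "right": majority(right)}
    return best


def predict(stump, row):
    return stump["left"] if row[stump["feature"]] <= stump["threshold"] else stump["right"]


stump = fit_stump(X, y)
print(f"split on {FEATURES[stump['feature']]} <= {stump['threshold']}")
print(predict(stump, [3.0, 50]), predict(stump, [0.9, 82]))  # 1 0
```

On this toy data a single lactate split separates the outcomes perfectly; real ICU data are far noisier, which is why full trees use many splits and are evaluated on held-out patients.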