Session ID: 

Impact of Data Balancing Techniques on ML Model Performance on the prediction of 30-Day Readmission

9:30am - 3:30pm Wednesday, March 11
Orlando - Hyatt Regency
Hyatt - Regency Ballroom U


This poster will be displayed in conjunction with the HIMSS20 Career Expo. Posters will be available for viewing throughout the event. Poster presenters will be available for questions from 12:00 pm - 1:00 pm.

Many real-world datasets are highly skewed and suffer from class imbalance. Typically, the positive class (e.g. diagnoses, mortality, fraud) constitutes a very small minority of the target variable. Consequently, ML models trained on such data tend to be biased toward the majority class, perform poorly on the minority class, and exhibit a high false negative rate. Imbalanced data are prevalent in most fields, including healthcare. Various techniques have been developed to balance the classes. However, in healthcare informatics, little attention has been paid to comparing the performance of these techniques when developing a new model, such as a 30-day hospital readmission prediction model. This research aims to compare the performance of ML models trained with six different balancing techniques on a 30-day hospital readmission prediction task.
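To make the imbalance problem concrete, here is a minimal sketch in Python using only the standard library. It is not the poster's method: the abstract does not name the six techniques compared, so random oversampling is used purely as an illustration of what a balancing technique does. The toy data (95 non-readmitted, 5 readmitted) and the helper names `random_oversample` and `false_negative_rate` are assumptions made for this example.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until
    every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def false_negative_rate(y_true, y_pred, positive=1):
    """Share of truly positive cases that were predicted as not positive."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    pos = sum(1 for t in y_true if t == positive)
    return fn / pos if pos else 0.0


# Hypothetical toy data: 1 = readmitted within 30 days, 0 = not readmitted.
labels = [0] * 95 + [1] * 5
rows = [[i] for i in range(100)]

# A model that always predicts the majority class looks accurate
# but misses every readmission.
majority_only = [0] * len(labels)
accuracy = sum(t == p for t, p in zip(labels, majority_only)) / len(labels)
fnr = false_negative_rate(labels, majority_only)

# After oversampling, both classes are equally represented.
bal_rows, bal_labels = random_oversample(rows, labels)
balanced_counts = Counter(bal_labels)
```

In this toy setup the majority-only predictor reaches 95% accuracy while its false negative rate is 100%, which is why accuracy alone is a poor yardstick on imbalanced data. In practice, balancing should be applied to the training split only, so that evaluation reflects the true class distribution.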


Learning Objectives

  • Understand the impact of various data balancing techniques on ML model performance
  • Analyse five to seven data balancing techniques and learn how to identify the technique that best suits the researcher's needs
  • Evaluate the balancing techniques on various types of data (e.g. structured vs. unstructured data)


Doctoral Candidate,
George Mason University


Early Careerist
IT Professional