Scale and Standardize Heart Disease Dataset with sklearn
Introduction
The heart is one of the most important organs in your body, if not the most important. Taking good care of it helps prevent cardiovascular disease. Today, I'm going to explore how the different scalers in sklearn behave on the UCI heart disease dataset, in an effort to improve machine learning performance and coverage level. According to the scikit-learn documentation, different scalers behave differently on data that contains outliers. Features within a dataset often have very different ranges and characteristics, which in turn can degrade the predictive performance of machine learning algorithms.
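To make the point about differing ranges and outliers concrete, here is a minimal sketch that applies three scikit-learn scalers to a tiny toy array. The numbers are made up for illustration and are not taken from the UCI dataset; the variable names (`X`, `scalers`, `scaled`) are my own assumptions.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

# Hypothetical toy values, NOT taken from the UCI dataset:
# column 0 mimics an age-like feature, column 1 a cholesterol-like
# feature whose last entry is an outlier.
X = np.array([
    [29.0, 180.0],
    [45.0, 210.0],
    [54.0, 240.0],
    [61.0, 260.0],
    [77.0, 560.0],
])

scalers = {
    "standard": StandardScaler(),  # zero mean, unit variance per column
    "minmax": MinMaxScaler(),      # rescales each column to [0, 1]
    "robust": RobustScaler(),      # centers on the median, scales by the IQR
}

# Fit each scaler on X and keep the transformed copy for comparison.
scaled = {name: scaler.fit_transform(X) for name, scaler in scalers.items()}

for name, values in scaled.items():
    print(name)
    print(np.round(values, 2))
```

Comparing the printed arrays shows how a single outlier influences each scaler differently: the mean and the min/max are pulled by the extreme value, while the median and interquartile range used by `RobustScaler` are much less sensitive to it.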
The UCI Heart Disease Dataset
The published dataset has 14 features; their meanings and types are as follows:
- (age) — Age in years
- (sex) — (Male = 1, Female = 0)
- (cp) — (Chest Pain Type = 1, 2, 3, or 4)
- (trestbps) — (Resting Blood Pressure, in mm Hg)
- (chol) — (Serum Cholesterol, in mg/dl)
- (fbs) — (Fasting Blood Sugar > 120 mg/dl: True = 1, False = 0)
- (restecg) — (Resting Electrocardiographic Results: Normal = 0, ST-T wave abnormality = 1, Probable Left Ventricular Hypertrophy = 2)