You are working on a classification project to predict whether an individual will default on a bank loan. The predictors describe the individual's credit, such as the credit score, current loan amount, installment amount, and number of late payments. The training data set contains 15,000 samples and 10 predictor variables. You notice that 20 samples are missing values for randomly scattered predictor variables. Upon further inspection, you find the following: 1) the data set is balanced (i.e., the two classes appear in similar proportions), 2) no sample among the 20 is missing more than 2 predictor values, 3) no predictor variable is missing values in more than 2 samples, and 4) 11 of the 20 samples belong to the same class. What is the best way to handle the missing values?
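The four observations above can be checked directly before choosing a strategy. The following is a minimal sketch in Python with pandas, using synthetic data built to match the numbers in the question; the column names `x0`..`x9`, the label name `default`, and the way the gaps are placed are all assumptions for illustration, not part of the original data set.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_samples, n_predictors = 15_000, 10

# Hypothetical predictors and a perfectly balanced binary label (assumed names).
cols = [f"x{i}" for i in range(n_predictors)]
X = pd.DataFrame(rng.normal(size=(n_samples, n_predictors)), columns=cols)
y = pd.Series(np.arange(n_samples) % 2, name="default")

# Pick 20 rows to damage: 11 from class 0 and 9 from class 1, as in the question.
class0_rows = np.flatnonzero(y.to_numpy() == 0)
class1_rows = np.flatnonzero(y.to_numpy() == 1)
affected_rows = np.concatenate([
    rng.choice(class0_rows, size=11, replace=False),
    rng.choice(class1_rows, size=9, replace=False),
])

# Blank one cell per affected row, cycling through the columns so that
# each predictor ends up missing in exactly 2 rows.
for k, row in enumerate(affected_rows):
    X.iloc[row, k % n_predictors] = np.nan

# Diagnostics corresponding to the observations in the question.
missing = X.isna()
rows_with_missing = missing.any(axis=1)
summary = {
    "rows_affected": int(rows_with_missing.sum()),
    "fraction_affected": float(rows_with_missing.mean()),
    "max_missing_per_row": int(missing.sum(axis=1).max()),
    "max_missing_per_column": int(missing.sum(axis=0).max()),
    "class_counts_in_affected": y[rows_with_missing].value_counts().to_dict(),
}
print(summary)
```

In this sketch the affected rows make up 20 / 15,000, or about 0.13%, of the training data, and they are split 11 to 9 between the classes, which is the kind of evidence that helps when weighing row deletion against imputation.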