A Critical Assessment of Balanced Class Distribution Problems: The Case of Predict Student Dropout
The general objective of this study is to help universities identify the most influential factors that cause students to drop out. The specific objective is to find the most accurate algorithm for predicting student dropout when the class distribution is balanced. The dataset was obtained from the academic information system of a university in East Java, Indonesia. The data, collected between 2009 and 2015, consist of 32 attributes, 425 records, and 2 classes. The attributes are of nominal and numerical types. The results indicate that the most influential factors causing students to drop out are lecture programme; number of courses; credit amount in semesters 3, 6, and 9; and Grade Point Average (GPA) in semesters 2, 3, 4, and 6. The Random Forest algorithm with the gain ratio criterion and shuffled sampling has the best performance, namely 99.29%, 99.47%, 99.09%, 99.28%, 0.71%, and 0.999 for accuracy, precision, recall, f-measure, classification error, and Area Under Curve (AUC), respectively. The worst-performing algorithm is Decision Tree with linear sampling and the information gain criterion, namely 83.19%, 83.47%, 86.32%, 84.87%, 16.81%, and 0.3 for accuracy, precision, recall, f-measure, classification error, and AUC, respectively.
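The reported measures are linked by their standard definitions: the f-measure is the harmonic mean of precision and recall, and the classification error is the complement of accuracy. The minimal Python sketch below checks these two relations against the Decision Tree figures quoted above. The function names are illustrative assumptions and are not taken from the tooling used in the study.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same unit as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_error(accuracy_pct: float) -> float:
    """Classification error in percent, as the complement of accuracy."""
    return 100.0 - accuracy_pct


# Decision Tree figures quoted in the abstract, in percent.
dt_precision, dt_recall, dt_accuracy = 83.47, 86.32, 83.19

dt_f = round(f_measure(dt_precision, dt_recall), 2)
dt_err = round(classification_error(dt_accuracy), 2)
print(dt_f, dt_err)  # 84.87 16.81
```

Both values agree with the f-measure and classification error reported for the Decision Tree, so these four figures are mutually consistent.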