
Comparative Analysis of XGBoost and Random Forest in Predicting the Success of Trainees in the Technical Intern Training Program (TITP)

EasyChair Preprint 15876

6 pages
Date: March 1, 2025

Abstract

Japan's population is projected to decline from 125 million (2020) to 88 million (2065), prompting the government to open opportunities for foreign workers through the Technical Intern Training Program (TITP). This research analyzes the success factors of TITP interns using XGBoost and Random Forest models, with the SMOTE method applied to handle imbalanced data. Data were sourced from labor-sending companies and comprise 784 samples with the following distribution: 57 pre-training dropouts, 67 training dropouts, 52 internship dropouts, 16 runaways, and 592 successful internship completions. Results show Random Forest slightly outperforming XGBoost, with a balanced accuracy of 0.32 versus 0.27, though both achieved macro F1-scores of approximately 0.28-0.32. Feature importance analysis revealed age, test scores, and health factors as the key predictors of internship success. The main challenge was extreme class imbalance, with minority classes such as runaways represented by only 16 samples (2% of the total). While the models performed well on the majority class, improvements are needed for minority-class detection.
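The pipeline described in the abstract can be summarized in a short sketch: oversample the minority classes on the training split with SMOTE, then fit Random Forest and XGBoost and score both with balanced accuracy and macro F1. The file name, target column, and default hyperparameters below are illustrative assumptions, not the paper's actual configuration.

# Minimal sketch of the pipeline in the abstract: SMOTE on the training
# split, then Random Forest vs. XGBoost on the held-out test set.
# The CSV file name, target column, and default hyperparameters are
# illustrative assumptions, not the paper's actual setup.
import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from xgboost import XGBClassifier

df = pd.read_csv("titp_trainees.csv")            # hypothetical file name
X = df.drop(columns=["outcome"])                 # hypothetical target column
y = LabelEncoder().fit_transform(df["outcome"])  # XGBoost needs integer labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Oversample minority classes on the training split only, so the test set
# keeps its natural (imbalanced) class distribution.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

models = {
    "Random Forest": RandomForestClassifier(random_state=42),
    "XGBoost": XGBClassifier(random_state=42),
}

for name, model in models.items():
    model.fit(X_res, y_res)
    pred = model.predict(X_test)
    print(f"{name}: balanced accuracy = {balanced_accuracy_score(y_test, pred):.2f}, "
          f"macro F1 = {f1_score(y_test, pred, average='macro'):.2f}")

Feature importance (e.g., the fitted models' feature_importances_ attribute) can then be inspected to see which predictors, such as age, test scores, and health factors in the paper, drive the classification.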

Keyphrases: Imbalanced Data, Multiclass Classification, Random Forest, SMOTE, Technical Intern Training Program, XGBoost

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@booklet{EasyChair:15876,
  author    = {Syaban Maulana and Nenden Siti Fatonah and Gerry Firmansyah and Agung Mulyo Widodo},
  title     = {Comparative Analysis of XGBoost and Random Forest in Predicting the Success of Trainees in the Technical Intern Training Program (TITP)},
  howpublished = {EasyChair Preprint 15876},
  year      = {EasyChair, 2025}}