Cybersecurity Threat Detection Using Machine Learning: A Comparative Analysis of Gradient Boosting Approaches on Network Intrusion Data
Md Saad Bin Rizvi
Signature-based threat detection systems are no longer effective against emerging attack strategies. In this paper, we propose an end-to-end machine learning solution based on two state-of-the-art gradient boosting methods, XGBoost and LightGBM, that classifies network connections into five categories. The study uses the KDD Cup 1999 dataset, which consists of 494,021 labeled network connection samples. To train an accurate model, we apply a strict preprocessing procedure: duplicate removal, one-hot encoding of categorical features, class label aggregation, and min-max normalization. Experiments show that XGBoost achieves the highest accuracy at 99.2%, followed by LightGBM at 99.0%, ahead of the baseline models Random Forest (98.5%), Decision Tree (97.8%), K-Nearest Neighbors (96.1%), and Naive Bayes (88.4%). The models perform well on high-frequency attack classes (Denial-of-Service, Probe) but detect the minority attack classes (R2L, U2R) less reliably.
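The four preprocessing steps named above can be illustrated with a minimal sketch in plain Python. It is not the pipeline used in the study: the toy rows, the three-feature layout, the `CATEGORY` mapping, and the helper names `min_max` and `preprocess` are all assumptions made for illustration, and the sketch one-hot encodes a single categorical column only.

```python
# Toy records shaped loosely like KDD Cup 1999 rows (hypothetical values).
# Each row: (duration, protocol_type, src_bytes, raw_label)
rows = [
    (0, "tcp", 181, "normal."),
    (0, "tcp", 181, "normal."),        # exact duplicate, removed below
    (0, "icmp", 1032, "smurf."),
    (2, "tcp", 0, "neptune."),
    (5, "udp", 105, "ipsweep."),
    (10, "tcp", 334, "guess_passwd."),
]

# Assumed aggregation of raw labels into five categories; only a few
# raw labels are listed, and the study's exact mapping may differ.
CATEGORY = {
    "normal.": "Normal",
    "smurf.": "DoS",
    "neptune.": "DoS",
    "ipsweep.": "Probe",
    "guess_passwd.": "R2L",
    "buffer_overflow.": "U2R",
}


def min_max(values):
    """Scale a list of numbers to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def preprocess(rows):
    # Step 1: drop exact duplicate rows while keeping first-seen order.
    unique = list(dict.fromkeys(rows))

    # Step 2 (setup): fixed, sorted vocabulary for the categorical column.
    protocols = sorted({r[1] for r in unique})

    # Step 4: min-max normalize each numeric column independently.
    durations = min_max([r[0] for r in unique])
    src_bytes = min_max([r[2] for r in unique])

    features, labels = [], []
    for i, (_, proto, _, raw_label) in enumerate(unique):
        # Step 2: one-hot encode the protocol against the vocabulary.
        one_hot = [1.0 if proto == p else 0.0 for p in protocols]
        features.append([durations[i], src_bytes[i]] + one_hot)
        # Step 3: aggregate the raw label into its category.
        labels.append(CATEGORY[raw_label])
    return features, labels, protocols


features, labels, protocols = preprocess(rows)
```

In this sketch the scaling bounds come from the whole toy set for brevity. With a train/test split, the minimum and maximum would normally be computed on the training portion only and reused for the test portion.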

