Article’s

Fake News Detection Using Machine Learning

Asif Kareem

(04 – 2026)

DOI: 10.5281/zenodo.19385050

 

The proliferation of digitally fabricated information across social media platforms, online news portals, and messaging ecosystems has precipitated a global epistemic crisis with measurable consequences for democratic governance, public health, and socioeconomic stability. The velocity at which misinformation propagates through hyper-connected social networks now far exceeds the capacity of traditional human fact-checking mechanisms, rendering automated, scalable detection systems not merely beneficial but structurally imperative. While substantial academic effort has been directed towards Natural Language Processing (NLP)-based misinformation classifiers, existing systems frequently suffer from an unresolved dichotomy: heavyweight transformer models such as BERT and RoBERTa achieve exceptional classification accuracy but are computationally prohibitive for real-time content moderation at platform scale, whereas lightweight classical machine learning approaches lack the deep semantic reasoning capacity required to detect sophisticated, contextually coherent synthetic narratives. To address these critical limitations, this paper proposes a highly optimized, real-time fake news detection framework that synergizes the deep bidirectional contextual encoding capabilities of a fine-tuned DistilBERT architecture with the rapid, low-parameter classification efficiency of a Gradient Boosting meta-learner. By mathematically compressing transformer attention layers through knowledge distillation and coupling the resulting dense semantic embeddings with a suite of engineered psycholinguistic and stylometric features, the proposed hybrid pipeline is engineered for seamless, low-latency deployment on standard commodity server hardware. Furthermore, this research systematically synthesizes 25 pivotal studies, mapping the evolutionary trajectory of automated deception detection from early rule-based heuristics and classical machine learning classifiers to modern pre-trained language models and multimodal misinformation networks, thereby providing a comprehensive theoretical foundation. Empirical evaluations on a withheld testing partition of the LIAR and FakeNewsNet benchmark datasets demonstrate that the proposed hybrid classifier achieves an exceptional validation accuracy of 96.8% with an F1-score of 0.971. Concurrently, the system maintains an average inference latency of 18 milliseconds per article on standard CPU hardware, decisively satisfying the real-time throughput requirements of production-grade content moderation APIs. Ultimately, this framework provides a scalable, robust, and interpretable blueprint for integrating localized artificial intelligence into permanent, proactive information integrity architectures.

 

 

Scroll to Top