RAG-MicroSim: A Hybrid Retrieval-Augmented Generation and Market Micro-Simulation Framework for High-Frequency Trading Analysis
Rohan Gaikwad, Sahil Chavan, Kiran Pawar, Amit Lokhande, Om Raut
Standard analytical models cannot explain the non-linear market dynamics produced by High-Frequency Trading (HFT), which operates at sub-millisecond timescales. The most advanced anomaly detectors currently in use, such as Transformer-based deep learning models, achieve high F1-scores but function as opaque “black boxes” that cannot reason causally. Conversely, Retrieval-Augmented Generation (RAG) with Large Language Models (LLMs) offers explainability but fails during novel, out-of-distribution market events (such as localised flash crashes), because a static corpus contains no past logs of such unusual situations to retrieve. We present RAG-MicroSim, a deterministic hybrid architecture that integrates a discrete-event market micro-simulator directly into the RAG pipeline. Instead of relying on a static corpus, the system synthesises mathematically constrained limit order book (LOB) states on demand. RAG-MicroSim produces counterfactual “what-if” evidence, using a Hawkes process to model stochastic order flow and Order Book Imbalance (OBI) as a rigorous mathematical trigger. When tested against the empirical baseline of the 2010 Flash Crash, the system successfully reconstructs the algorithmic depletion of liquidity. Statistical benchmarking shows an F1-score of 0.94 in anomaly detection together with complete causal interpretability, demonstrating how RAG-MicroSim unites semantic AI with quantitative physics.
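
As an illustration of the two quantitative ingredients named above, the following minimal sketch computes an Order Book Imbalance value and simulates order arrival times from a Hawkes process. It is not the RAG-MicroSim implementation. It assumes the common volume-based definition OBI = (V_bid - V_ask) / (V_bid + V_ask) and a univariate Hawkes process with an exponential kernel, simulated by Ogata's thinning method. The function names, the parameters mu, alpha and beta, and the trigger threshold are hypothetical choices made only for this example.

```python
import math
import random


def order_book_imbalance(bid_volume, ask_volume):
    """Assumed definition: (V_bid - V_ask) / (V_bid + V_ask), in [-1, 1]."""
    total = bid_volume + ask_volume
    if total <= 0:
        return 0.0  # empty book: treat as balanced (illustrative convention)
    return (bid_volume - ask_volume) / total


def obi_trigger(obi, threshold=0.8):
    """Hypothetical trigger: fire when |OBI| exceeds an assumed threshold."""
    return abs(obi) > threshold


def hawkes_intensity(t, events, mu, alpha, beta):
    """Intensity mu + sum(alpha * exp(-beta * (t - s))) over events s < t."""
    return mu + sum(alpha * math.exp(-beta * (t - s)) for s in events if s < t)


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate event times on [0, horizon) with Ogata's thinning method."""
    rng = random.Random(seed)
    events = []
    t = 0.0
    while True:
        # The exponential kernel only decays between events, so the intensity
        # just after the current time is an upper bound until the next event.
        lam_bar = mu + sum(alpha * math.exp(-beta * (t - s)) for s in events)
        t += rng.expovariate(lam_bar)  # candidate arrival time
        if t >= horizon:
            break
        # Accept the candidate with probability intensity(t) / lam_bar.
        if rng.random() * lam_bar <= hawkes_intensity(t, events, mu, alpha, beta):
            events.append(t)
    return events


if __name__ == "__main__":
    obi = order_book_imbalance(bid_volume=100.0, ask_volume=1900.0)
    print("OBI:", obi, "trigger:", obi_trigger(obi))
    arrivals = simulate_hawkes(mu=1.0, alpha=0.5, beta=2.0, horizon=10.0, seed=42)
    print("simulated order arrivals:", len(arrivals))
```

In this sketch, alpha / beta < 1 keeps the process stationary. Each accepted arrival raises the intensity by alpha, which reproduces the clustering of orders that motivates using a Hawkes process for order flow.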

