
ADAPTIVE MULTI-OBJECTIVE REWARD SHAPING FOR DEEP REINFORCEMENT LEARNING-BASED TRAFFIC SIGNAL CONTROL: THE AMRS-DUELINGDDQN FRAMEWORK

Rishu Raj

(06 – 2026)

DOI:

 

Urban traffic congestion remains one of the biggest challenges for modern transportation systems. Deep reinforcement learning (DRL) shows promise for adaptive traffic signal control (TSC), but existing methods share several recurring weaknesses: reward functions that do not adapt to changing demand, state representations that are either too broad or too demanding on resources, and learning algorithms that can become unstable or biased during training. This paper introduces AMRS-DuelingDDQN, an Adaptive Multi-objective Reward Shaping framework combined with a Dueling Double Deep Q-Network architecture, developed specifically to address these weaknesses. The framework builds a combined reward signal that penalizes queue growth and waiting time while rewarding throughput efficiency, and the weights of these terms adjust dynamically as traffic conditions change. We complement this reward structure with a lightweight quantitative state representation and a broader action space that allows variable green-phase durations. Experiments conducted in VISSIM across four traffic demand levels (light, moderate, heavy, and congested) show that AMRS-DuelingDDQN reduces average vehicle delay by up to 38% compared to fixed-time control and outperforms actuated control by 22% under high-demand conditions. It also converges more stably than standalone DQN, Double DQN, and A2C baselines. Importantly, the adaptive reward delivers consistent improvements across all demand levels, closing the performance gap that pure queue-length or pure throughput rewards show in congested scenarios. These findings provide practical insights for deploying DRL-based controllers at real-world intersections.
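As a rough illustration of the ideas summarized above, the following Python sketch combines a demand-dependent weighted reward with the standard dueling aggregation and Double DQN target. It is a minimal sketch under stated assumptions, not the framework's implementation: the abstract does not give the weighting rule, so the linear schedule in `adaptive_weights`, the `occupancy` demand proxy, and all field and function names are hypothetical. Only the dueling aggregation (Q = V + A - mean(A)) and the Double DQN target (online network selects the action, target network evaluates it) follow their well-known textbook forms.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TrafficSnapshot:
    """Per-step intersection measurements (hypothetical field names)."""
    queue_growth: float  # change in queued vehicles since the previous step
    waiting_time: float  # mean waiting time of queued vehicles, in seconds
    throughput: float    # vehicles discharged during the step
    occupancy: float     # assumed demand proxy in [0, 1]; 1.0 = saturated


def adaptive_weights(occupancy: float) -> tuple[float, float, float]:
    """Assumed linear schedule: as demand rises, weight shifts from
    throughput towards the queue and waiting-time penalties.
    The three weights always sum to 1."""
    level = min(max(occupancy, 0.0), 1.0)  # clamp to [0, 1]
    w_queue = 0.2 + 0.3 * level
    w_wait = 0.2 + 0.2 * level
    w_throughput = 1.0 - w_queue - w_wait
    return w_queue, w_wait, w_throughput


def shaped_reward(s: TrafficSnapshot) -> float:
    """Combined reward: penalize queue growth and waiting, reward throughput."""
    w_q, w_w, w_t = adaptive_weights(s.occupancy)
    return -w_q * s.queue_growth - w_w * s.waiting_time + w_t * s.throughput


def dueling_q_values(value: float, advantages: list[float]) -> list[float]:
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(
    reward: float,
    gamma: float,
    q_online_next: list[float],
    q_target_next: list[float],
    done: bool = False,
) -> float:
    """Double DQN target: the online network picks the next action,
    the target network supplies that action's value."""
    if done:
        return reward
    best = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best]
```

In this sketch, a congested snapshot (`occupancy` near 1) is scored mostly on queue growth and waiting time, while a light-demand snapshot is scored mostly on throughput. In practice the three terms would also need normalizing to comparable scales before weighting, which is omitted here.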

 

 
