Article
Open Access
RL-BES: optimizing strategies using reinforcement learning for blockchain economic security
1 School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
2 Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
3 School of Software Engineering, Sun Yat-sen University, Zhuhai, China
4 School of Software Engineering, Sun Yat-sen University, Zhuhai, China
5 School of Information Engineering, Guangzhou Panyu Polytechnic, Guangzhou, China
Abstract

The rapid growth of decentralized finance (DeFi) has brought numerous benefits, but it has also introduced significant economic security challenges. One of the most critical issues is Maximum Extractable Value (MEV), which refers to the additional profit that miners or validators can earn by altering the order of transactions. However, current MEV detection methods have notable limitations, including poorly adaptive algorithms, a vast search space, and the inefficiency of traditional heuristic approaches. To overcome these challenges, we introduce RL-BES (Reinforcement Learning for Blockchain Economic Security), a reinforcement learning-based MEV optimization system for blockchain. The system employs two deep reinforcement learning networks to optimize transaction ordering and template parameters, integrated with Monte Carlo Tree Search (MCTS) for effective path exploration. Furthermore, we present a custom model evaluation tool for adjusting different networks and parameters, facilitating the analysis of the best algorithmic solutions for on-chain MEV extraction. Experimental results indicate that RL-BES performs well across multiple DeFi applications, converging faster than and consistently outperforming Flashbots and other comparable detection tools.
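To give a rough sense of the MCTS-guided ordering search described above, the following is a minimal, self-contained sketch. It is not the RL-BES implementation: the candidate transaction set `TXS`, the profit oracle `simulate_profit`, and the policy prior `policy_prior` (standing in for one of the learned networks) are illustrative placeholders.

```python
# Sketch: MCTS over transaction orderings, guided by a (placeholder) policy prior.
import math
import random
from dataclasses import dataclass, field

TXS = ["swapA", "swapB", "liquidate", "arb"]   # toy candidate transactions


def simulate_profit(ordering):
    """Stand-in for on-chain simulation: returns the value extracted by an ordering."""
    # Toy reward that favours orderings placing "arb" late in the bundle.
    return float(ordering.index("arb")) if "arb" in ordering else 0.0


def policy_prior(ordering, tx):
    """Stand-in for a learned policy network's prior over the next transaction."""
    return 1.0 / len(TXS)


@dataclass
class Node:
    ordering: tuple
    parent: "Node" = None
    children: dict = field(default_factory=dict)
    visits: int = 0
    value: float = 0.0


def ucb(child, parent_visits, prior, c=1.4):
    """PUCT-style score: exploitation term plus prior-weighted exploration bonus."""
    q = child.value / child.visits if child.visits else 0.0
    return q + c * prior * math.sqrt(parent_visits) / (1 + child.visits)


def mcts(root, iters=200):
    for _ in range(iters):
        node = root
        # Selection: descend through existing children by UCB score.
        while node.children and len(node.ordering) < len(TXS):
            node = max(
                node.children.values(),
                key=lambda ch: ucb(ch, node.visits,
                                   policy_prior(node.ordering, ch.ordering[-1])),
            )
        # Expansion: add one child per not-yet-used transaction.
        remaining = [t for t in TXS if t not in node.ordering]
        for tx in remaining:
            node.children.setdefault(tx, Node(node.ordering + (tx,), parent=node))
        if remaining:
            node = random.choice(list(node.children.values()))
        # Rollout: complete the ordering at random and score the result.
        rest = [t for t in TXS if t not in node.ordering]
        rollout = list(node.ordering) + random.sample(rest, len(rest))
        reward = simulate_profit(rollout)
        # Backpropagation: update visit counts and accumulated value up to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent


root = Node(ordering=())
mcts(root)
best = max(root.children.values(), key=lambda ch: ch.visits)
print("most-visited first transaction:", best.ordering[-1])
```

In RL-BES the random rollout and uniform prior would be replaced by the two learned networks (for ordering and template parameters), with rewards obtained from simulated on-chain execution rather than the toy profit function used here.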

Keywords

blockchain; reinforcement learning; Maximum Extractable Value (MEV)
