Towards VM Rescheduling Optimization Through Deep Reinforcement Learning

1: University of California, Merced; 2: University of California, Berkeley; 3: ByteDance
EuroSys 2025

*Both authors contributed equally to this research.
[Figure: Two-stage model, VM stage and PM stage]
The first stage of VMR2L processes all VMs and PMs via shared embedding networks; from these embeddings, the VM actor selects a VM to reschedule. Once the VM actor has selected a candidate VM, VMR2L masks out all PMs that cannot host it. In the second stage, the PM actor accesses only the selected VM and chooses a destination PM from the unmasked PMs.
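The two-stage selection described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the VMR2L implementation: plain score lists stand in for the outputs of the VM and PM actors, each VM and PM is described by a single scalar resource, and the names `masked_softmax`, `select_action`, `vm_scores`, `pm_scores`, `vm_demand`, and `pm_free` are hypothetical.

```python
import math
import random


def masked_softmax(logits, mask):
    """Softmax restricted to entries whose mask is True; others get probability 0.

    Assumes at least one mask entry is True.
    """
    top = max(l for l, m in zip(logits, mask) if m)
    weights = [math.exp(l - top) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(weights)
    return [w / total for w in weights]


def select_action(vm_scores, pm_scores, vm_demand, pm_free, rng):
    """Return (vm, pm); pm is None when no PM can host the chosen VM.

    vm_scores[i]    -- hypothetical VM-actor score for VM i
    pm_scores[i][j] -- hypothetical PM-actor score for placing VM i on PM j
    vm_demand[i]    -- resource demand of VM i (single scalar, an assumption)
    pm_free[j]      -- free resource on PM j
    """
    # Stage 1: the VM actor samples a VM to reschedule from all VMs.
    vm_probs = masked_softmax(vm_scores, [True] * len(vm_scores))
    vm = rng.choices(range(len(vm_scores)), weights=vm_probs)[0]

    # Mask out every PM that cannot host the candidate VM.
    feasible = [free >= vm_demand[vm] for free in pm_free]
    if not any(feasible):
        return vm, None

    # Stage 2: the PM actor samples a destination among the unmasked PMs only.
    pm_probs = masked_softmax(pm_scores[vm], feasible)
    pm = rng.choices(range(len(pm_free)), weights=pm_probs)[0]
    return vm, pm


if __name__ == "__main__":
    rng = random.Random(0)
    vm, pm = select_action(
        vm_scores=[0.1, 1.2],
        pm_scores=[[0.0, 0.5, 0.2], [0.3, 0.1, 0.9]],
        vm_demand=[2, 4],
        pm_free=[1, 3, 8],
        rng=rng,
    )
    print("move VM", vm, "to PM", pm)
```

Because masked PMs receive probability exactly zero, the sampled destination always satisfies the capacity check, whatever scores the PM actor produces.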

Abstract

Modern industry-scale data centers need to manage a large number of virtual machines (VMs). Due to the continual creation and release of VMs, many small resource fragments are scattered across physical machines (PMs). To handle these fragments, data centers periodically reschedule some VMs to alternative PMs, a practice commonly referred to as VM rescheduling. Despite the increasing importance of VM rescheduling as data centers grow in size, the problem remains understudied. We first show that, unlike most combinatorial optimization tasks, the inference time of VM rescheduling algorithms significantly influences their performance, due to dynamic VM state changes during this period. This causes existing methods to scale poorly. Therefore, we develop a reinforcement learning system for VM rescheduling, VMR2L, which incorporates a set of customized techniques, such as a two-stage framework that accommodates diverse constraints and workload conditions, a feature extraction module that captures relational information specific to rescheduling, as well as a risk-seeking evaluation enabling users to optimize the trade-off between latency and accuracy. We conduct extensive experiments with data from an industry-scale data center. Our results show that VMR2L can achieve a performance comparable to the optimal solution but with a running time of seconds.

Background

Model Details

[Figure: Two-stage agent, VM and PM panels]

1) Two-Stage Framework

[Figure: Sparse attention]

2) Feature Extraction with Sparse-Attention

3) Risk-Seeking Evaluation

Experiments

Poster

BibTeX


@inproceedings{ding2025towards,
  title={Towards {VM} Rescheduling Optimization Through Deep Reinforcement Learning},
  author={Ding, Xianzhong and Zhang, Yunkai and Chen, Binbin and Ying, Donghao and Zhang, Tieying and Chen, Jianjun and Zhang, Lei and Cerpa, Alberto and Du, Wan},
  booktitle={Proceedings of the Twentieth European Conference on Computer Systems},
  year={2025}
}

@misc{ding2023vmr2l,
  title={{VMR2L}: Virtual machines rescheduling using reinforcement learning in data centers},
  author={Ding, Xianzhong and Zhang, Yunkai and Chen, Binbin and Ying, Donghao and Zhang, Tieying and Chen, Jianjun and Zhang, Lei and Cerpa, Alberto and Du, Wan},
  year={2023}
}