Towards VM Rescheduling Optimization Through Deep Reinforcement Learning

The first stage of VMR²L processes all VMs and PMs via shared embedding networks, based on which the VM actor selects a VM to be rescheduled. Once a candidate VM is selected by the VM actor, VMR²L masks out all the PMs that cannot host the candidate VM. The PM actor only accesses the selected VM, and then selects a destination PM from the unmasked PMs.

Abstract

Modern industry-scale data centers need to manage a large number of virtual machines (VMs). Due to the continual creation and release of VMs, many small resource fragments are scattered across physical machines (PMs). To handle these fragments, data centers periodically reschedule some VMs to alternative PMs, a practice commonly referred to as VM rescheduling. Despite the increasing importance of VM rescheduling as data centers grow in size, the problem remains understudied. We first show that, unlike most combinatorial optimization tasks, the inference time of VM rescheduling algorithms significantly influences their performance, due to dynamic VM state changes during this period. This causes existing methods to scale poorly. Therefore, we develop a reinforcement learning system for VM rescheduling, VMR²L, which incorporates a set of customized techniques, such as a two-stage framework that accommodates diverse constraints and workload conditions, a feature extraction module that captures relational information specific to rescheduling, as well as a risk-seeking evaluation enabling users to optimize the trade-off between latency and accuracy. We conduct extensive experiments with data from an industry-scale data center. Our results show that VMR²L can achieve a performance comparable to the optimal solution but with a running time of seconds.

Background

1) What is VM Scheduling and Rescheduling?

VM scheduling decides which PM should host an incoming VM when a new request arrives.

VM rescheduling reassigns an already deployed VM to a new destination PM.

The above figure shows the maximum number of VM changes per minute, averaged over a 30-day period, for a cluster in our in-house data center.

2) VM Rescheduling Details

- Each cluster has no more than a few hundred PMs, enabling tailored configuration and fault isolation.
- Rescheduling relies on hot migration, which keeps migration overhead low.
- It is primarily applied to clusters hosting EC2 instances running for hours or days.
- A migration number limit (MNL) ensures stability, typically at 2-3% of VMs.

3) Motivation: VM Rescheduling Latency Requirements

Mixed-integer programming (MIP) achieves a lower fragment rate (FR) than the heuristic algorithm (HA) under varying MNLs. MIP guarantees near-optimal solutions, but its computation time grows exponentially (from 1.78 to 50.55 minutes). Because VM states keep changing while the solver runs, such latencies are impractical in a dynamic data center.

The MIP solution remains near-optimal only if it is computed within five seconds, as indicated by the "elbow point." Beyond this point, the achieved FR reduction diminishes rapidly, so inference must finish within a strict five-second limit to remain effective.

4) Rescheduling Data Characteristics

Key statistics: 81.44% of PMs have CPU utilization over 80%, and 88.27% of VM requests are for 16 or fewer cores.
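
To make the fragment rate concrete, here is a minimal sketch. The page does not spell out the FR formula, so the definition below is an assumption for illustration: free CPU on a PM that cannot be packed into whole 16-core VMs counts as a fragment, and FR is the total fragments divided by the total free CPU. The function name `fragment_rate` and the parameter `vm_size` are hypothetical.

```python
from typing import List


def fragment_rate(free_cpus: List[int], vm_size: int = 16) -> float:
    """Assumed definition: share of free CPU that cannot be packed
    into whole `vm_size`-core VMs, summed over all PMs."""
    total_free = sum(free_cpus)
    if total_free == 0:
        return 0.0
    fragments = sum(free % vm_size for free in free_cpus)
    return fragments / total_free


# Free cores on three PMs before rescheduling.
free_before = [20, 12, 40]
# Migrating one 4-core VM from PM 1 to PM 0 frees 4 cores on PM 1
# and consumes 4 cores on PM 0.
free_after = [16, 16, 40]

print(fragment_rate(free_before))  # 24 fragment cores out of 72 free cores
print(fragment_rate(free_after))   # 8 fragment cores out of 72 free cores
```

In this toy case, a single migration cuts the fragment cores from 24 to 8 while the total free CPU stays at 72, which is the kind of gain rescheduling targets.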

5) Why Reinforcement Learning (RL)?

RL extracts features automatically, generalizes to new scenarios, and interacts cheaply with the environment. This avoids the limitations of supervised and heuristic models.

Model Details

1) Two-Stage Framework

A VM request starts an episode. The action at each step in the episode is to migrate one VM. For example, if the model is allowed to reschedule 50 VMs, we let the model take 50 actions. At each time step, the action is a two-tuple -- the VM to be rescheduled and its new destination PM. We design a two-stage agent, where the model chooses the VM in the first stage and then selects a destination PM in the second stage.
First stage: a VM actor embeds all VMs and PMs via two shared embedding networks and, based on these embeddings, selects a candidate VM.
Second stage: once a candidate VM is selected, VMR²L masks out all the PMs that cannot host it. The PM actor has access only to the embedding of the selected VM, rather than all VMs, and selects a destination PM from the remaining PMs.
Benefit: because illegal PMs are masked out after the VM is chosen, the agent can satisfy various service constraints (e.g., resource availability, affinity level).
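
The two stages can be sketched as follows. This is an illustrative sketch, not the VMR²L implementation: the actor networks are replaced by hand-written score lists, only CPU capacity is checked in the mask, and all names (`VM`, `PM`, `legal_pm_mask`, `select_action`, `pm_scorer`) are hypothetical.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class VM:
    cpu: int   # requested cores
    host: int  # index of the PM currently hosting this VM


@dataclass
class PM:
    free_cpu: int  # unallocated cores


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax; entries equal to -inf get probability 0."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def legal_pm_mask(vm: VM, pms: List[PM]) -> List[bool]:
    """True where a PM can host `vm`. Only CPU capacity is checked here;
    other constraints (e.g., affinity) would be AND-ed into the same mask."""
    return [i != vm.host and pm.free_cpu >= vm.cpu for i, pm in enumerate(pms)]


def select_action(
    vm_scores: List[float],
    pm_scorer: Callable[[int], List[float]],
    vms: List[VM],
    pms: List[PM],
    rng: random.Random,
) -> Tuple[int, Optional[int]]:
    # Stage 1: sample the VM to migrate from the (stand-in) VM actor scores.
    vm_idx = rng.choices(range(len(vms)), weights=softmax(vm_scores))[0]
    # Stage 2: mask PMs that cannot host the chosen VM, then sample a PM.
    mask = legal_pm_mask(vms[vm_idx], pms)
    if not any(mask):
        return vm_idx, None  # no legal destination for this VM
    pm_scores = pm_scorer(vm_idx)  # stand-in for the PM actor
    masked = [s if ok else float("-inf") for s, ok in zip(pm_scores, mask)]
    pm_idx = rng.choices(range(len(pms)), weights=softmax(masked))[0]
    return vm_idx, pm_idx


vms = [VM(cpu=8, host=0), VM(cpu=4, host=1)]
pms = [PM(free_cpu=2), PM(free_cpu=3), PM(free_cpu=16)]
vm_idx, pm_idx = select_action(
    [0.2, 1.0], lambda i: [0.5, 0.1, 0.3], vms, pms, random.Random(0)
)
print(vm_idx, pm_idx)
```

In this toy cluster only PM 2 has enough free cores for either VM, so the destination is always PM 2 even though the stand-in PM scores favor PM 0. Masking enforces the constraint regardless of what the actor prefers.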

2) Feature Extraction with Sparse-Attention

We use an attention-based model as the backbone because its number of learnable parameters is independent of the number of VMs and PMs. We design a tree-level sparse-attention module that lets each VM or PM exchange information with the other machines under the same PM. This affiliation information is critical for VM rescheduling but is absent in vanilla attention.
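
One way to picture the tree-level sparsity is as a boolean attention mask in which a token may attend only to tokens under the same PM. The sketch below is an assumption about how such a mask could be built, not the module used in VMR²L; the token ordering (all PMs first, then all VMs) and the names `tree_attention_mask` and `masked_attention_weights` are hypothetical.

```python
import math
from typing import List


def tree_attention_mask(num_pms: int, vm_hosts: List[int]) -> List[List[bool]]:
    """Token order: PM 0..num_pms-1, then one token per VM.
    mask[i][j] is True when tokens i and j sit under the same PM."""
    owner = list(range(num_pms)) + list(vm_hosts)
    n = len(owner)
    return [[owner[i] == owner[j] for j in range(n)] for i in range(n)]


def masked_attention_weights(scores: List[float], allowed: List[bool]) -> List[float]:
    """Softmax over one row of attention scores, restricted to allowed tokens.
    Every row allows at least the token itself, so the sum is never zero."""
    top = max(s for s, ok in zip(scores, allowed) if ok)
    exps = [math.exp(s - top) if ok else 0.0 for s, ok in zip(scores, allowed)]
    total = sum(exps)
    return [e / total for e in exps]


# Two PMs; VMs 0 and 1 run on PM 0, VM 2 runs on PM 1.
mask = tree_attention_mask(num_pms=2, vm_hosts=[0, 0, 1])
print(mask[0])  # [True, False, True, True, False]

# With uniform scores, PM 0 spreads its attention over itself and its two VMs.
weights = masked_attention_weights([0.0] * 5, mask[0])
print(weights)
```

The mask carries the affiliation information (which VM sits on which PM) that a dense attention layer over an unordered set of machines would not see.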

3) Risk-Seeking Evaluation

Since we can exactly simulate the effect of VM rescheduling actions, we can sample multiple trajectories at inference time and deploy only the one with the highest reward. Actions with low probabilities are likely to be suboptimal, so we avoid them during inference.
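
The idea can be sketched as a best-of-K rollout with low-probability actions filtered out. This is a minimal sketch under stated assumptions: the policy and the simulator are toy stand-ins, the fixed cutoff `min_prob` is one possible way to avoid low-probability actions and is not taken from the paper, and the names `truncate_low_prob` and `risk_seeking_rollout` are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple


def truncate_low_prob(probs: Sequence[float], min_prob: float) -> List[float]:
    """Zero out actions below `min_prob` and renormalize (assumed filter)."""
    kept = [p if p >= min_prob else 0.0 for p in probs]
    total = sum(kept)
    if total == 0.0:  # nothing survives the cutoff: fall back to greedy
        best = max(range(len(probs)), key=lambda i: probs[i])
        return [1.0 if i == best else 0.0 for i in range(len(probs))]
    return [p / total for p in kept]


def risk_seeking_rollout(
    policy: Callable[[List[int]], Sequence[float]],
    simulate: Callable[[List[int]], float],
    num_steps: int,
    num_samples: int,
    min_prob: float,
    rng: random.Random,
) -> Tuple[List[int], float]:
    """Sample `num_samples` trajectories and keep the highest-reward one."""
    best_traj: List[int] = []
    best_reward = float("-inf")
    for _ in range(num_samples):
        traj: List[int] = []
        for _ in range(num_steps):
            probs = truncate_low_prob(policy(traj), min_prob)
            traj.append(rng.choices(range(len(probs)), weights=probs)[0])
        reward = simulate(traj)  # exact simulation, no real migration
        if reward > best_reward:
            best_traj, best_reward = traj, reward
    return best_traj, best_reward


def toy_policy(history: List[int]) -> List[float]:
    return [0.5, 0.45, 0.05]  # action 2 is rare


def toy_reward(traj: List[int]) -> float:
    # Action 1 helps, action 2 hurts badly, action 0 is neutral.
    return sum(1.0 if a == 1 else -10.0 if a == 2 else 0.0 for a in traj)


best, reward = risk_seeking_rollout(
    toy_policy, toy_reward, num_steps=5, num_samples=16,
    min_prob=0.1, rng=random.Random(0),
)
print(best, reward)
```

Only the selected trajectory would be deployed. The number of samples is the knob that trades inference latency against solution quality.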

Experiments

1) Latency Constraint

VMR²L outperforms all baselines under the five-second latency constraint.

2) Ablation Studies

3) Two-Stage Design

The two-stage design allows VMR²L to handle different constraints.

4) Generalization to Different Objectives

VMR²L can generalize to different objectives, such as reaching an FR level with the minimal number of VM migrations.

5) Abnormal Workload Levels

VMR²L can generalize to abnormal workload levels. Even when some workload levels are absent from training, VMR²L can cover the gaps as long as it has been trained on a higher workload level (and preferably a lower one too).

6) Policy Visualization

A visualization of the learned rescheduling policy. It can be generated using eval_plot_steps.py.

BibTeX


@inproceedings{ding2025towards,
  title={Towards VM Rescheduling Optimization Through Deep Reinforcement Learning},
  author={Ding, Xianzhong and Zhang, Yunkai and Chen, Binbin and Ying, Donghao and Zhang, Tieying and Chen, Jianjun and Zhang, Lei and Cerpa, Alberto and Du, Wan},
  booktitle={Proceedings of the Twentieth European Conference on Computer Systems},
  year={2025}
}

@misc{ding2023vmr2l,
  title={Vmr2l: Virtual machines rescheduling using reinforcement learning in data centers},
  author={Ding, Xianzhong and Zhang, Yunkai and Chen, Binbin and Ying, Donghao and Zhang, Tieying and Chen, Jianjun and Zhang, Lei and Cerpa, Alberto and Du, Wan},
  year={2023}
}
