CoRe-MoE: Contrastive Reweighted Mixture of Experts for Multi-Terrain Humanoid Locomotion with Gait Adaptation

Kailun Huang 1,2,†; Zikang Xie 1,†; Yanzhe Xie 1,†; Panpan Liao 2,†; Fanghai Zhang 1,2,†; Yanheng Mai 1; Yi Gu 1; Renjing Xu 1,*
1 Hong Kong University of Science and Technology (Guangzhou) | 2 Guangdong University of Technology
† Equal contribution | * Corresponding author

Abstract

Humans primarily rely on walking and running to traverse complex terrains, without resorting to unnecessarily complex motion patterns. Similarly, humanoid robots should achieve smooth transitions between walking and running while maintaining natural and stable locomotion. However, unifying gait transition and multi-terrain adaptation within a single policy remains challenging due to gradient interference and the distribution shift induced by terrain-dependent visual and dynamic variations. Although Mixture-of-Experts (MoE) architectures can alleviate multi-skill interference, naive joint training often fails to yield clear expert specialization, limiting their effectiveness. To address these challenges, we propose CoRe-MoE, a two-stage reinforcement learning framework that decouples gait generation from terrain adaptation. In the first stage, a stable locomotion policy is learned to produce natural walking and running behaviors with smooth transitions. In the second stage, a terrain-aware MoE branch is introduced and trained with a contrastive objective to shape the gating network, enabling it to capture structured terrain representations and promote expert specialization. The final action is obtained via weighted fusion of the base gait policy and the terrain-aware branch, allowing the policy to preserve stable locomotion patterns while adapting to complex terrains. Extensive simulation results demonstrate that the proposed method outperforms baseline approaches in terms of success rate, locomotion stability, and multi-terrain adaptability. Furthermore, zero-shot deployment on a Unitree G1 humanoid robot validates the effectiveness of our framework, achieving robust walking and running across stairs, slopes, steps, obstacles, and unstructured outdoor terrains, while maintaining accurate foothold placement and dynamic stability under external disturbances.
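
As a rough illustration of how a contrastive objective could shape the gating network, the sketch below computes a supervised-contrastive-style loss over gate embeddings, treating samples from the same terrain type as positives. The abstract does not give the exact loss, so the function name, the use of terrain labels, cosine similarity, and the temperature value are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)

def gate_contrastive_loss(embeddings, terrain_ids, temperature=0.1):
    """Hypothetical supervised-contrastive-style loss on gate embeddings.

    Assumption: embeddings[i] is a gating-network feature for sample i and
    terrain_ids[i] is its terrain label. Samples that share a label are
    pulled together; all other samples act as negatives.
    """
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n)
                     if j != i and terrain_ids[j] == terrain_ids[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        sims = {j: cosine(embeddings[i], embeddings[j]) / temperature
                for j in range(n) if j != i}
        # Numerically stable log-sum-exp over all non-anchor samples.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0

# Toy gate embeddings: two samples near each axis.
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_aligned = gate_contrastive_loss(emb, [0, 0, 1, 1])  # labels match clusters
loss_mixed = gate_contrastive_loss(emb, [0, 1, 0, 1])    # labels cut across clusters
print(loss_aligned, loss_mixed)
```

The loss is small when embeddings of the same terrain cluster together and large when they do not, which is the kind of pressure that would push a gate toward terrain-structured representations.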

Abstract figure

Overview of our framework. In Stage 1, we learn a flat-terrain adaptive gait policy using a Mixture-of-Experts (MoE) architecture and Adversarial Motion Priors (AMP), enabling stable walking-running transitions. In Stage 2, we incorporate depth perception and train a second MoE with contrastive learning to acquire terrain-aware control. During inference, the flat-terrain policy provides a gait prior and the multi-terrain policy applies terrain adjustments; their weighted fusion produces the final action, combining natural gait with multi-terrain adaptability.
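
The inference-time fusion described in the caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the softmax gate over experts, the linear fusion rule, and the weights `w_base` and `w_terrain` are assumptions, since the page does not state how the fusion weights are defined.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of gate logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def moe_branch_action(gate_logits, expert_actions):
    """Blend per-expert action vectors using softmax gate weights."""
    weights = softmax(gate_logits)
    dim = len(expert_actions[0])
    return [sum(w * a[k] for w, a in zip(weights, expert_actions))
            for k in range(dim)]

def fuse_actions(base_action, terrain_action, w_base=1.0, w_terrain=0.5):
    """Hypothetical fusion rule: a = w_base * a_base + w_terrain * a_terrain.

    Assumption: scalar weights are used here purely for illustration.
    """
    return [w_base * b + w_terrain * t
            for b, t in zip(base_action, terrain_action)]

# Toy example with 3 joint targets and 2 terrain experts.
base = [0.10, -0.20, 0.30]                         # gait prior from the Stage 1 policy
experts = [[0.05, 0.00, -0.10], [-0.02, 0.04, 0.00]]
terrain = moe_branch_action([2.0, 0.5], experts)   # Stage 2 terrain adjustment
action = fuse_actions(base, terrain)
print(action)
```

Keeping the base action as a separate term means the gait prior still determines the output whenever the terrain branch contributes little, which matches the caption's description of a prior plus adjustments.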
