BPN915: Control of Microrobots with Reinforcement Learning

Abstract: 

Developing task schedulers and low-level end-to-end controllers for microrobots operating in complex environments often demands extensive system and environment knowledge, leading to prolonged design cycles for specialized controllers. To expedite the generation of general controllers without requiring domain-specific expertise, we propose utilizing model-based reinforcement learning (MBRL) trained within simulated environments. Our research advances microrobot control through two key approaches: modeling the long-term dynamics of robots and distilling computationally intensive model predictive control (MPC) into reactive neural network policies. Accurate long-term control is achieved by reframing proprioceptive state-action data to include time dependence across trajectories, thereby minimizing online computation and replanning requirements. MPC distillation involves extensive offline planning, with a focus on prioritizing states that contribute to goal attainment, rather than uniformly weighting all past actions, including erroneous ones. Previous results demonstrate that controllers emerging from simulated environments successfully adapt to varying terrains while tracking input commands using only motor input and a two-axis accelerometer on both flat surfaces and step-response obstacles. The controller has been successfully deployed on a scaled-up robot platform via domain randomization. For real-time implementation on microrobots, specifically the Single Chip microMote (SCuM, BPN803), a smaller, more efficient controller with reduced computational and memory demands is essential. We have trained a separate transformer model using the controller data, resulting in a network with 60% less memory usage while retaining 80% of the original network’s accuracy. Our current efforts focus on verifying the controller for quadrupedal microrobots through both sim-to-sim and sim-to-real approaches, using the scaled-up robot platform. These efforts include tasks such as walking with different gaits, balancing over varied terrain, and recovering from disturbances. Our ultimate goal is to develop an end-to-end controller capable of sensing and controlling quadrupedal microrobots to accomplish tasks using onboard processing capabilities.
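For illustration, below is a minimal sketch of the transformer compression step described in the abstract, written in PyTorch. The architecture sizes, module and function names, and the simple regression loop are assumptions for exposition, not the project's reported implementation.

```python
import torch
import torch.nn as nn

class SmallPolicyTransformer(nn.Module):
    """Compact encoder-only transformer mapping a short window of
    observations (e.g., two-axis accelerometer readings plus motor
    inputs; obs_dim is an illustrative assumption) to motor commands."""
    def __init__(self, obs_dim=4, act_dim=2, d_model=32, n_layers=2, n_heads=2):
        super().__init__()
        self.embed = nn.Linear(obs_dim, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=2 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, act_dim)

    def forward(self, obs_window):            # (batch, T, obs_dim)
        z = self.encoder(self.embed(obs_window))
        return self.head(z[:, -1])            # act on the most recent token

def distill(student, teacher_obs, teacher_actions, epochs=10, lr=1e-3):
    """Regress the small student onto actions logged from the larger
    controller ("controller data" in the abstract)."""
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(epochs):
        pred = student(teacher_obs)
        loss = nn.functional.mse_loss(pred, teacher_actions)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return student
```

Shrinking d_model and the number of layers is what drives the memory reduction; the distillation loss then recovers as much of the original controller's behavior as the smaller capacity allows.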


We have been making progress on hyper-data-efficient reinforcement learning along two thrusts: modeling the long-term dynamics of robots and distilling compute-heavy model predictive control (MPC) into a reactive neural network policy. Accurate long-term predictions are made by reframing state-action data to include time dependence across trajectories; this is useful because less online compute and replanning are needed when the model can reason further into the future. The MPC distillation is done by planning extensively offline and re-weighting the controller to prioritize states that lead to the goal, rather than equally weighting all previously chosen actions, even erroneous ones. A sketch of both ideas follows below.
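The following PyTorch sketch illustrates both thrusts under stated assumptions: the horizon-indexed data layout, the function names, and the exponential reward-to-go weighting are illustrative choices, not the project's actual code.

```python
import torch
import torch.nn as nn

# --- Thrust 1: reframe (state, action) logs with explicit time dependence. ---
# Instead of one-step pairs (s_t, a_t) -> s_{t+1}, build (s_t, h, s_{t+h})
# triples so a dynamics model can predict h steps ahead directly, reducing
# recursive replanning at run time.
def reframe_trajectories(states):
    """states: (T, state_dim) tensor for one trajectory.
    Returns (s_t, h, s_{t+h}) triples for all valid t and horizons h."""
    T = states.shape[0]
    inputs, horizons, targets = [], [], []
    for t in range(T - 1):
        for h in range(1, T - t):
            inputs.append(states[t])
            horizons.append(float(h))
            targets.append(states[t + h])
    return (torch.stack(inputs),
            torch.tensor(horizons).unsqueeze(-1),
            torch.stack(targets))

# --- Thrust 2: distill MPC into a reactive policy with a goal-weighted loss. ---
# Transitions from offline MPC rollouts are weighted by how much they helped
# reach the goal (here, a softmax over reward-to-go, an assumed choice), so
# erroneous actions in the dataset are down-weighted instead of cloned equally.
def distill_step(policy, optimizer, states, mpc_actions, reward_to_go, beta=1.0):
    weights = torch.softmax(beta * reward_to_go, dim=0)   # prioritize goal-reaching states
    pred = policy(states)
    per_sample = ((pred - mpc_actions) ** 2).mean(dim=-1) # per-transition cloning error
    loss = (weights * per_sample).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch, beta controls how sharply the cloning loss concentrates on the highest-return transitions; beta = 0 recovers uniform behavior cloning.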

Research currently funded by: Member Fees

Author: 
Kesava Viswanadha
Zhongyu Li
Emily Tan
Nelson Lojo
Derrick Han Sun
Aviral Mishra
Rushil Desai
Publication date: 
August 12, 2024
Publication type: 
BSAC Project Materials (Current)
Citation: 
PREPUBLICATION DATA - ©University of California 2024
