Generating low-level robot controllers often requires manual parameter tuning and significant system knowledge, which can result in long design times for highly specialized controllers. Moreover, with microrobots the dynamics change on each design iteration and there is little experimental time to tune controllers. To address the problem of rapidly generating low-level controllers without domain knowledge, we propose using model-based reinforcement learning (MBRL) trained on a few minutes of automatically generated data. Initial results showed that MBRL on a Crazyflie quadrotor can achieve stable hovering for over 6 seconds with less than 5 minutes of training data, using only on-board sensors, direct motor input signals, and no initial dynamics knowledge. The goal is to apply the most data-efficient and stable methods to microrobots (hexapod, ionocraft, jumper) to accomplish simple tasks such as walking and flying within minutes of wall-clock time. The general nature of the model-based RL approach raises the question: given a dynamics model trained on finite experimental data, what is the best way to generate a control policy? For all methods presented, the foundation is a forward dynamics model that predicts the next state given the current state and action. We explore a variety of methods to generate a control policy, including pairing the dynamics model with a model predictive controller (MPC), a neural network policy imitating the MPC, deterministic particle-based policy gradients, and zeroth-order optimizers such as traditional policy gradient algorithms that estimate return.
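
The pairing of a learned forward dynamics model with an MPC can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the project's implementation: the robot is replaced by a toy 1D double integrator (`true_step`), the forward model is a least-squares linear fit rather than a neural network, the MPC is a simple random-shooting planner, and the cost weights, horizon, and function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
DT = 0.05  # assumed control period in seconds


def true_step(state, action):
    """Toy stand-in for the robot: 1D double integrator (position, velocity)."""
    pos, vel = state
    return np.array([pos + DT * vel, vel + DT * action])


def collect_data(n_steps=500):
    """Apply random actions and record (state, action) -> next_state pairs."""
    inputs, targets = [], []
    state = np.zeros(2)
    for _ in range(n_steps):
        action = rng.uniform(-1.0, 1.0)
        next_state = true_step(state, action)
        inputs.append(np.concatenate([state, [action]]))
        targets.append(next_state)
        state = next_state
    return np.array(inputs), np.array(targets)


def fit_model(inputs, targets):
    """Forward dynamics model: next_state ~= [state, action] @ W (least squares)."""
    W, *_ = np.linalg.lstsq(inputs, targets, rcond=None)
    return W


def mpc_action(W, state, target, horizon=15, n_candidates=256):
    """Random-shooting MPC: sample action sequences, roll each through the
    learned model, and return the first action of the lowest-cost sequence."""
    actions = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon))
    states = np.tile(state, (n_candidates, 1))
    cost = np.zeros(n_candidates)
    for t in range(horizon):
        model_in = np.hstack([states, actions[:, t:t + 1]])
        states = model_in @ W  # predicted next states for all candidates
        # Hypothetical cost: squared position error plus a velocity penalty.
        cost += (states[:, 0] - target) ** 2 + 0.5 * states[:, 1] ** 2
    return actions[np.argmin(cost), 0]


def run_episode(W, target=1.0, n_steps=400):
    """Control the toy system with the MPC, replanning at every step."""
    state = np.zeros(2)
    for _ in range(n_steps):
        state = true_step(state, mpc_action(W, state, target))
    return state


if __name__ == "__main__":
    X, Y = collect_data()
    W = fit_model(X, Y)
    print("final state:", run_episode(W))
```

The planner only ever queries the learned model `W`, never `true_step`, which mirrors the idea that the controller is generated from data rather than from prior dynamics knowledge. Replanning at every step keeps the closed loop tolerant of model error, at the cost of evaluating many candidate sequences per control cycle.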
March 14, 2022
BSAC Project Materials (Current)
PREPUBLICATION DATA - ©University of California 2022