Backprop-MPDM: Faster risk-aware policy evaluation through efficient gradient optimization

Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2018

PDF (1.5 MB)

Abstract

In Multi-Policy Decision-Making (MPDM), many computationally expensive forward simulations are performed to predict the performance of a set of candidate policies. In risk-aware formulations of MPDM, only the worst outcomes affect the decision-making process, and efficiently finding these influential outcomes becomes the core challenge. Recently, stochastic gradient optimization guided by a heuristic function was shown to find such outcomes far more effectively than random sampling.
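To make the risk-aware formulation concrete, here is a minimal sketch in plain Python/NumPy of scoring a policy by its worst sampled outcome. The forward_simulate dynamics, cost, and parameters are illustrative assumptions, not the paper's actual simulator; the point is only that many perturbed initial conditions are rolled forward and the single highest-cost rollout determines the score.

    import numpy as np

    def forward_simulate(policy, initial_state, horizon=50, dt=0.1):
        # Stand-in for MPDM's expensive forward simulation: the robot follows
        # `policy` while a pedestrian moves at constant velocity.
        # Assumed state layout: [robot_x, robot_y, ped_x, ped_y, ped_vx, ped_vy].
        robot = initial_state[:2].copy()
        ped = initial_state[2:4].copy()
        ped_vel = initial_state[4:6]
        cost = 0.0
        for _ in range(horizon):
            robot += policy(robot, ped) * dt
            ped = ped + ped_vel * dt
            # Risk grows as the robot closes on the pedestrian.
            cost = max(cost, 1.0 / (1e-3 + np.linalg.norm(robot - ped)))
        return cost

    def risk_aware_score(policy, nominal_state, num_samples=100, noise=0.5, seed=0):
        # Risk-aware evaluation: the policy is judged by its WORST sampled
        # outcome, so the search problem is finding that influential outcome.
        rng = np.random.default_rng(seed)
        samples = nominal_state + noise * rng.standard_normal(
            (num_samples, nominal_state.size))
        return max(forward_simulate(policy, s) for s in samples)

Because only the maximum matters, random sampling wastes most of its rollouts on benign outcomes, which is what motivates a directed, gradient-based search.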

In this paper, we show that accurate gradients can be computed, even through a complex forward simulation, using approaches similar to backpropagation in deep networks. We show that our proposed approach finds influential outcomes more reliably and is faster than earlier methods, allowing us to evaluate more policies while eliminating the need to design an easily differentiable heuristic function. We demonstrate significant performance improvements in simulation as well as on a real robot platform navigating a highly dynamic environment.
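As an illustration of the core idea, the sketch below uses JAX automatic differentiation to backpropagate through a toy rollout and climb the gradient of the cost with respect to the initial conditions, steering a sampled state toward a worst-case outcome instead of resampling blindly. The go-to-goal policy, the smooth proximity cost, and all step sizes are assumptions for the sake of the example, not the paper's implementation.

    import jax
    import jax.numpy as jnp

    def simulate_cost(initial_state, horizon=50, dt=0.1):
        # Differentiable toy rollout: a go-to-goal policy drives the robot
        # while a pedestrian moves at constant velocity. The proximity
        # penalty is smooth so its gradient is informative.
        robot, ped, ped_vel = initial_state[:2], initial_state[2:4], initial_state[4:6]
        goal = jnp.array([5.0, 0.0])

        def step(carry, _):
            robot, ped = carry
            heading = (goal - robot) / (1e-3 + jnp.linalg.norm(goal - robot))
            robot = robot + heading * dt
            ped = ped + ped_vel * dt
            penalty = jnp.exp(-jnp.sum((robot - ped) ** 2))  # near-collision risk
            return (robot, ped), penalty

        _, penalties = jax.lax.scan(step, (robot, ped), None, length=horizon)
        return jnp.sum(penalties)

    # Backprop through the whole rollout in one call, then ascend the
    # gradient to move the sampled initial state toward higher risk.
    grad_fn = jax.grad(simulate_cost)

    def find_influential_outcome(state, steps=50, lr=0.1):
        for _ in range(steps):
            state = state + lr * grad_fn(state)
        return state, simulate_cost(state)

    state0 = jnp.array([0.0, 0.0, 3.0, 1.0, -0.5, -0.2])
    worst_state, worst_cost = find_influential_outcome(state0)

Treating the simulator like a deep network in this way is what removes the need for a hand-designed, easily differentiable heuristic: the gradient comes from the forward simulation itself.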

Media

Movie (156.8 MB)


BibTeX

@inproceedings{mehta2018icra,
    AUTHOR     = {Dhanvin Mehta and Gonzalo Ferrer and Edwin Olson},
    TITLE      = {{Backprop-MPDM}: Faster risk-aware policy evaluation through
                 efficient gradient optimization},
    MONTH      = {May},
    YEAR       = {2018},
    BOOKTITLE  = {Proceedings of the {IEEE} International Conference on Robotics and
                 Automation ({ICRA})},
}