Zero-order trajectory optimization
Optimal control is a way to program robots by defining the tasks to be achieved through quantitative objectives, such as cost, reward, or constraint functions, rather than by explicitly programming the motion through sequences or demonstrations. Among the properties that characterize the methods used to solve optimal control problems, the most important are often whether the decision variable being optimized is the robot trajectory or its policy, and whether the optimization uses derivatives. While trajectory optimization often benefits from derivatives, relying on them also makes it less robust, in particular on irregular problems such as movements with contacts or with integer decisions. This project proposes to study gradient-free algorithms for optimizing robot motion, and in particular to compare them with gradient-based optimization, to seek convergence guarantees (such as rate of convergence or domain of convergence), to transfer algorithmic progress made in gradient-based trajectory optimization, and to understand the importance or limitations of not using the gradient in reinforcement learning.
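To make the idea of gradient-free trajectory optimization concrete, here is a minimal sketch in Python. It is not the project's method. The system (a 1-D double integrator), the cost weights, and the names `rollout_cost` and `random_search` are all illustrative assumptions. The optimizer is a plain greedy random search over the control sequence: it only evaluates the cost of simulated rollouts and never computes a derivative.

```python
import numpy as np


def rollout_cost(controls, dt=0.1, target=1.0):
    """Simulate a 1-D double integrator (assumed toy system) and return its cost."""
    pos, vel = 0.0, 0.0
    cost = 0.0
    for u in controls:
        vel += dt * u                 # acceleration is the control input
        pos += dt * vel
        cost += 1e-3 * u ** 2 * dt    # small control-effort penalty (assumed weight)
    # Terminal penalty: reach the target position at rest.
    cost += (pos - target) ** 2 + vel ** 2
    return cost


def random_search(cost_fn, horizon=20, iters=500, sigma=0.5, seed=0):
    """Greedy zero-order search: keep a random perturbation only if it lowers the cost."""
    rng = np.random.default_rng(seed)
    best = np.zeros(horizon)
    best_cost = cost_fn(best)
    for _ in range(iters):
        candidate = best + sigma * rng.standard_normal(horizon)
        candidate_cost = cost_fn(candidate)
        if candidate_cost < best_cost:
            best, best_cost = candidate, candidate_cost
    return best, best_cost


controls, final_cost = random_search(rollout_cost)
print(f"initial cost: {rollout_cost(np.zeros(20)):.3f}, final cost: {final_cost:.3f}")
```

Because the search only queries `cost_fn`, the same loop would still run if the rollout contained contacts or integer decisions, which is the robustness argument made above. The price is that nothing in this sketch says how fast it converges, which is one of the questions the project raises.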