ESA GNC Conference Papers Repository
Deep Reinforcement Learning based Integrated Guidance and Control for a Launcher Landing Problem
Deep Reinforcement Learning (Deep-RL) has received considerable attention in recent years due to its ability to make an agent learn optimal control actions from rich observation data through the maximization of a reward function. Future space missions will need new on-board autonomy capabilities with increasingly complex requirements at the limits of vehicle performance. This motivates the use of machine-learning-based techniques, in particular reinforcement learning, to explore the edge of the performance trade-off space. The guidance and control systems development for Reusable Launch Vehicles (RLV) can take advantage of reinforcement learning techniques for optimal adaptation in the face of multi-objective requirements and uncertain scenarios. In this work, a Deep-RL algorithm is used to train an actor-critic agent to simultaneously control the engine thrust magnitude and the two TVC gimbal angles to land an RLV in a 6-DoF simulation. The design followed an incremental approach, progressively augmenting the number of degrees of freedom and introducing additional complexity factors such as model non-linearities. Ultimately, the full 6-DoF problem was addressed using a high-fidelity simulator that includes a nonlinear actuator model and a realistic vehicle aerodynamic model. Starting from an initial vehicle state along a reentry trajectory, the problem consists of precisely landing the RLV while ensuring satisfaction of system requirements, such as saturation and rate limits in the actuation, and aiming at fuel-consumption optimality. The Deep Deterministic Policy Gradient (DDPG) algorithm was adopted as the candidate strategy, allowing the design of an integrated guidance and control algorithm in continuous action and observation spaces. The 1-DoF and 2-DoF scenarios considered made it possible to perform hyperparameter sensitivity analyses, as well as to better understand how to shape the reward function to deal with the performance trade-off space.
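As an illustration of the kind of multi-objective reward shaping discussed above, the sketch below combines landing position and velocity error, fuel consumption, and actuator rate-limit violations into a single scalar penalty. The weights and signal names are illustrative assumptions, not the values used in the paper.

```python
import numpy as np

def landing_reward(pos_err, vel_err, fuel_used, act_rate_violation,
                   w_pos=1.0, w_vel=0.5, w_fuel=0.1, w_act=2.0):
    """Shaped multi-objective reward for an RLV landing episode.

    Penalizes final position/velocity error, fuel mass consumed, and
    actuator rate-limit violations. All weights are hypothetical and
    would be tuned during the sensitivity analyses described above.
    """
    return -(w_pos * np.linalg.norm(pos_err)
             + w_vel * np.linalg.norm(vel_err)
             + w_fuel * fuel_used
             + w_act * act_rate_violation)

# A run that lands closer, slower, and with less fuel should score higher.
good = landing_reward(np.array([1.0, 0.5]), np.array([0.2, 0.1]), 10.0, 0.0)
bad = landing_reward(np.array([50.0, 20.0]), np.array([5.0, 2.0]), 30.0, 1.0)
```

Adjusting the relative weights moves the learned policy along the trade-off space between landing accuracy, fuel optimality, and actuation smoothness.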
Moreover, the inclusion of the actuation model, which also considers rate limiters, was shown to considerably hinder the design. This challenge motivated the redefinition of the observation state and of the type of output that the Neural Network (NN) agent provides. The results obtained are very satisfactory in terms of landing accuracy and fuel consumption. These results were also compared to a more classical, industrially used solution, chosen for its capability to yield satisfactory landing accuracy and fuel consumption, composed of a successive convexification guidance and a PID controller tuned independently for the undisturbed nominal scenario. This comparison led to the conclusion that, for this benchmark, Deep-RL yields a better landing position accuracy. A 1000-shot Monte Carlo (MC) campaign was also performed, leading to a 97% success rate in terms of requirements satisfaction, for a scenario with wind effects not considered during the Deep-RL training. Furthermore, a reachability analysis was performed to assess the stability and robustness of the closed-loop system composed of the integrated guidance and control NN, trained for the 1-DoF scenario, and the RLV dynamics. Taking into account the fidelity of the benchmark adopted and the results obtained, this approach is deemed to have significant potential for further developments and, ultimately, space industry applications, such as In-Orbit Servicing (IOS) and Active Debris Removal (ADR), that also require a high level of autonomy. The paper describes the design, implementation, and validation of the proposed approach, presenting some of the very promising results obtained, which demonstrate the capability to successfully address the RLV landing problem with this type of technique.
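The Monte Carlo campaign described above can be sketched as follows: each shot runs the closed-loop landing under randomized dispersions and the success rate is the fraction of shots meeting all requirements. The episode model, tolerances, and dispersion statistics below are hypothetical stand-ins, not the paper's simulator or requirement values.

```python
import random

def run_episode(rng):
    """Hypothetical stand-in for one 6-DoF landing shot under randomized
    wind; returns final position error (m) and peak actuator rate (deg/s).
    In the actual campaign these would come from the high-fidelity simulator."""
    pos_err = abs(rng.gauss(0.0, 2.0))
    peak_rate = abs(rng.gauss(5.0, 3.0))
    return pos_err, peak_rate

def mc_success_rate(n_shots=1000, pos_tol=10.0, rate_limit=15.0, seed=0):
    """Fraction of shots satisfying all requirements (illustrative thresholds)."""
    rng = random.Random(seed)
    ok = 0
    for _ in range(n_shots):
        pos_err, peak_rate = run_episode(rng)
        if pos_err <= pos_tol and peak_rate <= rate_limit:
            ok += 1
    return ok / n_shots

rate = mc_success_rate()
```

A campaign of this shape, run with dispersions not seen in training (such as the wind effects mentioned above), gives the requirements-satisfaction statistic reported in the paper.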