Jack Trigger Racing.
Deep reinforcement learning for control systems.
Modern ocean racing sailing boats are high-performance machines, almost more comparable to aircraft than the yachts of old. They combine cutting-edge material science, aero and hydrodynamics, navigation systems, telecommunications, and sensors.
However, one underdeveloped technological domain is the use of real-time data from boat sensors for automated performance optimisation. While data is relayed via displays to a human, there is little or no interfacing with the autopilots that are vital for long distance racing with only one or two crew.
Jack Trigger Racing (JTR) approached T-DAB.AI (T-DAB) to explore how machine learning could improve the autopilot function.
During long-distance solo races, sailors rely on the autopilot to steer the boat more than 90% of the time. However, current autopilot technology underuses the many inputs available from the fully integrated IoT network of sensors installed on modern yachts, and achieves only around 80% of a human sailor's performance as measured by boat performance metrics. This is a vital margin in race conditions, especially given that the time difference between first place and second place in the last Vendée Globe was 2%.
The Innovation Lab team at T-DAB worked with Jack Trigger Racing (JTR) and our partners to develop a reinforcement learning agent to augment the current autopilot system. Because the reinforcement learning system is not constrained by human biases and methods, it can potentially outperform not only the current autopilot but also the sailors themselves. Additionally, careful optimisation of the system objectives makes it possible to prioritise both the speed of the boat and the safety of the sailor.
Deep reinforcement learning is effective in this application as it doesn’t require labelled data to exceed human performance. However, it does need an accurate simulation environment. This environment was developed in parallel and included a digital twin. Read more about the Digital Twin here.
Reinforcement learning made it possible to balance different objectives, such as speed, course following and drag minimisation. The Deep Deterministic Policy Gradient (DDPG) algorithm works with continuous actions, so the agent can steer as flexibly as a human would. Having started in simplified environments, the agent now trains in progressively more realistic ones to improve its performance.
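To make the idea of balancing objectives more concrete, here is a minimal Python sketch of how a reward could weigh speed against course deviation and drag, and how a continuous steering action could be bounded. It is an illustration only, not the project's actual implementation: the `BoatState` fields, the weights, the 30 degree rudder limit and the use of squared rudder angle as a stand-in for drag are all assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class BoatState:
    """Hypothetical snapshot of the simulated boat (all fields are assumptions)."""
    speed_kn: float            # boat speed, knots
    heading_deg: float         # current heading, degrees
    target_heading_deg: float  # heading of the desired course, degrees
    rudder_deg: float          # current rudder angle, degrees


MAX_RUDDER_DEG = 30.0  # assumed rudder limit, for illustration only


def heading_error_deg(state: BoatState) -> float:
    """Smallest signed angle from current to target heading, in [-180, 180)."""
    diff = state.target_heading_deg - state.heading_deg
    return (diff + 180.0) % 360.0 - 180.0


def reward(state: BoatState,
           w_speed: float = 1.0,
           w_course: float = 0.05,
           w_drag: float = 0.01) -> float:
    """Weighted sum of objectives: reward speed, penalise course error and drag.

    Squared rudder angle is used as a crude proxy for rudder drag (assumption).
    """
    speed_term = w_speed * state.speed_kn
    course_penalty = w_course * abs(heading_error_deg(state))
    drag_penalty = w_drag * state.rudder_deg ** 2
    return speed_term - course_penalty - drag_penalty


def to_rudder_angle(raw_action: float) -> float:
    """Squash an unbounded actor output into a continuous rudder angle."""
    return MAX_RUDDER_DEG * math.tanh(raw_action)
```

Changing the weights shifts the trade-off: a larger `w_course` keeps the boat closer to its course at the expense of speed, while a larger `w_drag` discourages aggressive rudder movements. The `tanh` squashing reflects the kind of bounded, continuous action that a Deep Deterministic Policy Gradient actor typically produces, in contrast to a fixed set of discrete steering commands.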