![Stability-preserving automatic tuning of PID control with reinforcement learning](https://image.oaes.cc/4d3e2ccf-effe-4fc3-8d8e-0b3381829243/4601-3.jpg)
Figure 3. The structure of the actor and critic networks. Left: the actor network, where layer normalization is applied before each layer. Decaying noise is added to the output to encourage exploration early in RL training. Right: the critic network, which consumes the state and action and returns the estimated Q-value.
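The architecture in Figure 3 can be sketched as follows. This is a minimal illustration, not the paper's implementation: the layer widths, weight initialization, noise scale, and decay rate are all placeholder assumptions, and the learned gain/bias of layer normalization is omitted for brevity.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize features to zero mean / unit variance
    # (learned gain and bias omitted in this sketch).
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

class Actor:
    """Actor: state -> action, with layer norm before each layer
    and decaying Gaussian exploration noise on the output."""
    def __init__(self, state_dim=4, action_dim=3, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        # Hypothetical widths and init scale; not from the paper.
        self.W1 = rng.normal(0.0, 0.1, (state_dim, hidden))
        self.W2 = rng.normal(0.0, 0.1, (hidden, action_dim))
        self.noise_scale = 0.5      # assumed initial exploration level
        self.noise_decay = 0.999    # assumed per-step decay factor
        self._rng = rng

    def act(self, state, explore=True):
        h = np.tanh(layer_norm(state) @ self.W1)
        a = np.tanh(layer_norm(h) @ self.W2)
        if explore:
            # Noise shrinks over training, so exploration fades out.
            a = a + self._rng.normal(0.0, self.noise_scale, a.shape)
            self.noise_scale *= self.noise_decay
        return a

class Critic:
    """Critic: (state, action) -> scalar Q-value estimate."""
    def __init__(self, state_dim=4, action_dim=3, hidden=32, seed=1):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (state_dim + action_dim, hidden))
        self.w2 = rng.normal(0.0, 0.1, (hidden,))

    def q_value(self, state, action):
        x = np.concatenate([state, action], axis=-1)
        h = np.tanh(layer_norm(x) @ self.W1)
        return float(h @ self.w2)   # scalar Q(s, a)
```

A training loop would update the critic toward a bootstrapped target and the actor along the critic's gradient, as in standard actor-critic methods; only the network structure from the figure is shown here.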