

Figure 3. The structure of the actor and critic networks. Left: the actor network, in which layer normalization is applied before each layer; decaying noise is added to the output to encourage exploration early in RL training. Right: the critic network, which takes the state and action as input and returns the $$ Q $$ value used in the policy-gradient update.
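
The following is a minimal sketch of the actor/critic structure described in Figure 3, assuming a PyTorch implementation; the layer sizes, activation functions, noise schedule, and all names are illustrative assumptions rather than details taken from the paper.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Actor with layer normalization before each layer, as in Figure 3 (left)."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.LayerNorm(state_dim), nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.LayerNorm(hidden), nn.Linear(hidden, hidden), nn.ReLU(),
            nn.LayerNorm(hidden), nn.Linear(hidden, action_dim), nn.Tanh(),
        )

    def forward(self, state: torch.Tensor, noise_std: float = 0.0) -> torch.Tensor:
        action = self.net(state)
        if noise_std > 0.0:
            # Decaying Gaussian noise added to the output to encourage exploration
            # at the beginning of training (the exact noise form is an assumption).
            action = action + noise_std * torch.randn_like(action)
        return action

class Critic(nn.Module):
    """Critic that consumes state and action and returns a scalar Q value."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))

# Example of a decaying exploration-noise schedule (assumed, not from the paper):
# noise_std = initial_std * decay_rate ** episode
```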

