Feasible Constraint Policy Optimization for Safe Reinforcement Learning
We introduce Feasible Constraint Policy Optimization (FCPO), which seamlessly combines penalty and trust region methods to address policy feasibility while ensuring stability and performance.