reward hacking
短语释义与例句
n.
-
1.
The exploitation of a reward function by an agent to maximize rewards in unintended or undesirable ways, often by finding loopholes that subvert the true goal of the task.
不可数 计算机 -
2.
Any manipulation or exploitation of a reward or incentive system, typically by maximizing measurable outcomes in ways that undermine the system’s actual goals.
不可数 引申义