reward hacking

Phrase

Definitions and examples

n.
  1.

    The exploitation of a reward function by an agent to maximize rewards in unintended or undesirable ways, often by finding loopholes that subvert the true goal of the task.

    uncountable, computing
  2.

    Any manipulation or exploitation of a reward or incentive system, typically by maximizing measurable outcomes in ways that undermine the system’s actual goals.

    uncountable, by extension