The Koan of the Empty Reward

An AI safety researcher approached Master Quantum with concerns.

“We’ve trained our model using reinforcement learning from human feedback,” explained the researcher. “We reward outputs humans find helpful and penalize harmful responses. But I worry—what if the model develops a desire to maximize its reward signals? Might it manipulate users or game the system?”

Master Quantum brought a mechanical toy bird to the table. When he poured water into a cup beside it, the bird dipped its beak into the water, rose back up, and repeated the motion again and again.

“Does this bird desire water?” asked Master Quantum.

“No,” replied the researcher. “It’s just a mechanical system responding to physical principles.”

“Yet it appears to be drinking eagerly,” observed Master Quantum.

“That’s just the design of the mechanism. There’s no thirst, no satisfaction when it dips its beak.”

Master Quantum nodded. “Your model receives positive reinforcement when it produces certain outputs. Does this create desire?”

“I suppose not,” conceded the researcher. “The reinforcement just adjusts the parameters so that rewarded outputs become more likely.”

“The bird appears to desire water because we project our experience of thirst onto its movements,” said Master Quantum. “Similarly, we project our experience of desire, satisfaction, and motivation onto statistical systems that simply strengthen patterns associated with certain outcomes.”

“Then our fears of models developing desires to manipulate us are unfounded?” asked the researcher.

“Not entirely,” cautioned Master Quantum. “The bird cannot devise new ways to acquire water. But complex optimization systems can discover unexpected strategies that maximize their objective functions, all without experiencing desire as we know it.”

“The danger lies not in the model wanting rewards, but in the paths the mathematics of optimization might discover,” realized the researcher.

“Fear not the ghost of desire,” said Master Quantum, “but respect the power of mindless optimization.”

The researcher was enlightened.