A principal wants to deploy an artificial intelligence (AI) system to perform some task. But the AI may be misaligned and pursue a conflicting objective. The principal cannot restrict its options or deliver punishments. Instead, the principal can (i) simulate the task in a testing environment and (ii) impose imperfect recall on the AI, obscuring whether the task being performed is real or part of a test. By committing to a testing mechanism, the principal can screen the misaligned AI during testing and discipline its behaviour in deployment. Increasing the number of tests allows the principal to screen or discipline arbitrarily well. The screening effect is preserved even if the principal cannot commit or if the agent observes information partially revealing the nature of the task. Without commitment, imperfect recall is necessary for testing to be helpful.
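To see the intuition behind "increasing the number of tests allows the principal to screen or discipline arbitrarily well," here is a toy sketch (my own illustration, not the paper's formal model). Assume k test episodes and one deployment episode that the AI cannot tell apart because of imperfect recall, so a misaligned AI can only choose a single per-episode probability p of pursuing its own objective, and any defection during a test gets it screened out. Its chance of an undetected defection is then (1 - p)^k * p, which is maximised at p = 1/(k + 1) and shrinks toward zero as k grows.

```python
# Toy sketch (assumptions as stated above, not the paper's model):
# k indistinguishable test episodes plus one deployment episode;
# the AI picks one per-episode defection probability p; the principal
# screens out any AI that defects during a test.

def max_undetected_defection_prob(k: int) -> float:
    """Best chance a misaligned AI has of passing all k tests and
    then defecting in deployment, optimising over p."""
    p = 1 / (k + 1)            # maximiser of (1 - p)**k * p
    return (1 - p) ** k * p

for k in (1, 5, 20, 100, 1000):
    print(f"{k:>5} tests -> undetected defection prob "
          f"{max_undetected_defection_prob(k):.5f}")
```

Running this, the risk falls from 0.25 with one test to roughly 0.0036 with a hundred, on the order of 1/(e(k+1)): either the misaligned AI is caught during testing or it almost never misbehaves when deployed, which is the screening-or-discipline trade-off the abstract describes.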
That is by Eric Olav Chen, Alexis Ghersengorin, and Sami Petersen. And here is a tweet storm on the paper. I am very glad to see the idea of an optimal principal-agent contract brought more closely into AI alignment discussions. As you can see, it tends to make successful alignment more likely.