Toby Ord’s analysis suggests that an AI agent’s chance of success declines exponentially with the length of a task. Some agents perform better than others, but the overall pattern holds, and it may be predictable for any individual agent.

This empirical regularity lets us estimate an agent’s success rate at different task lengths. The model’s good fit to the data also hints at why longer tasks fail: they consist of increasingly large sets of subtasks, and failing any one of them fails the whole task.
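To make the reasoning concrete, here is a minimal sketch of what an exponential decline implies, assuming a constant chance of failure per unit of task time, so that the success rate halves over every fixed interval. The function names, the "half-life" parameterization, and the example numbers are illustrative assumptions, not figures from the analysis.

```python
import math


def success_probability(task_minutes: float, half_life_minutes: float) -> float:
    """Modeled success rate for a task of the given length.

    Assumes success halves every `half_life_minutes` (a hypothetical
    parameter you would fit per agent from observed results).
    """
    return 0.5 ** (task_minutes / half_life_minutes)


def task_length_for(target_success: float, half_life_minutes: float) -> float:
    """Task length at which the modeled success rate equals `target_success`."""
    return half_life_minutes * math.log(target_success) / math.log(0.5)


def chained_success(per_step_success: float, n_steps: int) -> float:
    """Success rate when every one of `n_steps` independent steps must succeed."""
    return per_step_success ** n_steps


if __name__ == "__main__":
    half_life = 60.0  # illustrative value only
    for minutes in (15, 60, 120, 240):
        print(minutes, round(success_probability(minutes, half_life), 3))
    print(round(task_length_for(0.8, half_life), 1))
    print(round(chained_success(0.99, 69), 3))
```

The last function shows the subtask intuition: if each step succeeds independently with the same probability, overall success is that probability raised to the number of steps, which is itself an exponential decline in task length.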
