Skip Navigation

Search

Programming @fedia.io
Miguel Afonso Caetano @tldr.nettime.org

"hose claiming we're mere months away from AI agents replacing most programmers should adjust their expectations because models aren't good enough at the debugging part, and debugging occupies most

"[T]hose claiming we're mere months away from AI agents replacing most programmers should adjust their expectations because models aren't good enough at the debugging part, and debugging occupies most of a developer's time. That's the suggestion of Microsoft Research, which built a new tool called debug-gym to test and improve how AI models can debug software.

Debug-gym (available on GitHub and detailed in a blog post) is an environment that allows AI models to try and debug any existing code repository with access to debugging tools that aren't historically part of the process for these models. Microsoft found that without this approach, models are quite notably bad at debugging tasks. With the approach, they're better but still a far cry from what an experienced human developer can do.
(...)
This approach is much more successful than relying on the models as they're usually used, but when your best case is a 48.4 percent success rate, you're not ready for primetime. The limitati