Discussion about this post

Lando Barkien

Do you know how many of the failures could be attributed to patch failures? I read this post yesterday: https://blog.can.ac/2026/02/12/the-harness-problem/

It's about how even large models fail to apply patches because they can't faithfully reproduce the string to search for. The author modified the edit tool to use a hashline-based search, which resulted in significant improvements in token use and edit success. Surely smaller models can benefit too.
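
To make the idea concrete, here is a minimal sketch of hash-anchored line edits as I understand them from that description. It is not the linked post's implementation: the function names, the `N:hash|text` display format, and the 4-character SHA-1 prefix are all assumptions chosen for illustration.

```python
import hashlib


def tag(line: str) -> str:
    """Short content hash for one line (the length of 4 is an arbitrary assumption)."""
    return hashlib.sha1(line.encode("utf-8")).hexdigest()[:4]


def render(lines: list[str]) -> list[str]:
    """Format the file as the model would see it: 'N:hash|text' per line."""
    return [f"{i}:{tag(text)}|{text}" for i, text in enumerate(lines, start=1)]


def apply_edit(lines: list[str], anchor: str, new_text: str) -> list[str]:
    """Replace the line named by an 'N:hash' anchor.

    The edit is rejected if the line number is out of range or the hash
    no longer matches the line's current content (a stale or wrong anchor).
    """
    number, _, expected = anchor.partition(":")
    index = int(number) - 1
    if not 0 <= index < len(lines):
        raise ValueError(f"line {number} is out of range")
    if tag(lines[index]) != expected:
        raise ValueError(f"hash mismatch at line {number}")
    # Return a new list so the caller's copy is left untouched.
    return lines[:index] + [new_text] + lines[index + 1:]
```

The point of the design is that the model only has to echo a short anchor such as `2:ab12` instead of reproducing the original text exactly, and a mismatched hash fails loudly instead of patching the wrong place.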

Lars Adrian Giske

This is a great piece! Wonder what kind of improvements you’d see if you finetune on traces from the harness.
