Discussion about this post

Lars Adrian Giske

This is a great piece! Wonder what kind of improvements you’d see if you finetune on traces from the harness.

Lando Barkien

Do you know how many of the failures could be attributed to patch failures? I read this post yesterday: https://blog.can.ac/2026/02/12/the-harness-problem/

It's about how even large models fail at applying patches because they can't faithfully reproduce the string to search for. The author modified the edit tool to use a hashline-based search, which resulted in significant improvements in token use and edit success. Surely smaller models can benefit too.
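
For anyone who hasn't read the linked post, here is a rough sketch of what a hash-tagged line edit could look like. This is my own guess at the idea, not the author's implementation: the `number:hash|text` display format, the 4-character SHA-256 prefix, and the names `tag`, `render`, and `replace_line` are all assumptions made up for illustration.

```python
import hashlib


def tag(line: str) -> str:
    """Short content hash for one line (4 hex chars is an arbitrary choice)."""
    return hashlib.sha256(line.encode("utf-8")).hexdigest()[:4]


def render(lines: list[str]) -> list[str]:
    """Hypothetical view shown to the model: '<number>:<hash>|<text>'."""
    return [f"{i}:{tag(text)}|{text}" for i, text in enumerate(lines, start=1)]


def replace_line(lines: list[str], ref: str, new_text: str) -> list[str]:
    """Replace the line named by ref ('<number>:<hash>').

    The model only has to echo a short reference instead of reproducing
    the full search string; a stale or wrong hash is rejected.
    """
    number_str, expected = ref.split(":", 1)
    index = int(number_str) - 1
    if not 0 <= index < len(lines):
        raise ValueError(f"line {number_str} is out of range")
    if tag(lines[index]) != expected:
        raise ValueError(f"hash mismatch at line {number_str}; re-read the file")
    return lines[:index] + [new_text] + lines[index + 1:]


source = ["def add(a, b):", "    return a - b"]
ref = render(source)[1].split("|", 1)[0]  # reference for line 2
fixed = replace_line(source, ref, "    return a + b")
```

The point of the hash check is that an edit against a line that has since changed fails loudly instead of silently patching the wrong text.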
