You're Sleeping on AI SWEs

December 18, 2024

tl;dr: AI Software Engineers are coming faster than you think. Spend some time trying the new tools now since they’ll be your colleagues in the future.

I have a confession. I have been an AI skeptic for a long time. In college, I took machine learning courses but I assumed ML would be confined to flying model helicopters and labeling emails as spam.

I was wrong.

I now believe AI is going to completely alter the way we interact with computers.

Let me share how I arrived here.

It started with my exploration into AI software engineers.

A Personal Anecdote

Last year I started a company with my friend, Jet. We’ve explored multiple business angles, as is common for a startup, and so most of the programming I do is quick prototyping.

My default workflow is:

  1. Think of an idea
  2. Fire up Zed
  3. Chat with Claude to produce the first version of my program
  4. Run the output
  5. Feed the errors back into Claude
  6. Repeat until the program is working

Claude is my compiler, my pair programmer and my conversational partner all in one.

However, steps 4, 5 and 6 can get tedious.
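
Those three steps boil down to a small loop. Here is a minimal Python sketch of it, for illustration only: `run_once`, `iterate_until_working` and `max_rounds` are made-up names, and `ask_model` is a hypothetical stand-in for the chat with Claude, assumed to take the current source plus the error output and return the full revised file.

```python
import subprocess
from typing import Callable, Sequence


def run_once(cmd: Sequence[str]) -> tuple[bool, str]:
    """Run the program and return (succeeded, combined stdout and stderr)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr


def iterate_until_working(
    cmd: Sequence[str],
    source_path: str,
    ask_model: Callable[[str, str], str],
    max_rounds: int = 5,
) -> bool:
    """Run cmd; on failure, hand the source and errors to the model and retry.

    ask_model(source, errors) is a hypothetical stand-in for a chat call to
    an LLM; it must return the full revised source text.
    """
    for _ in range(max_rounds):
        ok, output = run_once(cmd)  # step 4: run the output
        if ok:
            return True
        with open(source_path, encoding="utf-8") as f:
            source = f.read()
        revised = ask_model(source, output)  # step 5: feed the errors back
        with open(source_path, "w", encoding="utf-8") as f:
            f.write(revised)
        # step 6: loop around and run again
    # Check the final revision once more before giving up.
    return run_once(cmd)[0]
```

The `max_rounds` cap is the "diminishing returns" cutoff made explicit: after a few failed rounds it is usually cheaper to step in by hand than to keep looping.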

I often find myself copying entire API definitions into Claude and informing it of new libraries or versions when it uses something outdated. I feed it byzantine TypeScript errors ad nauseam until I hit diminishing returns.

That experience led me to prototype a TypeScript compiler wrapper that fed errors into Claude and gave me a natural language explanation of their cause.
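
The core of a wrapper like that is small. The Python sketch below is an illustration of the idea, not the prototype itself: it assumes the compiler was run as `tsc --noEmit --pretty false`, so each diagnostic arrives on one line in the `file(line,col): error TSxxxx: message` form, and `TsError`, `parse_tsc_output` and `build_prompt` are hypothetical names. Sending the prompt to a model is left out on purpose.

```python
import re
from dataclasses import dataclass

# Matches tsc's plain diagnostic format, for example:
#   src/app.ts(12,5): error TS2322: Type 'string' is not assignable to type 'number'.
DIAGNOSTIC = re.compile(
    r"^(?P<file>.+?)\((?P<line>\d+),(?P<col>\d+)\): "
    r"error (?P<code>TS\d+): (?P<message>.*)$"
)


@dataclass
class TsError:
    file: str
    line: int
    col: int
    code: str
    message: str


def parse_tsc_output(output: str) -> list[TsError]:
    """Collect diagnostics; indented continuation lines join the previous message."""
    errors: list[TsError] = []
    for raw in output.splitlines():
        match = DIAGNOSTIC.match(raw)
        if match:
            errors.append(
                TsError(
                    file=match["file"],
                    line=int(match["line"]),
                    col=int(match["col"]),
                    code=match["code"],
                    message=match["message"],
                )
            )
        elif errors and raw.startswith(" "):
            errors[-1].message += "\n" + raw.strip()
    return errors


def build_prompt(error: TsError, source_lines: list[str], context: int = 3) -> str:
    """Pair one diagnostic with nearby source lines in a plain-language request."""
    start = max(error.line - 1 - context, 0)
    end = min(error.line + context, len(source_lines))
    snippet = "\n".join(
        f"{number}: {text}"
        for number, text in enumerate(source_lines[start:end], start=start + 1)
    )
    return (
        f"Explain this TypeScript error in plain English and suggest a fix.\n\n"
        f"{error.code} at {error.file}:{error.line}:{error.col}\n"
        f"{error.message}\n\n"
        f"Source:\n{snippet}\n"
    )
```

Including a few lines of source around the error matters more than it looks: the diagnostic alone rarely says which type annotation or call site is the real culprit.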

The TypeScript error explainer shrank my debugging time so much that I began to think maybe AI could help me write code in even better ways.

That thought eventually led me to explore fully autonomous AI software engineers.

AI SWEs

Enter OpenHands: an open source AI software engineer.

The company behind OpenHands, All Hands, is a team of industry veterans, computer science PhDs, and a computer science professor. They’re serious about their craft and they’re shipping fast. OpenHands consistently performs at the top of SWE-bench, achieving results on par even with OpenAI and Anthropic.

Convinced that I could rely on Claude itself to iterate on code, I booted up OpenHands.

I asked it to make a UI component change, with corresponding changes to the Prisma model and GraphQL resolver. To be honest, the first draft of the PR took quite a few iterations of prompting.

However, I expect OpenHands and other AI SWE systems like it to improve rapidly, and what gives me faith is watching the OpenHands team dogfooding their own product on GitHub.

You can see that human/agent collaboration on this PR. OpenHands makes an improvement to its own UX which involves changes to the backend data model, API and frontend code. Graham, the chief scientist at All Hands, guides the agent through multiple iterations. Eventually the PR is ready to be merged without any direct human editing of the code.

The OpenHands agent has quite an active commit history! Peeping through the commits, many are for small bug fixes and documentation updates but some, like the one above, are actually modifying functionality.

The OpenHands team is on the forefront of a new paradigm for software. They are working directly with an AI SWE, learning its limitations and rapidly improving it.

Importantly, they are iterating through various workflow patterns that will become commonplace:

  • Chatting directly with OpenHands
  • Having OpenHands generate a first draft PR via GitHub Issues
  • Iterating with OpenHands via comments on PRs

In a few short years, we will all be working with AI software engineers.

What needs to change

There are a bunch of implications for every engineer using an AI SWE pair programmer.

A few of the most interesting questions are:

  • What should code reviews look like for code generated by AI SWEs? For example, the OpenHands agent has created so many fixes that its branches are becoming unmanageable.
  • How will we prompt models? Will there be a language for “programming models” with the LLM as a “natural language” compiler?
  • How will we run these agents? Will they be centralized (run by Anthropic or OpenAI or others) or a part of each company’s cloud infra and open source?

Where to go from here

Try the tools!

For any software engineer today, trying the latest AI tools is imperative. The profession is in transition from hand-crafting lines of code to orchestrating AI systems that build and maintain software.

It’s easy to get started with Cursor’s new agentic workflow or run OpenHands locally.

If you’re interested in the academic literature, the new SWE-bench Multimodal paper is a great starting point.

Our vision

We believe AI software engineers will be a critical piece of infrastructure for every company, customized to their workflows, codebase and business problems.

We are building a platform to enable that future: tools, context, and sandboxes for AI SWEs.

If that excites you, we’d love to hear from you: founders@engines.dev