tl;dr: We explored various approaches to code navigation for AI SWEs and share our findings on what works and what doesn't.
At Engines, we're building the best platform to run AI SWEs. Our first launch is a purpose-built code navigation system. We want to document and share our learnings as we build, so here's a guide to what we've discovered so far.
Just like humans, AI SWEs need to be able to easily navigate a codebase.
Current Research Approaches
In recent research, this has been implemented in a variety of ways:
- SWE-Agent does string search across the entire codebase and then augments the LLM with a file viewer that presents 100 lines of code at a time
- CodeMonkeys' approach is to pass all files in the codebase into a small LLM and rank them by importance for inclusion in the prompt
- Moatless semantically searches the codebase to determine what files to modify
We decided to take an approach similar to OpenHands, where code navigation is made available to the agent as a tool. We expose two tools to the agent:
- find all references
- go to definition
This interface resembles a language server, so we just need to wrap one in a bit of glue code, right? Not so fast! We investigated various approaches and found that each comes with its own benefits and tradeoffs.
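To make the interface concrete, here's a rough sketch of what those two tools look like when backed by a language server. The client class and method names below are hypothetical, but the underlying LSP requests (textDocument/definition and textDocument/references) are standard.

```python
# Hypothetical sketch: the two navigation tools as thin wrappers over a
# language-server client. `lsp_client` is an illustrative interface, not a
# real library; the LSP requests it would issue are textDocument/definition
# and textDocument/references.
from dataclasses import dataclass


@dataclass
class Location:
    path: str    # file path relative to the repo root
    line: int    # 0-indexed line of the symbol
    column: int  # 0-indexed column of the symbol


class CodeNavigationTools:
    def __init__(self, lsp_client):
        self.lsp = lsp_client  # anything that speaks the Language Server Protocol

    def go_to_definition(self, loc: Location) -> list[Location]:
        """Resolve the symbol at `loc` to the place(s) where it is defined."""
        return self.lsp.definition(loc.path, loc.line, loc.column)

    def find_all_references(self, loc: Location) -> list[Location]:
        """List every place in the repo that references the symbol at `loc`."""
        return self.lsp.references(loc.path, loc.line, loc.column)
```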
Our Vision for Code Navigation
We are building foundational infrastructure for AI SWEs. With that in mind, our infra should support everything from small blog repositories all the way up to massive repositories like Meta's monorepo.
We believe code navigation should be:
- Scalable - meaning it indexes at most once per commit
- Incremental - can index code incrementally as it's committed without a full re-index of a repository
- Flexible - it can navigate any arbitrary commit hash in a repo
- Permissively licensed - as a core primitive for AI SWE infrastructure, it should be possible to modify the system as needed
Exploring Different Systems
lsproxy
lsproxy is a purpose-built LSP library for AI SWEs. It handles detecting the language of the codebase and then spinning up the right language server to parse it.
It's a reasonable approach and the folks behind it are adding additional supported languages daily.
Unfortunately, it's AGPL licensed. To use it in a commercial product, you'll need to sign up as a customer.
Stack Graphs
Stack Graphs are an advanced data structure pioneered at GitHub, based on static analysis research, to power GitHub's web-based code navigation. Shout out to Ayman, founder of Nuanced, who worked on Stack Graphs at GitHub, for sharing it with us.
The Stack Graph codebase has a few great properties:
- It's incremental: each change to a repository only requires re-indexing the changed files, not a whole-repo index
- It's theoretically language agnostic: indexing and querying do not require per-repository configuration and work in a language-agnostic way.
- Queries are fast: once indexing is done, querying is a straightforward path tracing operation on data already stored in a database.
We were able to set up the Stack Graphs repo and run it on our test repo. However, it doesn't have great support for a variety of languages, and adding additional support is difficult.
Under the hood, Stack Graphs uses tree-sitter to parse individual files. It then relies on tree-sitter-graph, which turns the tree-sitter nodes into graphs based on a custom grammar per language. These custom grammar files, called .tsg files, are huge and are limited to Rust, Python, Java, and JavaScript/TypeScript. Adopting Stack Graphs would require us to maintain these complex files indefinitely.
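For a sense of what the tree-sitter layer produces before the .tsg rules turn it into a graph, here's a minimal parsing sketch. It assumes a recent py-tree-sitter (0.23+) together with the tree_sitter_python grammar wheel, neither of which is part of Stack Graphs itself.

```python
# Minimal sketch of the tree-sitter layer: parse a source string into a
# concrete syntax tree. Assumes py-tree-sitter >= 0.23 and the
# tree_sitter_python wheel; Stack Graphs' .tsg rules would then walk nodes
# like these to build the graph.
import tree_sitter_python as tspython
from tree_sitter import Language, Parser

parser = Parser(Language(tspython.language()))
tree = parser.parse(b"def add(a, b):\n    return a + b\n")

root = tree.root_node
print(root.type)              # "module"
print(root.children[0].type)  # "function_definition"
```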
For these reasons, we decided Stack Graphs weren't the right approach for us.
Glean
Glean is a production code indexing system at Meta. It operates at a slightly different level of abstraction compared to Stack Graphs but can also be used for precise code navigation. It works by storing facts about a codebase, based on a user-defined schema and grammar. Glean then utilizes a declarative query language, Angle, to answer questions about the codebase, such as "where is the definition of this symbol?"
Glean in theory has all the properties we envision for a code navigation system:
- It's incremental: only the changes in a new commit need indexing
- It's scalable: it's been proven to work on Meta's massive C++ codebase
- It's flexible: many queries beyond simple code navigation can be supported, for example, detecting dead code
- It can navigate arbitrary commits: facts can be linked to commits, therefore allowing querying based on them
Glean defines its own purpose-built schema for Meta's C++, but it can also ingest SCIP output via a SCIP schema. Because SCIP indexers are available for many languages, Glean has better language support out of the box than Stack Graphs.
Glean uses Thrift for its RPC protocol. The Thrift definition shipped with the project relies on Meta's internal Thrift client, so it is non-trivial to generate a Thrift client based on OSS Thrift in a language other than Haskell.
In addition, to leverage the full power of Glean's indexing of facts, a custom parser must be written for each language. Meta has parsers internally for Python, TypeScript, and other languages, but OSS Glean only ships with a parser for C++.
Overall, Glean appears to be a great system with a few rough edges that limit its OSS usage at this time.
multilspy
multilspy is a Python library that provides a convenient wrapper around an LSP server. It supports multiple languages and is very easy to set up. It was born out of a research project on incorporating static analysis into AI code generation.
The library will handle downloading the correct language server binaries for a variety of languages.
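Here's a minimal sketch following multilspy's documented synchronous API; the repository path, file name, and line/column positions are placeholders.

```python
# Minimal sketch of multilspy's synchronous API; the repo path, file name,
# and positions below are placeholders.
from multilspy import SyncLanguageServer
from multilspy.multilspy_config import MultilspyConfig
from multilspy.multilspy_logger import MultilspyLogger

config = MultilspyConfig.from_dict({"code_language": "python"})
logger = MultilspyLogger()
lsp = SyncLanguageServer.create(config, logger, "/abs/path/to/repo/")

# start_server downloads (if needed) and launches the right language server,
# then shuts it down when the block exits
with lsp.start_server():
    definitions = lsp.request_definition("app/main.py", 42, 10)  # file, line, column
    references = lsp.request_references("app/main.py", 42, 10)
    print(definitions, references)
```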
It does have a couple of small issues:
- Downloading the binaries may not always be desirable
- The LSP server will only be running as long as your Python process is running
Sourcegraph
Sourcegraph offers a precise code navigation API for a variety of languages. Given their API, you could easily navigate large repositories and leave the indexing to them.
Unfortunately, their API requires signing up as an enterprise customer. That makes their API a non-starter for a composable AI SWE infrastructure platform.
Our Solution
Given our explorations, we concluded that there is not yet a great open source solution for precise code navigation. An ideal solution would integrate the language-agnostic approach of systems like Glean or Stack Graphs with the usability of lsproxy.
Given the various challenges we described above, we chose to add a bit of convenience wrapping around multilspy as a first step.
We added:
- A server component, with MCP support
- A Docker container to make it easy to sandbox
You can check it out on GitHub here: engines-dev/piston
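To illustrate the MCP integration, here's a rough sketch of an MCP client connecting to such a server over stdio and calling a navigation tool. The server command, tool name, and arguments are hypothetical placeholders; check the piston README for the actual interface.

```python
# Hedged sketch using the MCP Python SDK's stdio client. The server command,
# tool name, and arguments are hypothetical placeholders, not piston's actual
# interface.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

server = StdioServerParameters(
    command="docker",
    args=["run", "-i", "code-nav-server"],  # placeholder image name
)


async def main():
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            print(await session.list_tools())  # discover the navigation tools
            result = await session.call_tool(
                "go_to_definition",  # hypothetical tool name
                arguments={"path": "app/main.py", "line": 42, "column": 10},
            )
            print(result)


asyncio.run(main())
```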
Looking Forward
Code context is a fascinating problem for AI SWEs and we've just scratched the surface. If you're building an AI SWE, we'd love to chat and compare notes! Reach us at founders@engines.dev.
Further Reading
For further reading, check out the following:
- Nuanced.dev - code context via semantic understanding
- CodeMonkeys - LLMs ingest all files in a codebase to determine context