The Night Shift

The idea of a software factory has come up a lot over the last six months or so as techniques like Ralph Loops and tools like Gas Town have gained popularity. When you play with these approaches you often hit one of two walls: either some context is still missing that would make the build better, or the work is done so quickly that you reach the “What’s next?” question much sooner than you used to.

It is now possible to build systems around your product that let your agents improve it while you are AFK. Most recently I’ve been using Claude Code Routines for this, with great results: they have saved me hours of work and let me focus on the bigger things while the codebase, or the product itself, improves over time.

Claude Code Routines

A Claude Code Routine is simply a prompt that is triggered on a schedule, by a GitHub event, or by an incoming API call. Each run starts with a fresh context window and can be given access to any MCP servers you can connect to Claude. Additionally, for any pull request a routine creates, the agent can be set to watch for CI failures or comments and keep updating the submitted code until it is merged.

My Routines

Flaky Test Fixes

As codebases grow and test suites get larger, you will occasionally find that old tests have become “flaky”: they usually pass but occasionally fail. These failures are usually the result of some source of nondeterminism, be it test ordering, a hard-coded date that makes the test fail on certain days, or something else entirely. When fixing a flaky test, most of the time goes into investigating why it is flaky; the fix itself is usually small and obvious in hindsight.
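
As a hypothetical illustration of the hard-coded date case (the function and test names below are invented for this sketch, not taken from a real codebase), a test that leans on the real clock passes until the calendar catches up with it. Passing the date in explicitly is the sort of small fix that looks obvious afterwards:

```python
from datetime import date
from typing import Optional


def days_until_renewal(renewal: date, today: Optional[date] = None) -> int:
    """Return the number of days from `today` until `renewal`."""
    # Falling back to the real clock is convenient in production code,
    # but it is what makes a test depend on the day it runs.
    if today is None:
        today = date.today()
    return (renewal - today).days


def test_renewal_is_upcoming_flaky() -> None:
    # Time bomb: passes on every day before 2030-01-01 and fails from then on.
    assert days_until_renewal(date(2030, 1, 1)) > 0


def test_renewal_is_upcoming_fixed() -> None:
    # Deterministic: the "current" date is pinned by the test itself.
    assert days_until_renewal(date(2030, 1, 1), today=date(2029, 12, 25)) == 7
```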

This routine is a simple agent that runs each night and looks at the previous day’s CI runs on a repository. It identifies any test failures that needed a retry to pass and were unrelated to the PR they failed on. The agent then investigates why each test may be flaky, fixes the underlying cause, and submits a PR for review with its rationale in the description. If it finds no flaky tests, it exits without submitting code.
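
The detection step can be sketched in a few lines. This is a minimal illustration and not what the routine literally runs: the `CIAttempt` record shape and the `find_flaky_tests` helper are assumptions made up for the example, and real CI data would need to be mapped into that shape first. The heuristic is that a test which both failed and passed on the same commit is probably flaky, because the code under test did not change between attempts.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class CIAttempt:
    """One execution of one test in CI (hypothetical record shape)."""
    test_name: str
    commit_sha: str
    passed: bool


def find_flaky_tests(attempts: List[CIAttempt]) -> List[str]:
    """Return the names of tests that both failed and passed on one commit."""
    # Collect every outcome seen for each (test, commit) pair.
    outcomes: Dict[Tuple[str, str], Set[bool]] = {}
    for attempt in attempts:
        key = (attempt.test_name, attempt.commit_sha)
        outcomes.setdefault(key, set()).add(attempt.passed)

    # A mixed result on unchanged code points at the test, not the PR.
    flaky = {name for (name, _sha), seen in outcomes.items() if seen == {True, False}}
    return sorted(flaky)


runs = [
    CIAttempt("test_checkout_total", "abc123", passed=False),  # failed first...
    CIAttempt("test_checkout_total", "abc123", passed=True),   # ...passed on retry
    CIAttempt("test_login", "abc123", passed=False),           # failed both times,
    CIAttempt("test_login", "abc123", passed=False),           # so likely a real bug
]
print(find_flaky_tests(runs))  # ['test_checkout_total']
```

A test that fails on every attempt is deliberately left out, since that is more likely a genuine problem with the incoming PR than flakiness.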

Over the lifetime of a repository, this routine can save hours or even days of work and noticeably improve delivery time, because flaky tests get in the way less often.

Dependency Upgrades

You might think this one is already solved by tools like Dependabot. However, these tools often just check whether a new version is available, submit a one-line change to bump the version number, and let CI run. Unfortunately, not all upgrades are that simple, and some need a bit more work to keep the application working.

This routine improves on the Dependabot approach. It runs each evening, runs any audit scripts to look for new security upgrades, and then codes up the required work. Sometimes that is as simple as a version bump; other times it needs a few extra lines of config or some renamed functions when a breaking change has been introduced. While it doesn’t remove the work entirely, it gives you a much better starting point, often a mergeable one, as soon as you start your day, so you get unblocked faster and can return to your main focus. I currently only run this for detected security upgrades. You could expand it to all upgrades, but if you are several major versions behind you may end up with many PRs of varying quality.
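
One way to picture the triage that happens before any code is written is the sketch below. It assumes an audit tool has already reported an installed version and a list of patched versions for a package; the helper names (`parse_version`, `pick_upgrade`, `bump_kind`) are hypothetical, and only plain `MAJOR.MINOR.PATCH` strings are handled. Choosing the smallest patched release keeps the diff small, and classifying the jump hints at whether a one-line bump is likely to be enough.

```python
from typing import List, Optional, Tuple

Version = Tuple[int, int, int]


def parse_version(text: str) -> Version:
    """Parse a plain 'MAJOR.MINOR.PATCH' string (pre-release tags not handled)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def pick_upgrade(installed: str, patched: List[str]) -> Optional[str]:
    """Pick the smallest patched release that is newer than the installed one."""
    current = parse_version(installed)
    newer = [version for version in patched if parse_version(version) > current]
    return min(newer, key=parse_version) if newer else None


def bump_kind(installed: str, target: str) -> str:
    """Classify the jump as 'major', 'minor' or 'patch'."""
    old, new = parse_version(installed), parse_version(target)
    if new[0] != old[0]:
        return "major"  # breaking changes are plausible, expect code edits
    if new[1] != old[1]:
        return "minor"
    return "patch"      # usually just the version number changes


target = pick_upgrade("1.2.3", ["2.0.1", "1.2.4"])
print(target, bump_kind("1.2.3", target))  # 1.2.4 patch
```

A "major" result is the case where the agent’s extra work (config changes, renamed functions) is most likely to be needed.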

Outdated Content Detection

For content-heavy sites that reference other parties, it is now possible to have a research routine audit your site regularly and flag anomalies before your users spot them. For example, a directory site might reference entities that have gone out of business or received bad press, or there may be some other reason the content needs to be amended or removed. These sorts of routines can slowly work through your content as it evolves, submitting changes and improvements over time, to help you keep historical work up to date.
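
Working through content slowly implies some way of choosing what to look at on each run. A minimal sketch of one option, assuming each entry records when it was last audited (the `Listing` shape and `next_audit_batch` helper are hypothetical), is to take the stalest few entries every time:

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Listing:
    """A piece of site content to audit (hypothetical record shape)."""
    slug: str
    last_audited: Optional[date]  # None means it has never been audited


def next_audit_batch(listings: List[Listing], batch_size: int) -> List[Listing]:
    """Return the listings that have gone longest without an audit."""

    def staleness(item: Listing) -> Tuple[bool, date]:
        # Never-audited listings sort first, then the oldest audit dates.
        return (item.last_audited is not None, item.last_audited or date.min)

    return sorted(listings, key=staleness)[:batch_size]


listings = [
    Listing("acme-plumbing", date(2025, 3, 1)),
    Listing("new-bakery", None),
    Listing("old-garage", date(2024, 11, 5)),
    Listing("city-florist", date(2025, 1, 20)),
]
print([item.slug for item in next_audit_batch(listings, 2)])
# ['new-bakery', 'old-garage']
```

Capping the batch size keeps each run’s research, and the resulting PR, small enough to review quickly.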

Data Driven Minor Feature Implementation

Finally, for sites that depend heavily on SEO for growth, it is possible to build a routine that is connected to your analytics service and can not only pull analytics and report on them for you, but also ideate around some goal and submit minor improvements. This gets into the realm of slop; however, run at a reasonable frequency, it does occasionally turn up gold. The idea is that the routine pulls in user and traffic analytics, logs, and any other context you’re happy to hand over, picks the most impactful idea it can come up with to improve the product based on that information, and then submits a PR with its rationale and changes. A lot of what this agent outputs won’t be ready for prime time. However, I have seen minor features such as sorting, search bars, navigation improvements, and copy improvements generated to a point where I would merge them. The PRs that aren’t ready to launch still give you a slow stream of inspiration: half-baked ideas you can choose to ignore or act on.

What’s next?

These agents aren’t 100% accurate, but neither are humans. What they do produce, while I’m asleep, is easily reviewable work that often takes only minutes to check in the morning, letting me get more done than I could before.

That said, I think there is going to be a steep trajectory for these types of agents. Linters and formatters allowed codebases to enforce their own standards. The next step is to create codebases, and products, that improve themselves by not just catching what is wrong but proposing what is next by looking at more data than traditional tools had available. The routines I’m running today are the clumsy first versions of this and I’m betting that the polished version arrives faster than any of us expect.