Approaching Major Code Rewrites

Eventually, in our careers as software developers, we are going to screw up. Some of us will screw up so badly that, after exhausting every other option, we come to realize we must rewrite our code.

I’m not referring to refactoring, which I define as changes to the code that do not result in changes to the program’s behavior. I’m referring to a proper rewrite, whether due to fundamental scalability problems, fundamental architectural problems, platform revisions, and so on. Even successful technology companies have faced this problem before. You’re not alone.

We finished a major code rewrite early this year. Surprisingly, I was unable to find many actionable resources on the web. There were a number of theories and lots of opinions, but real-world experience seemed scarce.

My experience relates to a major code rewrite of the JS++ compiler. We could not scale further. Compile times grew in O(n²). This wasn’t a micro-optimization or a single class at fault. There were fundamental errors we made in the design of the programming language which carried over to its implementation in the compiler.

While I endeavor not to obsess over performance, this reached a level of performance degradation that would negatively impact UX. Have you ever stopped using software because it was too slow to be usable? We were in that boat. With the rewrite, we’ve gone from 5-minute compile times generating 40 MB of cache to 7 seconds generating 30 KB of cache.

At the time we began the rewrite, the project exceeded 100,000 lines of code (without counting third-party libraries) spanning some three years’ worth of work. In terms of real-world experience, this should be suitably complex.

It took us just three months to complete the rewrite. This was the strategy I devised and which we successfully executed:

  1. Do NOT rewrite from scratch (without a VERY good reason).
  2. Identify what needs to be rewritten.
  3. Break the rewrite down into chunks.
  4. Break the rewrite down into testable chunks.
  5. Start small. One monolithic rewrite is just a sum of its parts.

Let’s break it down step by step:

1. Do NOT rewrite from scratch (without a VERY good reason).

As tempting as it may be, conventional wisdom dictates that we should NOT rewrite from scratch.

Whilst actionable software rewrite strategies were scarce when we set out on this, an article by Joel Spolsky was helpful and convincing. If you read nothing else, consider these words of wisdom:

“It’s important to remember that when you start from scratch there is absolutely no reason to believe that you are going to do a better job than you did the first time. First of all, you probably don’t even have the same programming team that worked on version one, so you don’t actually have ‘more experience’. You’re just going to make most of the old mistakes again, and introduce some new problems that weren’t in the original version.”

However, there are good reasons to rewrite from scratch, such as using a better programming language that better fits your problem domain. The benefits of a full rewrite need to fully outweigh the disadvantages. If you plan to rewrite from scratch in the same programming language, with mostly the same developers, and so on – it might not go as well as you think it will right now. You ought to be actively persuading yourself with all the reasons not to rewrite from scratch – not the other way around.

Remember: rewriting from scratch does not guarantee “better,” no matter what the perceptions or biases inside your head are telling you right now. You are taking a massive risk with a full rewrite. It’s a gamble when you have a sure thing that is built already.

2. Identify what needs to be rewritten.

The reason you got here in the first place was likely a lack of foresight. Don’t make this mistake again.

First, analyze the problem. In this phase, you are trying to come up with every reason NOT to rewrite at all. Is there a clever fix you can come up with? Can you do things a different way without a rewrite? Can you sweep it all under the rug and build on top of what already exists?

Poor code quality is not the reason for a rewrite. It’s the reason for refactoring. Otherwise, do you have fundamental behavioral changes you must make? The keyword is “must”. Is it like food, water, and shelter for your company, or is it more like buying a nice vase? Separate the important from the luxuries.

For commercial software, these will almost certainly be business reasons. For us, we knew the software would fail in the market if it was too slow to use. We were developing a tool to enhance developer productivity while simultaneously eating up that productivity in compile times.

Once you’ve analyzed the problem, you must devise “how” you are going to rewrite. Which new algorithm(s) are you going to use? Which new architecture? What are the consequences? Do the positives of a rewrite outweigh the negatives? How long will this take? You must carefully assess the who, what, when, where, why, and how before you start.

3. Break the rewrite down into chunks.

This was the “Aha!” moment. As I mentioned, when researching how to move forward with a large code rewrite, there existed a dearth of actionable resources. After careful analysis, I had to create a reasonable plan of action. We were talking about a massive and daunting rewrite. Morale was low.

Chunking solved our problem, both in terms of engineering complexity and morale. “Chunking” is just breaking down one very large task into smaller, more manageable chunks. Generally, the more granularity you can achieve, the better.
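As a rough illustration of chunking, here is a minimal Python sketch that writes the chunks down with their dependencies and derives a workable order. The chunk names are hypothetical and are not the actual breakdown we used for JS++; the point is only that once the chunks are explicit, ordering them is mechanical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical chunks of a compiler rewrite. Each chunk maps to the set of
# chunks that must be rewritten before it.
chunks = {
    "symbol_table": set(),
    "type_checker": {"symbol_table"},
    "import_resolution": {"symbol_table"},
    "code_generator": {"type_checker", "import_resolution"},
}

def plan(chunks):
    """Return chunk names in an order where dependencies always come first."""
    return list(TopologicalSorter(chunks).static_order())

order = plan(chunks)
for step, name in enumerate(order, start=1):
    print(f"{step}. {name}")
```

The finer the chunks, the shorter each step in the resulting plan, which is what makes a daunting rewrite feel approachable.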

4. Break the rewrite down into testable chunks.

So far, we’ve discussed strategy. We haven’t actually talked about the software side yet.

You’ve broken your large code rewrite into much smaller, more manageable tasks. This is from the project management perspective. Now, when you actually look at all your code, where do you start?

First, take advantage of version control. Create a new branch. Fortunately, we were using git so branching was cheap. A new branch minimized risk. If the code rewrite failed, we would just scrap the branch and revise our strategy. (Fortunately, we didn’t have to as that would have eaten up precious time.)

Starting from your new branch, you do not look at implementation first. Instead, you’re taking a TDD-style approach from here: find a relevant test, make it fail, fix the code so the test will pass, and repeat.
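To make that loop concrete, here is a small Python sketch using the standard unittest module. The two functions are hypothetical stand-ins for one chunk before and after the rewrite, and the behavior they implement is invented for illustration. The test encodes the behavior the rewrite requires, so it fails against the old code and passes against the new code.

```python
import unittest

def legacy_count_symbols(source):
    """Hypothetical legacy chunk: counts every token, duplicates included."""
    return len(source.split())

def rewritten_count_symbols(source):
    """Hypothetical rewritten chunk: counts each distinct symbol once."""
    return len(set(source.split()))

def make_test(impl):
    """Build a test case bound to one implementation of the chunk."""
    class SymbolCountTest(unittest.TestCase):
        def test_duplicates_counted_once(self):
            # The test states the NEW required behavior.
            self.assertEqual(impl("a b a c"), 3)
    return SymbolCountTest

def passes(impl):
    """Run the test against an implementation and report success."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(make_test(impl))
    result = unittest.TestResult()
    suite.run(result)
    return result.wasSuccessful()

red = passes(legacy_count_symbols)       # fails before the chunk is rewritten
green = passes(rewritten_count_symbols)  # passes once the chunk is rewritten
print("before rewrite:", red, "| after rewrite:", green)
```

In a real project the test would live in your existing suite and you would edit the implementation in place on the branch; the side-by-side functions here only exist to show both states in one file.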

5. Start small. One monolithic rewrite is just a sum of its parts.

As mentioned in the last section, start with your first test and make it fail; fix the code so that the test passes and repeat the process with the next test. Over time, you will incrementally have applied the fundamental changes across the entire system.

And, really, that’s it. We had A LOT of integration tests failing (naturally) with just a few small changes when we first began, and it was very daunting. I can assure you, there is light at the end of the tunnel. The beauty of breaking everything down into testable chunks is that you will have tangible progress which will inevitably boost morale. You’re not staring into a black box and hoping to go from nothing to a fully rewritten and working final software product. So the final advice is: be patient.

I’ve worked on several large-scale projects before JS++. I’ve worked on projects large and small since 1997, in both waterfall and agile environments, and rarely have I needed to rewrite, let alone rewrite in such a massive, make-or-break, and demoralizing scenario. It is human to make mistakes; it is human to make massive mistakes. My hope is that this article will shine a light for others facing a similar challenge.