
Why should you prefer to publish code changes in small PRs?

Jayesh Kawli

Jan 21, 2022 • 3 min read

When I was a Software Engineer at Meta, we had a concept of PRs called diffs. A diff is basically another name for a pull request, but instead of spanning multiple files with the usual large number of code changes, it is a self-contained change with as little code as possible to ease the review.

We would then stack multiple diffs on top of each other, numbering them 1/x, 2/x, and so on, up to, say, 12/x. If the developer had already figured out how many diffs they were going to divide the change into, they could state that number explicitly instead of the arbitrary x.
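As a rough illustration of the numbering convention, here is a minimal Python sketch. The helper name `label_stack`, the bracketed title format, and the sample titles are all hypothetical; this is not Meta tooling, just one way to picture the labels.

```python
from typing import List, Optional


def label_stack(titles: List[str], total: Optional[int] = None) -> List[str]:
    """Prefix each diff title with its position in the stack.

    When the total number of diffs is not known yet, a literal "x"
    is used as the denominator, as described above.
    """
    denominator = str(total) if total is not None else "x"
    return [
        f"[{position}/{denominator}] {title}"
        for position, title in enumerate(titles, start=1)
    ]


if __name__ == "__main__":
    # Hypothetical titles for a three-diff stack of unknown final size.
    for label in label_stack(["Add model", "Add view", "Wire up navigation"]):
        print(label)
```

Passing `total=3` would produce `[1/3]`, `[2/3]`, `[3/3]` instead of the open-ended `x`.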

Organizing code changes, whether a bug fix or a feature, into diffs had distinct advantages compared to the usual approach.

First, since changes are made in small chunks, they are easier for fellow engineers to review. When someone looks at a PR with many changes, they are discouraged from reviewing it in the first place. Smaller changes reduce the cognitive load on the reviewer, who can then do a much better job of reviewing and commenting on the code.

Second, in the case of a SEV (a production incident), instead of reverting everything, you can revert only the one diff that caused it. Since each diff is a standalone change that passes all build checks and tests, reverting it will not affect other parts of the stack. Reverting a small change is also faster, so it can quickly mitigate a customer-facing issue.

Third, when a developer stumbles upon a piece of code, they can run blame to see which diff last changed it, along with its history and the reasoning behind it. Because each diff is small and its context covers only the changes in that diff, not everything happening in the feature, developers spend less time understanding the change. If other parts of the feature are unclear, they can of course consult the other diffs in the same stack.

Fourth, each diff acts as documentation on its own. With minimal lines, it tells a story that other developers will be interested in exploring. Since a diff is designed to do exactly one task, you can document it better in the description, focusing only on that part. Imagine merging everything in one big PR and writing a 10,000-word essay about everything going on in that change. No one wants to read that. With smaller diffs, you can focus on how you tested the feature, how to configure the code to replicate the conditions, and a test plan that covers the app's behavior before and after the bug fix. All these details can be quite useful for someone who wants to make a similar change, and if the bug reappears in production, you can look at how the test plan was written.

Fifth, with a stack of diffs, you aren't blocked on review. If you make one big PR, it's possible that no one will be willing to review it for many days, which doesn't work well for anyone. With smaller diffs, people are more inclined to review them, and you can even ask other teammates to review the rest of the diffs in the stack.

Sixth, you might be working on changes that affect different parts of the codebase, and no single developer is familiar with all of them. For example, your code may touch many modules such as HomePage, Checkout, Browse, Sales, and Promotions. If you combine them all in one PR, people are forced to look at code they aren't familiar with. Dividing a big PR into smaller diffs organized into a stack lets reviewers look only at the code they are familiar with or own, which again speeds up the review and improves its quality.
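One way to picture this split is to group the changed files by module, so each group can become its own diff with its own reviewers. The Python sketch below assumes, purely for illustration, that the top-level directory of each path is the module name; the function name `group_by_module` and the file paths are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_module(changed_files: List[str]) -> Dict[str, List[str]]:
    """Group changed file paths by their top-level directory.

    Assumption: the first path component names the module
    (for example "Checkout/Cart.swift" belongs to "Checkout").
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in changed_files:
        module = path.split("/", 1)[0]
        groups[module].append(path)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical set of files touched by one large change.
    files = [
        "HomePage/Banner.swift",
        "Checkout/Cart.swift",
        "HomePage/Feed.swift",
        "Promotions/Coupon.swift",
    ]
    for module, paths in group_by_module(files).items():
        print(module, paths)
```

Each resulting group is a candidate for a separate diff that can go to the people who know that module.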

Seventh, you can also easily track down build and test failures with smaller diffs. If the build or test checks fail, it tells you which diff it failed on. You can then focus only on that diff and inspect how those changes might have led to failure.
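To make the idea concrete, here is a minimal Python sketch of walking a stack from the bottom and reporting the first diff whose checks fail. The `Diff` class, its `checks` callable (a stand-in for real build and test checks), and `first_failing_diff` are hypothetical names for illustration, not any real CI API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Diff:
    title: str
    # Stand-in for this diff's build and test checks; returns True on success.
    checks: Callable[[], bool]


def first_failing_diff(stack: List[Diff]) -> Optional[Diff]:
    """Return the first diff (bottom of the stack first) whose checks fail.

    Returns None when every diff in the stack passes.
    """
    for diff in stack:
        if not diff.checks():
            return diff
    return None


if __name__ == "__main__":
    # Hypothetical three-diff stack in which the second diff breaks a test.
    stack = [
        Diff("[1/3] Add model", lambda: True),
        Diff("[2/3] Add view", lambda: False),
        Diff("[3/3] Wire up navigation", lambda: True),
    ]
    failing = first_failing_diff(stack)
    print(failing.title if failing else "All diffs pass")
```

With one big PR, the same failure would only tell you that something in the whole change is broken; here the search space is a single small diff.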

Eighth, this technique also gives you early feedback on your architectural decisions. Imagine making one big PR and then waiting for someone to tell you that your foundation was all wrong. Now imagine the opposite situation: you publish your changes in small chunks, and if someone points out an issue, you can fix it early in the diff stack without affecting future changes built on this foundation (as long as your code structure allows it, for example, through interfaces).

I hope I have convinced you of the importance of small PRs, or even of organizing your code changes into a stack of diffs (bite-sized PRs), to improve the effectiveness of code review. If you have any thoughts or feedback about this approach, or if you have already faced the challenges of large PRs and found solutions for them, I would be interested in learning more. You can always contact me on Twitter @jayeshkawli.
