Tuesday, November 10, 2015

On Half-Baked Changes And the Lava Anti-Pattern

Complex Dependency Diagram (TheDailyWTF)

There’s something about abstractions that I feel most developers have yet to see enough of to internalize. It’s been called many names; it’s been misunderstood as one thing or another. It’s a quite general problem, but I’m going to choose the label ‘lava layer’ because a number of good blog posts (Really, go find them. They’re great reads.) on the topic have chosen to address that very specific but common failing.

The ‘lava layer’ moniker stems from the fact that a hot layer of lava will cover everything that it touches, providing a surface with the same features as the underlying rock but with less resolution. The same thing happens when you start a massive ‘overhaul’ of a production system. Let’s say that the original design had some horrible wart that every developer who touches it needs to change. Your IO is hidden away in corners of code such that it’s impossible to reason about whether simple helper functions interrupt file writes. You’re using a global shared variable to hold errors. Your application has a singleton that has accumulated semantics about everything from the XML output to the connection pooling on some resource. It’s ugly. It was temporary 5 years ago.

The reality is that it’s not anymore. It’s how the core of your business value is delivered. Sadly, when you finally convince the powers that be to allow you to refactor it into your beautiful design pattern of choice, you realize something. It’s going to take you weeks to fix. During these weeks, almost every new contribution is going to touch parts of the code that you’re refactoring to not touch the ‘bad place’ in the code. You have a problem with your ideal workflow: if you refactor in your own little cutesy branch, you will be forced to commit seppuku when you realize that you’ve refactored code that has changed dramatically and you need to spend more time refactoring that code. Like the runner in Greek philosophy who always traverses half of the distance to his destination each quantum of time and never gets there, you will always disappoint your superiors because the refactor will always be slowed by ‘actual work’(TM).

The alternative is to make small changes and merge them into the main branch. You’ll have inconsistent use of the feature you’re fixing, but you can break people’s kneecaps if they introduce the old way of doing things into the code you’ve already touched. This has the difficulty that you must watch every merge, and double-check work that you’ve painfully refactored. If the semantics of a place that your change touches have changed, it is unquestionably your job to fix that. Can you keep the meanings of control flow of an entire 100,000+ commit project in your head? If so, stop reading this blog post and write me a better software ecosystem. If you’re a mere mortal, then you realize that the only way to accomplish a poor attempt at vigilance is to enter crunch time. It will suck until you’re done, but that’s your fate. So you keep working, you get to the home stretch, and then suddenly you need to deliver business value beyond a refactor. The bugs beseech you.

This is the lava anti-pattern. You have a codebase about 95% converted from one convention to another. In the worst case, your new refactor is a layer on top of the old paradigm. Whenever someone tries to write new code, they feel like they should choose between a ‘high-level’ API or a ‘low-level’ one. They’ll spend as much time on the ‘high-level’ one as it takes to get a new email notification. They’ll use the ‘low-level’ one, and now you’ll need to break those kneecaps. In the best case, it’s a change so irreconcilable with the old system that any time someone has to choose which system to use, they ask someone out of fear and confusion. Your project becomes scary and confusing, but consistent.

That’s what this all comes down to: consistency. In deciding to merge half-fixed changes, you’ve prioritized coordination with your peers over consistency of design. Your internal API eventually becomes this layer of abstractions warped around the fixtures of earlier abandoned designs. You have setters without getters. You have function variants with the suffixes _safe, _checked, _internal, or _full, and not an idea why each exists. Nothing, and I mean nothing, is really orthogonal. Your first symptom of such a codebase is that making any small change requires copying most of the body of an existing function into a new function with a different signature and making the old one wrap the new one. You’re afraid to admit that you don’t even know why the old one was around! You’ll name this one _unwrapped, because the buck stops with nobody, since nobody is an expert there. You’ll progress, you’ll be productive, you’ll close bugs, but at 1 AM you’ll wake up and the first words on your lips will be “Maybe I used that helper function wrong.” or “I know what that line noise meant now! That data structure only works in a certain context!” If you’ve ever thought this in the shower, or in bed, or at a baseball game, you know the disease I’m talking about. It destroys lives.

This just isn’t living.

There’s no solution, you’re sure. Complexity hell is what it takes to make it. What’s this joker going to suggest? You’re not going to immediately like my answer. It’s not the easiest, the most sane, the one you see in SE 101, but it’s the only one that works. You need to change your project’s language. I don’t mean jumping ship from Python to Haskell and making a whole rewrite. I don’t mean deciding that C++ is a natural expansion from your C. I mean investing in tooling.

Why tooling? This is a human failing, right? The problem with the lava layer is the same as the problem with manual memory management. Human beings have about as much mental ‘scratch work’ space (called working memory by experts in that sort of thing) as it takes to figure out the scope of the variable they’re looking at. No matter the decades of information you’ve accumulated, your bottleneck is the number of things you can hold in your mind at once. This is the von Neumann bottleneck, but with working memory taking the place of RAM. Do you feel demotivated? You shouldn’t. It’s only a problem if your answer is to work harder, not smarter.

The problem with the lava layer is that you’re making the choice on someone else’s behalf to work harder, not smarter.

When someone who has never touched a line of your project’s code before opens an editor and points it at a mundane line (e.g., fixing a bug related to choosing not to handle some easy-to-specify edge case), they should know immediately how to handle the data types around them. What’s that, you’re saying? They would, if only they knew “well, that error type needs an init function called, and they need to register that file handle with the resource collector and there’s a set of safety functions they must call to ensure…”.


I’m sorry. You’re wrong. You’ve just changed the semantics of the programming language that you’re expecting your contributors to use. That’s fine, but what’s not fine is that you haven’t realized this. Your tooling hasn’t realized this. Your engineers haven’t realized this. The problem of the lava layer is the problem of unintended programming language dialects. Learning to program in a dialect means doing specific things that are more exacting than the requirements of the base language. Humans who try to communicate via different dialects quickly realize that they don’t understand each other, but contributors won’t understand this about your application unless you were lucky enough to write tests for their not-yet-written code. If you are going to change the meaning of your programming language, if something is unequivocally wrong, you need to provide a way to check this.

And this is where static analysis comes in. Parsing is cheap. Compilers themselves are scary, but that’s not what it takes to check sanity. Annotation a la academic purity is expensive, but that’s not what it takes. Tools to prove that your code never makes a mistake are expensive, especially when you want customizations for your project’s dialect, but that’s not what it takes. What it takes is you spending an afternoon finding a parser for your language, learning how to walk a tree-based data structure, and learning to codify your patterns in a way that can be checked.

As an example, let’s say that you’re going to switch from using a shared, global, singular ‘error’ variable to threading pointers to empty ‘error’ structs so that functions can check when their callees have failed. Your new convention has a few rules. No error can be used before it’s been zeroed out somewhere. You can’t free an error that hasn’t been set. The classic C-style memory management. You yawn. This is the pain of programming, you say. Get out of the kitchen if you can’t handle the fire, you say.
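
To give a feel for how small the ‘find a parser, walk a tree’ step can be, here is a minimal sketch. It assumes, purely for illustration, that the codebase is Python (so the standard `ast` module is the parser), that the old convention is a global called `last_error`, and that the new convention is a parameter called `err`. Both names are made up; substitute whatever your project actually uses. All it does is tag each function with the convention(s) it touches.

```python
import ast

OLD_GLOBAL = "last_error"  # hypothetical name of the old shared error variable
NEW_PARAM = "err"          # hypothetical name of the new threaded error parameter


def tag_functions(source: str) -> dict:
    """Map each function name to the set of error conventions it uses."""
    tags = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        used = set()
        # New convention: the function accepts an error parameter.
        # (Only ordinary positional parameters are inspected in this sketch.)
        if any(arg.arg == NEW_PARAM for arg in node.args.args):
            used.add("new")
        # Old convention: the function body mentions the shared global.
        for child in ast.walk(node):
            if isinstance(child, ast.Name) and child.id == OLD_GLOBAL:
                used.add("old")
            elif isinstance(child, ast.Global) and OLD_GLOBAL in child.names:
                used.add("old")
        tags[node.name] = used
    return tags


SAMPLE = '''
last_error = None

def legacy_read(path):
    global last_error
    last_error = "not found"

def modern_read(path, err):
    err.code = 2

def half_converted(path, err):
    global last_error
    last_error = "oops"
    err.code = 1
'''

if __name__ == "__main__":
    for name, used in tag_functions(SAMPLE).items():
        print(name, sorted(used))
```

Any function tagged with both conventions is a spot where the lava has only half cooled. The sketch knows nothing about aliasing, methods, or keyword-only parameters; it is a starting point, not a proof.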

This works for a small project, but not when your project exceeds the understanding of one person. Once half of your monolithic architecture behaves one way, and the other half is still behaving another way, which goddamn way should the open source contributor who wants to share his singular commit to fix formatting errors use? You’ve got 15 seconds for him to find out before he uses the one that’s the easiest. “Oh this takes a pointer to an error. I don’t know how to handle errors here. Best put a null.”

Who needs to prevent this change from being merged? You. Are you Batman? Do you want to constantly watch over every street, every commit, every errant contributor? No.

Let’s say you took the static analysis route. You spend the first 60 hours (to be generous; I’ve seen static analysis tools take an afternoon to write) writing a little checker that walks over all of your functions, tags each as either setting your old error or your new error, makes sure that you never clean up before calling a function with an init call in it, and makes sure that you never discard an error. Now when a contributor/junior dev/intern/homo erectus makes a PR, your mergebots or merge viceroys run your tool. They see an inconsistency, and can tell the contributor that if they make a small change here or there, you’d be glad to merge his flawless, proven code.

You see what you just did there? You just took the task of human pattern matching and made it into a script. You just made your computers or reviewers into experts about your new style change. If you integrate this check into the compilation pipeline, the heresy of mixing conflicting internal styles becomes something your compiler will refuse. It’s no longer a matter of shaming people into remembering to use one style rather than another, it now becomes part of the rules of the game, and you didn’t even need to learn Haskell.
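
As a rough illustration of one such rule, the sketch below catches the “best put a null” move from earlier. It makes the same illustrative assumptions as before: a Python codebase parsed with the standard `ast` module, and a hypothetical error parameter named `err`. It records where that parameter sits in each function definition, then flags any call that passes a literal `None` in that slot.

```python
import ast

NEW_PARAM = "err"  # hypothetical name of the threaded error parameter


def find_null_errors(source: str, filename: str = "<patch>") -> list:
    """Report calls that pass a literal None where an error object belongs."""
    tree = ast.parse(source, filename=filename)

    # Pass 1: note the position of the error parameter in each function.
    err_index = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            names = [arg.arg for arg in node.args.args]
            if NEW_PARAM in names:
                err_index[node.name] = names.index(NEW_PARAM)

    # Pass 2: inspect direct calls (foo(...), not obj.foo(...)) to those functions.
    found = []
    for node in ast.walk(tree):
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
            continue
        index = err_index.get(node.func.id)
        if index is None:
            continue
        passed = node.args[index] if index < len(node.args) else None
        for keyword in node.keywords:
            if keyword.arg == NEW_PARAM:
                passed = keyword.value
        if isinstance(passed, ast.Constant) and passed.value is None:
            message = f"{filename}:{node.lineno}: {node.func.id}() is passed None for '{NEW_PARAM}'"
            found.append((node.lineno, message))
    return [message for _, message in sorted(found)]


SAMPLE = '''
def read_config(path, err):
    err.code = 0

def main(err):
    read_config("a.ini", err)
    read_config("b.ini", None)
    read_config("c.ini", err=None)
'''

if __name__ == "__main__":
    for problem in find_null_errors(SAMPLE):
        print(problem)
```

A merge bot would run something like this over the files a PR touches and refuse the merge whenever the returned list is non-empty. The messages carry a file name and line number, so the contributor gets told exactly which ‘small change here or there’ is needed.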

So what am I trying to say? That people shouldn’t write code? That all companies should invest in internal domain specific languages? That refactors are risky? That developers are dumb and need straitjackets? No, none of that. What I’m saying is that the smartest among us make the cardinal mistake of worrying that they’re making the product right rather than that they’re making the right product. The last 50 years in Computer Science have taught us that investing in computer assertions about correctness is a flat, fixed cost while investing in developer assertions about correctness is unending. It’s not free, but it’s relatively cheap to throw together a tool that makes sure that your most recent religious crusade on your codebase is free of heretics. If there’s anything that you must know to contribute to a project, and it’s not part of how the project compiles or is checked on test builds, then you can be goddamn certain that you’ll be paying developers to manually check and fix that pattern until the end of days.

The lava pattern, the practice of allowing half-baked changes to sit around in the codebase and enforcing a lack of regression through guilt, shame and runtime assertions, is eminently fixable. You just need to make it someone’s job to write the program to keep it fixed.
