B2B SaaS products should have frequent periodic downtime

Modern SaaS products aim to have as little downtime as possible. For B2C products, with users in many timezones who may use the product at any time, this makes sense. For B2B products, where users are often knowledge workers with regular business-hour workflows[1], I think many engineers and end users would be happier if the industry collectively normalized regularly turning off applications (for example, every Sunday morning), even when there is no specific maintenance planned.

Downtime is useful! There are many classes of problems that downtime makes dramatically easier: data migrations, schema changes that break backwards compatibility, expensive batch computations, etc. Most of these problems are solvable in some way while maintaining uptime, but at the cost of significant complexity. For example, consider backwards-incompatible changes. With downtime, you can update the schema and all dependent services in a single coordinated operation. Without downtime, you must support old and new versions in parallel, deploy changes through every dependent service, monitor usage to confirm nothing relies on the old path, and eventually clean it all up, often weeks or months later.
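As a concrete (and deliberately simplified) sketch of the downtime path, here's what a backwards-incompatible column rename can look like when the application is stopped. The table and column names are hypothetical, and I'm using SQLite only to keep the example self-contained; the point is that everything happens in one coordinated step.

```python
import sqlite3

# Hypothetical example: rename `accounts.company` to `accounts.organization_name`.
# With a downtime window this is one short script run while the application is
# stopped: no dual writes, no supporting old and new column names in parallel.
def migrate(db_path: str) -> None:
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "ALTER TABLE accounts RENAME COLUMN company TO organization_name"
        )
        # Dependent data cleanups can run in the same window, before anything restarts.
        conn.execute(
            "UPDATE accounts SET organization_name = TRIM(organization_name)"
        )
        conn.commit()
    finally:
        conn.close()
    # Once this finishes, deploy the application version that expects the new
    # column name and bring the system back up.


if __name__ == "__main__":
    migrate("app.db")
```

The zero-downtime version of this same change is a multi-week project: add the new column, dual-write, backfill, migrate readers, then drop the old column.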

Most engineering orgs adopt a pattern of "scheduled downtime" to reap these benefits. This differs from what I'm proposing in one key way: it happens sporadically, whenever a specific change demands it, rather than on a predictable cadence.

This unpredictability makes sporadic downtime a bad experience for end users. In theory, it's communicated ahead of time. In practice, that communication takes the form of something like a loud banner that competes with the core UI for attention and space. Users hate to read, even when it comes to useful and exciting things like onboarding flows, so there's a very low chance they actually internalize the message, no matter how prominent you make the banner. Then they're surprised when the downtime hits. With frequent periodic downtime, they can learn early on what the product's "opening hours" are, and use that to intuitively plan their work. Despite more absolute hours of downtime, the amount of downtime that actually disrupts user work may be reduced.
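Those "opening hours" can even be made machine-readable, so API clients and background jobs know exactly when the product reopens. Here's a minimal sketch, assuming a Sunday-morning UTC window; the hours and function names are mine, not any particular product's, and whether you enforce this at the load balancer or in application middleware is up to you.

```python
from datetime import datetime, timezone

# Illustrative numbers: "closed" every Sunday from 06:00 to 10:00 UTC.
WINDOW_DAY = 6          # Sunday (datetime.weekday(): Monday == 0)
WINDOW_START_HOUR = 6
WINDOW_END_HOUR = 10


def in_maintenance_window(now: datetime | None = None) -> bool:
    """True if the product is currently inside its weekly downtime window."""
    now = now or datetime.now(timezone.utc)
    return now.weekday() == WINDOW_DAY and WINDOW_START_HOUR <= now.hour < WINDOW_END_HOUR


def seconds_until_reopen(now: datetime | None = None) -> int:
    """How long until the window ends, e.g. for a Retry-After header on a 503."""
    now = now or datetime.now(timezone.utc)
    if not in_maintenance_window(now):
        return 0
    reopen = now.replace(hour=WINDOW_END_HOUR, minute=0, second=0, microsecond=0)
    return int((reopen - now).total_seconds())
```

With something like this in front of the product, a 503 plus a Retry-After header tells both humans and clients exactly when to come back, and the window never moves around on them.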

For engineers, it's predictable since they're the ones doing the scheduling, but its infrequency comes with costs as well. One such cost is that the required operations (showing the banner, taking the system down, restarting the system) are all unpracticed, which makes them painful. There's also organizational overhead in arguing over whether some change is really worth scheduling downtime for, how much downtime is acceptable, and so on; these arguments alone consume a lot of eng hours. And there's a hidden tax: infrequent downtime discourages invasive but valuable changes, like simplifying data models or deleting legacy paths.

If this is as obvious as I'm making it sound, why isn't this already the norm? Beyond just the fact that many companies genuinely do need high uptime, I think another major reason is where industry norms come from. Many large, influential tech companies make most of their revenue from things like ads, marketplaces, and infra, where even brief downtime has a direct and measurable impact on revenue. Ideas developed under these constraints get exported all over the industry as "best practices", even when the underlying assumptions don't hold.

If you work on software, do you think this would work at your company? I'd love to hear your thoughts: you can email me at firstname dot lastname @gmail.com.


  1. I'm oversimplifying to be provocative here. There are many shapes of B2B products where this still doesn't make sense, e.g. cloud service providers, observability platforms, or products that serve industries like healthcare, where people work around the clock. ↩︎