The Unplanned Downtime Nightmare — And How Operators Can Avoid It

Forbes Technology Council

President and CEO at VoltDB Inc.

The recent six-hour-long outage of the Facebook family of apps cost the company nearly $100 million in revenue. Worse, it drove millions of social media users to Twitter as people couldn’t view their Facebook feeds, exchange WhatsApp messages or post Instagram reels. And it sparked a derisive meme fest that didn’t do the brand much good.

Uptime Institute’s 2021 Global Data Center Survey (via Facility Executive) reveals that outages, while less pervasive than in previous years, have become considerably more expensive. Over 60% of the respondents reported losing more than $100,000 to downtime. Of that 60%, 15% lost over $1 million.

Per a 2016 Ponemon study, “Downtime costs for the most data center-dependent businesses are rising faster than average.” And of course, it’s the most highly data-driven businesses (such as financial services, industrial IoT, healthcare and communications service providers, or CSPs) that are the most likely to take the biggest financial hits when the lights go out.

CSPs are becoming more and more software-driven, which means outages will increasingly be something they have to expect. Outages affect not only subscribers but also latency-sensitive services such as emergency services and those provided by IoT-driven industries. With the fourth industrial revolution in progress, the costs and losses associated with these outages are increasing rapidly, and the resulting loss of brand value is hurting CSPs’ ability to retain existing customers and acquire new ones.

The Downtime Ripple Effect

Data centers are a critical part of the telecom infrastructure, delivering a multitude of services to their own operations teams as well as subscribers and users. Modern telco data centers support not just connectivity but also applications that need a dynamic network that scales seamlessly to meet stringent availability, scalability, security and performance SLAs.  

Downtime is unaffordable because of the costs and the real-world problems it causes. Be it email, chat, billing, charging or even customer support, unplanned downtime brings business to a grinding halt. Revenue is lost due to inaction or delay. Fraud becomes possible because critical data is unavailable in the moments that matter. There can be staggering data losses and an enormous amount of time, effort and resources spent in data recovery. Let's not forget the legal and regulatory ramifications of mission-critical services being unavailable for customers, employees and other stakeholders. 

Telcos that have already begun transforming their data systems to meet 5G requirements (high data throughput, ultra-low latency and massive machine-type communication) are on the right path to avoiding the worst outcomes of unplanned downtime, but they will need something special to truly prevent downtime in the age of 5G.

Distributed Resilience To The Rescue 

The best way to avert (or at least minimize) downtime is to safeguard the resilience of your data centers and IT infrastructure. 

Historically, enterprises achieved resilience via active-passive models, where one data center kept a real-time master copy of the data while the other data center had a time-lagged imperfect copy. However, this doesn’t work well in the 5G era, because applications need to act on data the moment it’s created, before it loses its value. 

Today’s telcos need the data (and applications) to exist in distinctly separate and geographically distant data centers while maintaining the ultra-low latency required to respond within single-digit milliseconds to customer requests in order to meet SLAs. 

Enter active-active cross data center replication (XDCR).

The Promise (And Perils) of XDCR 

XDCR allows data to be replicated across clusters that are potentially located in different data centers so that when you update your local database, the changes automatically propagate to all the other copies. There isn’t one single master copy of the data but multiple live copies. This allows for high resiliency, high availability and no single geographic point of failure. 
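As a rough illustration of the idea, the sketch below models two clusters that each accept writes and push them to every other copy. The `Cluster` class, the `link` helper and the site names are hypothetical, and propagation is done synchronously in-process for simplicity; a real XDCR deployment replicates asynchronously over a network.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """One live copy of the data (hypothetical model, not a product API)."""
    name: str
    data: dict = field(default_factory=dict)
    peers: list = field(default_factory=list)

    def write(self, key, value):
        # Apply the change locally, then push it to every other copy.
        self.data[key] = value
        for peer in self.peers:
            peer.apply_remote(key, value)

    def apply_remote(self, key, value):
        # A replicated change is applied without being re-broadcast.
        self.data[key] = value


def link(*clusters):
    # Every cluster is a peer of every other cluster: no single master copy.
    for c in clusters:
        c.peers = [p for p in clusters if p is not c]


la = Cluster("los-angeles")
sf = Cluster("san-francisco")
link(la, sf)

la.write("plan:alice", "unlimited")  # written in one data center
sf.write("plan:bob", "prepaid")      # written in the other

print(la.data == sf.data)
```

Both copies end up holding both writes because the two writes touch different keys. What happens when they touch the same key is the harder question.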

However, XDCR involves changes going in multiple directions at once. That makes it nearly impossible to avoid conflicts when data actions take place in two geographically distant data centers within seconds of each other. This is especially true within a telco network, where two people will sometimes change a calling plan’s data at the same time.

Most modern data platforms use built-in functionality to address data conflicts but often fail to manage the fallout of this conflict resolution. This happens because they typically rely on timestamp-based reconciliation or on conflict-free replicated data types to merge numerical changes. If your billing and charging processes rely on active-active data centers in Los Angeles and San Francisco making independent decisions to meet SLAs, it is absolutely possible for two different users to spend the same last dollar twice, leading to a negative balance and angry customers.
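The double-spend scenario can be reproduced in a few lines. The sketch below is a simplified, hypothetical model: each site approves a spend using only what it has seen locally, and the two sites later merge their per-site debit totals in the style of a counter CRDT. The `Site` class and its methods are assumptions made for illustration, not any platform's actual API.

```python
class Site:
    """A data center holding a local view of one prepaid balance."""

    def __init__(self, name, opening_balance):
        self.name = name
        self.opening_balance = opening_balance
        self.debits = {}  # site name -> total debited at that site

    def balance(self):
        return self.opening_balance - sum(self.debits.values())

    def try_spend(self, amount):
        # The decision is made locally, using only what this site has seen.
        if self.balance() < amount:
            return False
        self.debits[self.name] = self.debits.get(self.name, 0) + amount
        return True

    def merge(self, other):
        # Counter-style merge: keep the larger total recorded for each site.
        for site, total in other.debits.items():
            self.debits[site] = max(self.debits.get(site, 0), total)


la = Site("LA", opening_balance=1)
sf = Site("SF", opening_balance=1)

# Two users spend the last dollar before replication catches up.
la_ok = la.try_spend(1)
sf_ok = sf.try_spend(1)

# Replication then merges both histories at both sites.
la.merge(sf)
sf.merge(la)

print(la_ok, sf_ok, la.balance(), sf.balance())
```

The merge itself is conflict-free and both sites converge on the same state, yet that state is a balance of minus one dollar. The data is consistent; the business outcome is not.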

The Real Solution for Unplanned Downtime: Triple-Active XDCR

To make the most of 5G, telco operators and other types of enterprises need active-active-active XDCR combined with application-level conflict resolution. 
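What application-level conflict resolution might look like depends entirely on the business. As one hedged example, the sketch below replays the debits reported by every site in timestamp order and flags any debit the balance cannot cover so that the application can reverse it or bill it later. The `Debit` record, the `resolve` function and the tie-breaking rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Debit:
    site: str         # data center that approved the spend
    txn_id: str       # hypothetical transaction identifier
    amount: int       # in cents
    timestamp: float  # approval time reported by that site


def resolve(opening_balance, debits):
    """Replay debits in a deterministic order and flag the uncovered ones."""
    remaining = opening_balance
    accepted, to_compensate = [], []
    # Ties on timestamp are broken by site and transaction id so that
    # every data center reaches the same verdict independently.
    for d in sorted(debits, key=lambda d: (d.timestamp, d.site, d.txn_id)):
        if d.amount <= remaining:
            remaining -= d.amount
            accepted.append(d)
        else:
            # Assumed business rule: hand off for reversal or follow-up billing.
            to_compensate.append(d)
    return remaining, accepted, to_compensate


debits = [
    Debit("SF", "t2", 100, 10.002),
    Debit("LA", "t1", 100, 10.001),
]
remaining, accepted, to_compensate = resolve(100, debits)

print(remaining, [d.txn_id for d in accepted], [d.txn_id for d in to_compensate])
```

The point of the design is that the conflict is not silently merged away. The application sees which transaction lost and decides what to do about it.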

Triple-active XDCR essentially means having at least two backup data centers. If the first backup goes down while the original data center is undergoing planned downtime (for regularly scheduled maintenance, for example), everything keeps going and customers don’t experience a loss of service. This also allows enterprises to fix the consequences of inevitable data conflicts as they appear while still meeting modern 5G applications’ ultra-reliable low-latency requirements.
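The availability argument can be made concrete with a small routing sketch. It compares a two-site and a three-site deployment under the same pair of events: planned maintenance at one site and an unplanned failure at another. The site names, the `State` enum and the `pick_site` function are hypothetical.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    MAINTENANCE = "planned maintenance"
    FAILED = "unplanned outage"


def pick_site(sites, preferred_order):
    """Return the first site that can still serve traffic, else None."""
    for name in preferred_order:
        if sites[name] is State.ACTIVE:
            return name
    return None


# Two sites: maintenance at one plus a failure at the other stops service.
two_sites = {"LA": State.MAINTENANCE, "SF": State.FAILED}

# Three sites: the same two events, but a third live copy keeps serving.
three_sites = {
    "LA": State.MAINTENANCE,
    "SF": State.FAILED,
    "Seattle": State.ACTIVE,
}

served_by_two = pick_site(two_sites, ["LA", "SF"])
served_by_three = pick_site(three_sites, ["LA", "SF", "Seattle"])

print(served_by_two, served_by_three)
```

With two sites the request has nowhere to go. With three, traffic lands on the remaining live copy.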

The catch here, though, is that setting up triple-active XDCR with legacy technology is very expensive. However, modern cloud-native data platforms have brought it within reach. With a simplified tech stack that uses fewer layers to process data, companies can fully capitalize on the power of triple-active XDCR to avert the nightmare of unplanned downtime and make 5G their friend instead of their foe.

