How to build a data architecture to drive innovation—today and tomorrow

 

Here is an excerpt from an article written by Antonio Castro, Jorge Machado, Matthias Roggendorf, and Henning Soller for the McKinsey Quarterly, published by McKinsey & Company. To read the complete article, check out others, learn more about the firm, and sign up for email alerts, please click here.

* * *

Yesterday’s data architecture can’t meet today’s need for speed, flexibility, and innovation. The key to a successful upgrade—and significant potential rewards—is agility.

Over the past several years, organizations have had to move quickly to deploy new data technologies alongside legacy infrastructure to deliver market-driven innovations such as personalized offers, real-time alerts, and predictive maintenance. However, these technical additions—from data lakes to customer analytics platforms to stream processing—have increased the complexity of data architectures enormously, often significantly hampering an organization’s ongoing ability to deliver new capabilities, maintain existing infrastructures, and ensure the integrity of artificial intelligence (AI) models.
Current market dynamics don’t allow for such slowdowns. Leaders such as Amazon and Google have been making use of technological innovations in AI to upend traditional business models, requiring laggards to reimagine aspects of their own business to keep up. Cloud providers have launched cutting-edge offerings, such as serverless data platforms that can be deployed instantly, enabling adopters to enjoy a faster time to market and greater agility. Analytics users are demanding more seamless tools, such as automated model-deployment platforms, so they can more quickly make use of new models.
Many organizations have adopted application programming interfaces (APIs) to expose data from disparate systems to their data lakes and to integrate insights rapidly and directly into front-end applications. Now, as companies navigate the unprecedented humanitarian crisis caused by the COVID-19 pandemic and prepare for the next normal, the need for flexibility and speed has only amplified. For companies to build a competitive edge—or even to maintain parity—they will need a new approach to defining, implementing, and integrating their data stacks, leveraging both cloud (beyond infrastructure as a service) and new concepts and components.

Six shifts to create a game-changing data architecture

We have observed six foundational shifts companies are making to their data-architecture blueprints that enable more rapid delivery of new capabilities and vastly simplify existing architectural approaches. They touch nearly all data activities, including acquisition, processing, storage, analysis, and exposure. Although organizations can implement some of these shifts while leaving their core technology stack intact, many require careful re-architecting of the existing data platform and infrastructure, spanning both legacy technologies and newer technologies previously bolted on.

Such efforts are not insignificant. Investments often range from tens of millions of dollars to build capabilities for basic use cases, such as automated reporting, to hundreds of millions of dollars to put in place the architectural components for bleeding-edge capabilities, such as the real-time services needed to compete with the most innovative disruptors. It is therefore critical for organizations to have a clear strategic plan: data and technology leaders will need to make bold choices to prioritize the shifts that will most directly support business goals and to invest in the right level of architectural sophistication. As a result, data-architecture blueprints often look very different from one company to another.

When done right, the return on investment can be significant (more than $500 million annually in the case of one US bank, and 12 to 15 percent profit-margin growth in the case of one oil and gas company). We find these types of benefits can come from any number of areas: IT cost savings, productivity improvements, reduced regulatory and operational risk, and the delivery of wholly new capabilities, services, and even entire businesses.

So what key changes do organizations need to consider?

1. From on-premise to cloud-based data platforms

Cloud is probably the most disruptive driver of a radically new data-architecture approach, as it offers companies a way to rapidly scale AI tools and capabilities for competitive advantage. Major global cloud providers such as Amazon (with Amazon Web Services), Google (with the Google Cloud Platform), and Microsoft (with Microsoft Azure) have revolutionized the way organizations of all sizes source, deploy, and run data infrastructure, platforms, and applications at scale.

One utility-services company, for example, combined a cloud-based data platform with container technology, which holds microservices such as one that searches billing data or one that adds new properties to an account, to modularize application capabilities. This enabled the company to deploy new self-service capabilities to approximately 100,000 business customers in days rather than months, deliver large amounts of real-time inventory and transaction data to end users for analytics, and reduce costs by “buffering” transactions in the cloud rather than on more expensive on-premise legacy systems.
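As a hedged illustration of this pattern, here is a minimal sketch of what one such containerized microservice might look like. It assumes a Python service built with FastAPI; the service name, endpoint, field names, and sample records are hypothetical, not details from the article.

```python
# Minimal sketch of a single-purpose microservice of the kind described
# above: a billing-data search endpoint that could be packaged into a
# container image and deployed independently of other services.
# FastAPI and all names below are illustrative assumptions.
from fastapi import FastAPI, HTTPException

app = FastAPI(title="billing-search")  # one microservice, one capability

# Hypothetical in-memory stand-in for the cloud data platform the service
# would query in production.
BILLING_RECORDS = [
    {"account_id": "A-1001", "month": "2020-05", "amount_due": 482.10},
    {"account_id": "A-1002", "month": "2020-05", "amount_due": 129.95},
]

@app.get("/billing/{account_id}")
def search_billing(account_id: str):
    """Return all billing records for one customer account."""
    matches = [r for r in BILLING_RECORDS if r["account_id"] == account_id]
    if not matches:
        raise HTTPException(status_code=404, detail="account not found")
    return {"account_id": account_id, "records": matches}
```

Because each capability lives behind its own small service like this one, a new self-service feature can ship by deploying one more container rather than by modifying a monolith.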

Enabling concepts and components

  • Serverless data platforms, such as Amazon S3 and Google BigQuery, allow organizations to build and operate data-centric applications at near-infinite scale without the hassle of installing and configuring solutions or managing workloads. Such offerings lower the expertise required, cut deployment time from several weeks to as little as a few minutes, and carry virtually no operational overhead. (A brief sketch of the serverless model follows this list.)
  • Containerized data solutions using Kubernetes (which are available via cloud providers as well as open source and can be integrated and deployed quickly) enable companies to decouple and automate deployment of additional compute power and data-storage systems. This capability is particularly valuable in ensuring that data platforms with more complicated setups, such as those required to retain data from one application session to another and those with intricate backup and recovery requirements, can scale to meet demand.
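To make the serverless idea tangible, here is a minimal sketch using Google BigQuery’s Python client, one of the platforms named above. The project, dataset, and table names are hypothetical; the point is that the code submits SQL and the platform provisions all compute behind the scenes.

```python
# Minimal sketch of the serverless model: no cluster to install, size, or
# patch. Assumes the google-cloud-bigquery package and ambient credentials;
# the table below is hypothetical.
from google.cloud import bigquery

client = bigquery.Client()  # credentials resolved from the environment

query = """
    SELECT customer_id, SUM(amount) AS total_spend
    FROM `my_project.sales.transactions`  -- hypothetical table
    GROUP BY customer_id
    ORDER BY total_spend DESC
    LIMIT 10
"""

# The query runs on fully managed infrastructure; there is no server,
# cluster, or capacity setting to configure anywhere in this script.
for row in client.query(query).result():
    print(row.customer_id, row.total_spend)
```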

2. From batch to real-time data processing

The costs of real-time data messaging and streaming capabilities have decreased significantly, paving the way for mainstream use. These technologies enable a host of new business applications: transportation companies, for instance, can inform customers as their taxi approaches with accurate-to-the-second arrival predictions; insurance companies can analyze real-time behavioral data from smart devices to individualize rates; and manufacturers can predict infrastructure issues based on real-time sensor data.

Real-time streaming functions, such as a subscription mechanism, allow data consumers, including data marts and data-driven employees, to subscribe to “topics” so they can obtain a constant feed of the transactions they need. A common data lake typically serves as the “brain” for such services, retaining all granular transactions.
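A minimal sketch of such a subscription mechanism, assuming Apache Kafka (discussed in the list below) and the kafka-python client; the broker address, topic, and consumer group are hypothetical:

```python
# Minimal sketch of a data consumer subscribing to a "topic" to receive a
# constant feed of transactions. Assumes the kafka-python package; broker,
# topic, and group names are hypothetical.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "payments.transactions",            # hypothetical topic
    bootstrap_servers="broker:9092",    # hypothetical broker address
    group_id="fraud-scoring-service",   # consumers in a group share the feed
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",       # replay retained history on first start
)

# Each message arrives as it is published, with no batch window to wait for.
for message in consumer:
    txn = message.value
    print(f"received transaction {txn.get('id')} for {txn.get('amount')}")
```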

Enabling concepts and components

  • Messaging platforms such as Apache Kafka provide fully scalable, durable, and fault-tolerant publish/subscribe services that can process and store millions of messages every second for immediate or later consumption. They support real-time use cases, bypass existing batch-based solutions, and carry a much lighter footprint (and cost base) than traditional enterprise messaging queues.
  • Stream-processing and analytics solutions, such as Apache Kafka Streams, Apache Flume, Apache Storm, and Apache Spark Streaming, allow for direct analysis of messages in real time. The analysis can be rule based or involve advanced analytics to extract events or signals from the data, and it often integrates historic data to compare patterns, which is especially vital in recommendation and prediction engines. (A stream-processing sketch follows this list.)
  • Alerting platforms such as Graphite or Splunk can trigger business actions for users, such as notifying sales representatives when they are not meeting their daily sales targets, or can integrate these actions into existing processes that may run in enterprise resource planning (ERP) or customer relationship management (CRM) systems.
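As a hedged sketch of the stream-processing pattern, the snippet below uses Apache Spark’s Structured Streaming API (one of the engines named above) to apply a simple rule to a live Kafka feed. The broker, topic, threshold, and required Kafka connector package are assumptions, not details from the article.

```python
# Minimal sketch of rule-based analysis on a live message stream with
# PySpark Structured Streaming. Requires the spark-sql-kafka connector on
# the Spark classpath; broker, topic, and the 10,000 threshold are
# illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

spark = SparkSession.builder.appName("txn-alerts").getOrCreate()

schema = StructType([
    StructField("id", StringType()),
    StructField("amount", DoubleType()),
])

# Read the live feed from Kafka as an unbounded table.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical
       .option("subscribe", "payments.transactions")      # hypothetical
       .load())

# Parse each JSON payload and keep only high-value transactions.
alerts = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), schema).alias("txn"))
          .where(col("txn.amount") > 10_000))

# Print flagged transactions as they arrive; a production job would write
# to an alerting topic or downstream store instead of the console.
alerts.writeStream.format("console").start().awaitTermination()
```

A rule this simple could equally be enriched with historic data, for instance comparing each transaction with a customer’s past pattern, to feed the recommendation and prediction engines mentioned above.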

* * *

As data, analytics, and AI become more embedded in the day-to-day operations at most organizations, it’s clear that a radically different approach to data architecture is necessary to create and grow the data-centric enterprise. Those data and technology leaders who embrace this new approach will better position their companies to be agile, resilient, and competitive for whatever lies ahead.

* * *

Here is a direct link to the complete article.

Antonio Castro and Jorge Machado are partners in McKinsey’s New York office, Matthias Roggendorf is a partner in the Berlin office, and Henning Soller is a partner in the Frankfurt office.

The authors wish to thank Josh Gottlieb, Sameer Kohli, Aziz Shaikh, and Nikhil Srinidhi for their contributions to this article.

o     o     o

Dear Reader:

I need your help.

A number of people have advised me to be much more proactive in “marketing” this blog.

Thus, if you are so inclined, please ask one colleague or friend to sign on by clicking here.

Thank you.

Bob Morris