Monday, April 15, 2024

4. The data management ecosystem

There are only a few components to the modern data management ecosystem, which is displayed in the diagram below. We've plotted these systems on two axes, one showing the latency of the data managed by the system and the other showing the complexity, expressed in terms of the number of sources a particular kind of system organizes. Each of these systems organizes data in a way that's particularly suited to a class of tasks or workflows. Most of these systems don't create data; creation generally happens in the bottom left-hand corner, in the Services/Applications space. Instead, they rearrange and organize data. Application and service databases serve as the suppliers of raw materials for the information supply chain, and they're usually designated in this work as the "source systems." Some systems manufacture data as a byproduct of their basic operation, such as Master Data Management systems that function as curation tools for a small percentage of the data they manage. But for the most part the job of the data management system, when it isn't a source system, is synthesis and organization.

Broadly speaking, data management is semantic management. A system's semantics is its "meaning," or more precisely "the set of things the system is true of." Every organization has a set of things it works with: customers, vendors, products, services, orders and returns, readings from devices, whatever. Each of those kinds of things, and their relationships to the other things the organization cares about, is managed by particular systems. We build applications to collect data about those things and their relationships. Typically we collect data about specific instances of these things - the actual customers or products or devices - across a number of different applications, some elegant and well-designed and most not so much. The data collected by those source systems then gets rearranged, consolidated, refined and, most importantly, used in the other systems in the data management ecosystem. When data management practitioners manage an organization's data, then, they're trying to get the organization's semantics right by organizing records into new kinds of arrangements. They're trying to corral and consolidate the various instances of things and their relationships so the organization can understand how well it's impacting the things it cares about.

I could have titled this book Practical Semantic Management, but outside of a pretty narrow technical niche nobody understands what "semantics" means. And "data" is more appropriate anyway, because each of the applications manufacturing raw material for the data management practitioner is, in practical terms, simply producing data. If we arrive at any eventual conception of an organization's semantics, it's because we build it out of the data - specifically the records - we get from those source systems.

The components we'll define in this section have specific roles to play in the construction of a consistent semantics for an organization. They don't all play the same role, although the roles overlap here and there. But there are two main functions each of these pieces provides.

First, they're intended to solve a specific set of use-cases. Identifying those use-cases is a critical step to understanding why you'd choose to build one system-type over another. For example, people answering calls from customers don't need to see every order the customer has ever placed. Instead, they need to know the status of the customer's currently open orders. Having additional data is often perceived as a nice-to-have, but the primary use-case for customer service people is "all the current relevant data required to answer the customer's question." For that use-case we don't want access to every single transaction the entire organization has ever generated; indeed, that access may be detrimental to our response time. We need only the transactions that are currently relevant to the people implementing the process in question.
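Here's a minimal sketch of that first use-case in Python, using pandas and hypothetical table and column names; a real ODS would sit behind a database, but the shape of the access is the point: touch only the handful of currently relevant rows.

```python
import pandas as pd

# Hypothetical ODS extract: one current-state row per order, not full history.
orders = pd.DataFrame({
    "customer_id": [101, 101, 102, 101],
    "order_id":    [5001, 5002, 5003, 5004],
    "status":      ["shipped", "open", "open", "backordered"],
    "updated_at":  pd.to_datetime(["2024-04-01", "2024-04-10",
                                   "2024-04-11", "2024-04-12"]),
})

def open_orders_for(customer_id: int) -> pd.DataFrame:
    """Return only the currently relevant rows for a CSR screen."""
    current = orders[orders["customer_id"] == customer_id]
    return current[current["status"].isin(["open", "backordered"])]

print(open_orders_for(101))  # two rows, not the customer's entire history
```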

Conversely, if I want to know what products to recommend to a customer who's got money to spend, I don't particularly care what transactions are currently unfulfilled. Perhaps to the extent that I don't want to recommend a product that's backordered, if that's a possibility, but that's a proviso and not the primary concern of a recommender system. Instead we need to compare the customer with money burning a hole in their pocket to similar customers, using some grid of comparison, looking across what all of those other customers bought, and then make appropriate recommendations. Current data is less important than granularity and coverage for this second use-case.
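And a sketch of the second use-case, again with hypothetical data: a naive similarity-based recommender that cares about history and coverage, not currency. The matrix, the cosine rule and the names here are all stand-ins for whatever "grid of comparison" you actually use.

```python
import numpy as np

# Hypothetical purchase matrix: rows are customers, columns are products,
# cells are lifetime purchase counts. Whether an order is open is irrelevant.
purchases = np.array([
    [3, 0, 1, 0],   # customer 0
    [2, 1, 0, 0],   # customer 1
    [0, 0, 4, 2],   # customer 2
])

def recommend(customer: int, top_n: int = 2) -> np.ndarray:
    # Cosine similarity between this customer and every other customer.
    norms = np.linalg.norm(purchases, axis=1) * np.linalg.norm(purchases[customer])
    sims = purchases @ purchases[customer] / np.where(norms == 0, 1, norms)
    sims[customer] = 0.0  # don't compare the customer with themselves
    # Weight every other customer's history by similarity...
    scores = sims @ purchases
    scores[purchases[customer] > 0] = -1  # ...and skip products already owned
    return np.argsort(scores)[::-1][:top_n]

print(recommend(0))  # product indices to recommend to customer 0
```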

In both cases my concept of customer is the same, and ideally derived from the same common base of data. In fact all of my customer data will originate from the source applications in the bottom left-hand corner of the ecosystem diagram. But the relevant attributes in the first use-case, used to ensure our co-workers aren't left literally holding the phone, are very different from those in the second. The two systems might not share any customer attributes except the keys identifying an individual customer. When we sit down to design these two different systems, we draw from a common concept of customer. But the things and relationships we care about differ between the two systems, and so we create a subset of the customer semantics to ensure we've got relevant data to solve our problem. But just as you wouldn't expect hummingbirds to play the role of wolves in an ecosystem, should you be missing the latter and have a surplus of the former, you can't expect the ODS to function as an Analytics system just because you have a big ODS and no Analytics systems.

Second, the development of one piece is critical to the development of others. As we’ll see, in the course of building and maintaining any one of these pieces an organization will, whether it's conscious of it or not, spin out assets that are foundational to the others. There are structural features of an ODS that contribute to the development of Analytics systems, and elements of the Master Data Management system are dependent on the ODS or the Analytics system. An organization can of course build any single piece on a standalone basis, but in the process of doing that construction they’ll find themselves replicating the functions of one or more of the other pieces. This could present us with a chicken-and-egg problem, if we were being silly and reductive. But in practice what this means is that, unlike many situations where you climb the ladder and then must kick it away when you get where you’re going, we get to keep the various ladders we build, generalize their function, and start to fill in the remainder of the ecosystem as we need. Much of what we build to solve a specific set of problems will also serve as scaffolding for the next set of solutions.

Three final points before we get to the pieces. Let’s look at a slightly upgraded diagram.

First, each type of system can, in addition to its meta-use-case, be classified on the axes "Latency" and "Complexity." For some systems, position on the plot is what makes them a solution. In other cases the system's position on the plot is a function of resource-use. Analytics systems are composed of enormous amounts of highly granular data, for example, so it's only possible to load an Analytics structure in real time when the semantics of the structure and its inputs are tightly defined. There are some such systems, and they're extraordinarily useful. But you'll find in the wild that they've got a very limited semantics. The scope of a real-time Analytics system is often restricted by the need to keep it as real-time as possible, so users will know pretty precisely what it is they need to get out of the structure: either counts or similar aggregates of otherwise "dumb" events, or large meaty chunks of text they're using the dimensional structure of the warehouse to search for. Similarly, Master Data Management (MDM) systems are high-latency and high-complexity because they're curatorial in nature, combining lots of data sources and sitting necessarily downstream of the analysis required to prioritize records. While an MDM system may not require any actual human intervention in its insert and update policies, it does require a great deal of care in its load. Data Lakes, on the other hand, are very low-latency and also very low-complexity, being simple (and useful) but un-integrated copies of source tables.
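As a sketch of how narrow that real-time semantics tends to be, assuming hypothetical event records: a tightly-scoped real-time structure often reduces to maintaining a handful of running aggregates, and nothing else.

```python
from collections import Counter

# Hypothetical stream of "dumb" events; a tightly scoped real-time analytics
# structure often amounts to simple aggregates maintained as events arrive.
counts = Counter()

def on_event(event: dict) -> None:
    # The semantics are fixed up front: count events by type, nothing more.
    counts[event["type"]] += 1

for e in [{"type": "click"}, {"type": "view"}, {"type": "click"}]:
    on_event(e)

print(counts.most_common())  # [('click', 2), ('view', 1)]
```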

Second, notice the dashed red line around every system in the ecosystem except the Services/Applications box at the bottom left. This line defines "the Data Warehouse" in the minds of most end-users in the organization. When the corporate hierarchy wants to spend money on a data warehouse they typically don't understand all of the use-cases. The CSRs may demand "analytics" when really they need an ODS, or they may ask for "operational" data when what they need is the ability to recommend a course of action, and thus an Analytics system. It's often the case that executives will demand clean, deduplicated order data in their data warehouse without understanding that they need a mastering process - an MDM system, in other words. All of these use-cases get glopped together as a "data warehouse," and so the nimble practitioner will need to make more precise definitions in their designs. An even finer gradation can be found when people talk about the ODS versus Analytics systems. The latter use a style of data modeling derived from Ralph Kimball's approach, also known as star schema design, and so Analytics systems may also be safely and cleanly called "Kimball-style warehouses." The ODS uses a style of design derived from Bill Inmon, and so can safely be called an "Inmon-style warehouse." In both cases, for both Inmon and Kimball warehouses, some overlap with an MDM system is called for in order to keep your dimensions or entities clean - remember the ladders we get to keep, above? - and so sometimes your warehouse project, Inmon or Kimball-style, will also need some MDM.
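To make the Kimball-style shape concrete, here's a minimal star-schema sketch in pandas, with hypothetical table and column names: a central fact table joined out to a dimension and then aggregated.

```python
import pandas as pd

# Hypothetical star schema: a central fact table keyed to dimension tables.
dim_customer = pd.DataFrame({
    "customer_key":  [1, 2],
    "customer_name": ["Acme", "Globex"],
    "region":        ["East", "West"],
})
fact_order_line = pd.DataFrame({
    "customer_key": [1, 1, 2],
    "product_key":  [10, 11, 10],
    "quantity":     [5, 2, 7],
    "revenue":      [50.0, 30.0, 70.0],
})

# The classic Kimball query shape: join facts to a dimension, then aggregate.
report = (fact_order_line
          .merge(dim_customer, on="customer_key")
          .groupby("region")["revenue"]
          .sum())
print(report)  # East: 80.0, West: 70.0
```

The Inmon-style alternative would carry customer, order and product as separately normalized entity tables and assemble them at query time; the difference is in the model, which is exactly the point of the next paragraph.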

What this means in practice is you can never really tell which use-cases have been solved by an organization's data warehouse until you look at the data model. Never assume a data warehouse is an Inmon- or Kimball-style system, satisfying either the ODS or Analytics use-cases, nor that it's an MDM or Data Vault system. In practice a diligent if untutored data warehouse team will mix all sorts of solutions into their systems as they try to respond to their internal clients. The point is: don't trust the terms, trust the models. A "streaming analytics" platform is rarely more than a collection platform, and thus a source system, and a warehouse could be any number of things. Use the data model to determine which use-cases are solved.

Third, and finally, if you're a Big Data person you may be looking at this ecosystem and noting there are no "Big Data" systems anywhere on it. Where are the streaming analytics platforms or Hadoop clusters? How is this not just an old-and-busted diagram of tired relational concepts that can't handle the vast quantities of data generated by genomic analysis, or the velocity of the modern data scientist? Or even modern data scientists doing modern genomic analysis?

Data management is semantic management. This point is important for data scientists to remember, but also for everyone else. However an organization acquires its data, and whatever the tools it uses to manage and manipulate it, over and above any other considerations of toolset or organizational expectations, the data management practitioner is responsible for creating a consistent semantics out of the data manufactured by the organization. More precisely, we're still just dealing with records, and records come from somewhere, and someone expects them to contribute to a picture of the world. It's almost a certainty that someone needs that data to do their job. If not - if the data is not about something - then there simply wasn't any point in collecting it in the first place. Now, the source of those records might be a device for analyzing methylated DNA, or signals from a bubble-chamber particle detector, or sensors reading millisecond variations of temperature from the inside of a rocket booster. The records might be collected by a "streaming analytics" platform and spun out as log files. But those records will take a determinate form, and the fields that make up those records will need to be reconstituted for re-presentation and analysis, whether the target audience is a machine-learning algorithm, someone sitting at a screen waiting to push a button, or a C-level executive floating a hypothesis. The reorganizations of those records will take certain forms. The patterns we use to organize the data making up those records are not new, even if the volume is always hitherto unimaginable.
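As a sketch of that reconstitution, assume a hypothetical device log where each line carries a sensor id, a timestamp and a reading; the fields have to be pulled back into determinate records before anyone downstream can use them.

```python
from datetime import datetime

# Hypothetical log lines spun out by a collection platform.
raw = [
    "sensor=booster-7 ts=2024-04-15T10:00:00 temp_c=412.6",
    "sensor=booster-7 ts=2024-04-15T10:00:01 temp_c=413.1",
]

def parse(line: str) -> dict:
    """Reconstitute a log line into a determinate, typed record."""
    fields = dict(pair.split("=", 1) for pair in line.split())
    return {
        "sensor": fields["sensor"],
        "ts": datetime.fromisoformat(fields["ts"]),
        "temp_c": float(fields["temp_c"]),
    }

records = [parse(line) for line in raw]
print(records[0])
```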

So we will still need to master the agents on those records, whether the agent is a customer busily buying products or a device that occasionally peeps out a temperature reading. We need to master agents because some of our devices will generate bad agent data, perhaps even by accident. But we still want to know how many signals we're getting from our agents, what their quality is, and what the agents are telling us. That requires a Master Data Management system somewhere in our ecosystem. We have use-cases where we need to know the last reading from each sensor or customer. We could of course just throw everything into one giant pile and search trillions of irrelevant records for the latest, throwing ever more CPU at the colossal stack because the CPU fairies are always willing to help. Or we could use an ODS structure to avoid relying on fairies. The pieces of the ecosystem we cover in this section are patterns of organization that evolved to meet specific kinds of use-cases, and those use-cases don't disappear because volume has increased an order of magnitude or three since they were introduced.
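Here's a minimal sketch of that last-reading pattern, with hypothetical readings: keep one current row per agent, updated in place, rather than searching the giant pile every time someone asks.

```python
from datetime import datetime

# Hypothetical readings; in practice these arrive as a stream or log extract.
readings = [
    {"sensor": "booster-7", "ts": datetime(2024, 4, 15, 10, 0, 0), "temp_c": 412.6},
    {"sensor": "booster-7", "ts": datetime(2024, 4, 15, 10, 0, 1), "temp_c": 413.1},
    {"sensor": "booster-9", "ts": datetime(2024, 4, 15, 10, 0, 1), "temp_c": 19.4},
]

# The ODS pattern: one current row per agent, updated in place, instead of
# scanning the entire history for the latest record on every question.
latest = {}
for r in readings:
    prior = latest.get(r["sensor"])
    if prior is None or r["ts"] > prior["ts"]:
        latest[r["sensor"]] = r

print(latest["booster-7"])  # the current reading, no full scan required
```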

Data volumes have always grown at breakneck speeds. The current context appears daunting, but it's no more than a change in degree from where we were ten or even fifty years ago. Vendors may have exciting new mechanisms for managing and indexing enormous volumes of records, and we should certainly learn them. But there are still only a few ways of organizing those records, and the data that makes them up, that work. At some point those patterns may not hold, and new ones will be necessary, in which case we'll need a new edition of this book and some new ways of thinking. But at the moment we're still talking about organizing records, and the uses organizations have for those records, and matching the use-case to the organizing pattern.

The meta-models we'll look at in this section are evolutionary. They can be used to organize data from systems small and large, whether gigantic logging frameworks or a small firm's applications. Not all of the patterns will be useful in every local case. But there aren't that many of them, and it's worth learning them all.


Monday, April 8, 2024

3.6 Patterns of Organization

There are a few simple patterns you should consider when organizing your data management efforts for maximum impact. In this section we'll discuss some of those patterns of organization, look at a couple of different organizational models, and point to some fundamental properties of those models that help make teams effective.

Monday, March 25, 2024

4.6 Master Data Management

It's a basic fact of the modern data ecosystem that critical data about critical entities will be duplicated and, most importantly, different, simply because that data is created in different source systems. This happens in the simple case where a customer interacts with multiple applications and each application creates its own CUSTOMER record. In the more complex case, two applications are downstream of a third, and copies of source records are sent to the downstream systems, where they inevitably get updated or supplemented. Master Data Management, or MDM, is the process of creating entities and re-syncing records with the real-life things they represent. MDM is in one sense a brute-force solution to the governance problems caused by incompatible data models and inconsistent form validation. In another sense, MDM is the pragmatic connection between a data management ecosystem and the real world. And in a third sense, it's one of those critical layers we see in good data management practice.
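Here's a deliberately naive sketch of the match-and-merge step at the heart of mastering, with hypothetical source records; real MDM systems use far richer match rules and survivorship policies than "same email" and "first non-null wins."

```python
# Hypothetical CUSTOMER records from two source systems.
records = [
    {"source": "crm",   "email": "jo@example.com", "name": "Jo Smith", "phone": None},
    {"source": "store", "email": "jo@example.com", "name": "J. Smith", "phone": "555-0100"},
    {"source": "crm",   "email": "al@example.com", "name": "Al Jones", "phone": "555-0199"},
]

# Naive match rule: same email means the same real-world person.
masters = {}
for rec in records:
    key = rec["email"].lower()
    golden = masters.setdefault(key, {"email": key, "name": None, "phone": None})
    # Naive survivorship: first non-null value wins for each attribute.
    for attr in ("name", "phone"):
        if golden[attr] is None and rec[attr] is not None:
            golden[attr] = rec[attr]

for master in masters.values():
    print(master)  # one "golden record" per real-world customer
```

Everything that makes real mastering hard lives in those two "naive" comments: deciding when records match, and deciding which source wins.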

In this section we'll discuss the process of mastering, which is basic to all pipeline development, and how that process eventually gets turned into the components of an MDM system. We'll also walk through some of the use-cases, including the specific kinds of entities commonly managed in an MDM system. Finally, we'll explain how MDMs should be integrated into the rest of the ecosystem.

Monday, March 18, 2024

2.2 Applications change over time

We mentioned above that applications evolve. That is, over time an application will change the way it collects data, the kind of data it collects, and the way it stores that data.

Applications will stop evolving for a variety of reasons: Key developers move on, the organizational context becomes fixed or mature, or there’s a plan to replace the application with something new and shiny and no one wants to invest in change to the old application. Stable, happy applications that continue to generate data day-in-and-day-out are the best kind, like the proverbial server absent-mindedly shut up behind drywall that continues to faithfully serve out web pages without a complaint. But this is unfortunately a rarity, and most applications continue to evolve.

Tuesday, March 12, 2024

3.1 Codd's Essential Insight

We're going to expend some considerable energy on this section, because Codd's insight will change your life. It's not difficult to implement, but the principle is rarely applied as pervasively as it should be, and as a result much of the work that's done in data management is wasted when it could be productive.

Monday, March 4, 2024

3.2 Layers

The section below is part of a chapter laying out the patterns by which we organize data functionally across the ecosystem; it tries to explain the use of layers.

I don't know if it does a good job. I think there's a lot of value in the presentation of layers but I'm really not satisfied with the chapter. The practical and metaphysical points are explained well enough but they don't coincide. In any event, the chapter is presented as is.

3.2 Layers

In this section we'll expand on Codd's insight in two directions. First, we'll add detail to his insight with some examples that illuminate the basic intuition behind normalization, which is primarily a practice of local optimization within a data ecosystem. Second, we'll discuss layers, which is how that philosophical point gets operationalized when local optimizations are organized globally, into a data architecture.
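To anticipate those examples, here's a minimal sketch of the normalization intuition, with hypothetical rows: state each fact exactly once and reference it by key.

```python
# Hypothetical denormalized rows: the customer's name repeats on every order.
flat = [
    {"order_id": 1, "customer_id": 101, "customer_name": "Acme", "total": 50.0},
    {"order_id": 2, "customer_id": 101, "customer_name": "Acme", "total": 30.0},
]

# Normalization as local optimization: each fact is stated exactly once.
customers = {row["customer_id"]: row["customer_name"] for row in flat}
orders = [{"order_id": row["order_id"],
           "customer_id": row["customer_id"],
           "total": row["total"]} for row in flat]

print(customers)  # {101: 'Acme'} - the customer fact, stated once
print(orders)     # order facts, referencing the customer by key
```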

Monday, February 26, 2024

2.1 There are a lot of sources

One reason data management is hard is the sheer diversity of sources data management practitioners are expected to rationalize. Consider a typical Customer-360 program. The goal of a Customer-360 program is to collect all of the data in a given enterprise about a customer, organize it into a matrix of some sort, and make it available for eventual consumers throughout the organization. Those sources often include:

