Monday, March 25, 2024

4.6 Master Data Management

It’s a basic fact of the modern data ecosystem that critical data about critical entities will be duplicated and, more importantly, different, simply because that data is created in different source systems. This happens in the simple case where a customer interacts with multiple applications, and each application creates its own CUSTOMER record. In the more complex case, two applications are downstream of a third, and copies of source records are sent to the downstream systems where they inevitably get updated or supplemented. Master data management, or MDM, is the process of creating master entities and reconciling duplicate records with the real-life things they represent. MDM is in one sense a brute-force solution to the governance problems caused by incompatible data models and inconsistent form validation. In another sense, MDM is the pragmatic connection between a data management ecosystem and the real world. And in a third sense, it’s one of those critical layers we see in good data management practice.
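To make the reconciliation step concrete, here is a minimal sketch of mastering: two source systems each hold their own CUSTOMER record for the same real-world person, and we group records by a match key and merge each group into one master record. All record fields, the email-based match rule, and the last-update-wins survivorship rule are hypothetical simplifications; real MDM systems use much richer match and survivorship logic.

```python
# Sketch of the core "mastering" step: group duplicate source records
# by a match key, then merge each group into one golden record.

def normalize(record):
    """Canonicalize the fields we match on (case, whitespace)."""
    return {k: v.strip().lower() if isinstance(v, str) else v
            for k, v in record.items()}

def match_key(record):
    """Naive match rule (hypothetical): email identifies the customer."""
    return normalize(record)["email"]

def master(records):
    """Group source records by match key and merge each group,
    letting the most recently updated non-empty value win."""
    groups = {}
    for rec in records:
        groups.setdefault(match_key(rec), []).append(rec)
    masters = []
    for group in groups.values():
        merged = {}
        for rec in sorted(group, key=lambda r: r["updated"]):
            merged.update({k: v for k, v in rec.items() if v})
        masters.append(merged)
    return masters

# Two source systems, one real-world customer (hypothetical data):
crm = {"name": "Ada Lovelace", "email": "ada@example.com",
       "phone": "", "updated": "2024-01-10"}
billing = {"name": "A. Lovelace", "email": "Ada@Example.com ",
           "phone": "555-0100", "updated": "2024-02-01"}

golden = master([crm, billing])  # one merged customer record
```

The interesting design choices all live in `match_key` and in the merge loop: deciding which fields identify the real-world entity, and which source wins when values conflict, is exactly where MDM absorbs the incompatibilities between source data models.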

In this section we’ll discuss the process of mastering, which is basic to all pipeline development, and how that process eventually gets turned into the components of an MDM system. We’ll also walk through some of the use cases, including the specific kinds of entities commonly managed in an MDM system. Finally, we’ll explain how an MDM system should be integrated into the rest of its ecosystem.

Monday, March 18, 2024

2.2 Applications change over time

We mentioned above that applications evolve. That is, over time an application will change the way it collects data, the kind of data it collects, and the way it stores that data.

Applications will stop evolving for a variety of reasons: key developers move on, the organizational context becomes fixed or mature, or there’s a plan to replace the application with something new and shiny and no one wants to invest in changes to the old application. Stable, happy applications that continue to generate data day in and day out are the best kind, like the proverbial server absent-mindedly shut up behind drywall that continues to faithfully serve out web pages without a complaint. But this is unfortunately a rarity, and most applications continue to evolve.

Tuesday, March 12, 2024

3.1 Codd's Essential Insight

We’re going to expend some considerable energy on this section, because Codd’s insight will change your life. It’s not difficult to implement, but the principle is rarely applied as pervasively as it should be, and as a result much of the work that’s done in data management is wasted when it could be productive.

Monday, March 4, 2024

3.2 Layers

As part of a section laying out the patterns by which we functionally organize data across the ecosystem, the section below tries to explain the use of layers.

I don't know if it does a good job. I think there's a lot of value in the presentation of layers but I'm really not satisfied with the chapter. The practical and metaphysical points are explained well enough but they don't coincide. In any event, the chapter is presented as is.

3.2 Layers

In this section we’ll expand on Codd’s insight in two directions. First, we’ll add detail to his insight with some examples that illuminate the basic intuition behind normalization, which is primarily a practice of local optimization within a data ecosystem. Second, we’ll discuss layers, which are how that philosophical point gets operationalized when organizing local optimizations globally, into a data architecture.
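The normalization intuition can be sketched in a few lines. Suppose a flat feed repeats customer details on every order row (the feed and field names here are hypothetical). Normalizing splits it so each fact is stored exactly once, locally, and orders reference the customer by key:

```python
# Hypothetical flat feed: customer details repeated on every order row.
flat = [
    {"order_id": 1, "customer_id": "c1", "customer_name": "Ada", "total": 30},
    {"order_id": 2, "customer_id": "c1", "customer_name": "Ada", "total": 12},
    {"order_id": 3, "customer_id": "c2", "customer_name": "Grace", "total": 7},
]

# Normalized form: each customer fact stored once; orders keep only the key.
customers = {row["customer_id"]: row["customer_name"] for row in flat}
orders = [{"order_id": r["order_id"], "customer_id": r["customer_id"],
           "total": r["total"]} for r in flat]

# A name change is now a single local update, not one update per order row.
customers["c1"] = "Ada Lovelace"
```

This is what "local optimization" means in practice: each table becomes the single authoritative home for one kind of fact, and the layering question is then how those locally clean structures are arranged across the wider ecosystem.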
