4.6 Master Data Management
It’s a basic fact of the modern data ecosystem that critical data about critical entities will be duplicated and, more importantly, inconsistent, simply because that data is created in different source systems. In the simple case, a customer interacts with multiple applications, and each application creates its own CUSTOMER record. In the more complex case, two applications are downstream of a third, and copies of source records are sent to the downstream systems, where they inevitably get updated or supplemented. Master data management (MDM) is the process of creating entities that each represent one real-life thing and resynchronizing the scattered records with them. MDM is in one sense a brute-force solution to the governance problems caused by incompatible data models and inconsistent form validation. In another sense, MDM is the pragmatic connection between a data management ecosystem and the real world. And in a third sense, it’s one of the critical layers we see in good data management practice.
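To make the simple case concrete, here is a minimal sketch of two applications holding different CUSTOMER records for the same person, and of a mastering step that resolves them into one entity. Everything in it is an assumption for illustration: the field names, the source names, the rule that records match on a normalized email address, and the rule that the most recently updated non-empty value wins. Real MDM systems typically use richer matching and survivorship rules.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CustomerRecord:
    """One application's view of a customer (hypothetical fields)."""
    source: str
    source_id: str
    name: str
    email: str
    phone: str
    updated: date


def match_key(rec: CustomerRecord) -> str:
    # Assumed match rule: same email, ignoring case and surrounding spaces.
    return rec.email.strip().lower()


def pick(newest_first: list[CustomerRecord], field: str) -> str:
    # Assumed survivorship rule: most recent non-empty value wins.
    return next((getattr(r, field) for r in newest_first if getattr(r, field)), "")


def master(records: list[CustomerRecord]) -> list[dict]:
    """Group source records by match key and build one entity per group."""
    groups: dict[str, list[CustomerRecord]] = {}
    for rec in records:
        groups.setdefault(match_key(rec), []).append(rec)

    entities = []
    for key, recs in groups.items():
        newest_first = sorted(recs, key=lambda r: r.updated, reverse=True)
        entities.append({
            "email": key,
            "name": pick(newest_first, "name"),
            "phone": pick(newest_first, "phone"),
            # Keep links back to every source record for resynchronization.
            "sources": sorted((r.source, r.source_id) for r in recs),
        })
    return entities


records = [
    CustomerRecord("billing", "B-17", "Ana Diaz", "Ana.Diaz@example.com", "", date(2023, 1, 5)),
    CustomerRecord("support", "S-902", "Ana M. Diaz", "ana.diaz@example.com ", "555-0100", date(2023, 6, 1)),
]
golden = master(records)
```

Here the two source records disagree on name, email formatting, and phone, yet they collapse into a single entity that still points back to both source systems.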
In this section we’ll discuss the process of mastering, which is fundamental to all pipeline development, and how that process eventually gets turned into the components of an MDM system. We’ll also walk through some of the use cases, including the specific kinds of entities commonly managed in an MDM system. Finally, we’ll explain how MDM systems should be integrated into the rest of the data ecosystem.