Monday, February 26, 2024

2.1 There are a lot of sources

One reason data management is hard is the sheer diversity of sources data management practitioners are expected to rationalize. Consider a typical Customer-360 program. The goal of a Customer-360 program is to collect all of the data in a given enterprise about a customer, organize it into a matrix of some sort, and make it available for eventual consumers throughout the organization. Those sources often include:


Monday, February 19, 2024

4.2 The Data Lake


In the last five years or so, Data Lake projects have been sold as a game-changing new type of fauna in the data management ecosystem: a system that is easy and fast to build while providing a lot of short- and long-term value. The Data Lake is not new, however, and solutions of this type have a long and checkered history in data management. They’re also the best way for a beginner Data Management practitioner to start building systems. They’re an educational opportunity to make a series of useful and forgivable mistakes that create a ton of long-term value for their owners. Data Lakes are the critical early stage in the evolution of the data management ecosystem. They provide a laboratory for developing the business logic that glues business processes together, and as such they’re key to all subsequent components of the data management ecosystem.

In this section we’ll lay out the context in which Data Lakes are built, and discuss some of the strategies used to build them. We’ll also walk through what we expect from a Data Lake, particularly the evolutionary function they play and the knowledge they create for us. Finally, we’ll talk about ways people end up wrecking their Data Lakes.

Monday, February 12, 2024

2.3 It takes time to get the data model right

More from the chapter entitled "Data Management is hard," explaining why people often ignore good data management practices in favor of the easy and expedient.

2.3 It takes time to get the data model right


There are people who find data modeling enjoyable and relaxing. While to all outward appearances the task may involve a lot of yelling and angry erasing of whiteboards, the author (for example) thinks that the process of data modeling is almost always exploratory, creative, bounded by the mysterious tyrannies of implementation and comprehension, and exhilarating. We’ve heard stories about data modelers in the old days who took years to reach their end state, and while that doesn’t sound like fun, it would certainly be great to do what amounts to professional metaphysics all day.

But there are many people who don’t enjoy abstraction, or who may believe they enjoy it but aren’t suited for it by training or temperament. It may also be that work on a data model is artificially truncated by philistine business people or technical managers who think the perfect is the enemy of the good and want to get on with good business. In all those sadly prevalent cases, what you might call “folk” data modeling often becomes the norm. The notion of the “folk” data model is worth exploring for a minute, because it’s key to understanding this difficulty. One of the reasons data management is hard is that it takes time, thought, and effort to get the data model right.

Friday, February 9, 2024

The Missing Ingredient in Your Data Strategy



A good friend of mine recently consulted on performance measurement for a government ministry that combined the two seemingly unrelated portfolios of Tourism and Economic Development. The two portfolios couldn’t have had more divergent customer bases. The Tourism portfolio focused on direct-to-consumer marketing: families flying in from all over, attracted by the local sights, the beaches, festivals and hotel deals. The Economic Development portfolio was B2B, and worked to attract companies large and small to the area as well as to foster and sponsor local startups. The metrics used to measure performance across these disparate portfolios were necessarily different, the customers were very different, and the mandates were definitely different.

Monday, February 5, 2024

4.3 How and why to build an Operational Data Store

Nobody knows how to build an ODS, or why you might build one. I've had many, many arguments over the years with "data warehouse developers" who assume that the Kimball-style Analytics warehouse is the only kind of warehouse facility there is. This mistake results in bad systems design, because the ODS solves a specific set of use-cases that a Kimball-style system simply can't.

In this chapter, which comes after a discussion of Data Lakes and source systems, I explain how and why to build an ODS.

4.3 How and why to build an Operational Data Store

In this section we’ll first discuss why the Operational Data Store or ODS, also known as an Inmon-style warehouse, is often negatively compared with the Kimball-style warehouse. Then we’ll talk about the use-cases satisfied by the ODS, and the steps for constructing one.

The Operational Data Store is the obvious next evolutionary step in data management systems development, after an organization has explored what a Data Lake can do. The ODS is also the most misunderstood system in the evolutionary process, from a development standpoint, and experienced ODS developers are rare. The ODS has fallen out of favor in recent years, in part because it appears to require a more skilled data modeler than, say, Star Schema-based data warehouses (“Kimball warehouses”) or the Data Lake. An ODS also appears to fall short in cost-benefit comparisons with classic Kimball warehouses.

