- Incompatible schemas or content.
- Incompatible natural keys.
- Under-determined schemas or content.
- Epistemic circularity.
The first three should be familiar to anyone who has tried to migrate data from one system to another. Type 1 debt is common when you migrate data from one ERP system to another, for example. I've run into it fairly often with order-management systems, where one version of the ERP system uses one set of order status categories and another version uses a different set. The categories are incompatible, perhaps because there are stages in the new category set that aren't in the old set, or because the new order-management system has even rearranged the order of the stages.
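To make that mismatch concrete, here is a minimal sketch in Python. The two status lists are invented for illustration and don't come from any real ERP product; the function simply reports stages that exist on only one side and pairs of shared stages whose relative order has flipped.

```python
# Hypothetical status sequences; neither is taken from a real ERP system.
OLD_STATUSES = ["received", "picked", "shipped", "invoiced", "closed"]
NEW_STATUSES = ["received", "payment_authorized", "picked",
                "invoiced", "shipped", "closed"]


def compare_status_sets(old, new):
    """Report stages missing on either side and shared stages that changed order."""
    old_set, new_set = set(old), set(new)
    # Stages present in both systems, kept in the old system's order.
    shared = [s for s in old if s in new_set]
    # A pair (a, b) is reordered if a precedes b in the old sequence
    # but follows it in the new one.
    reordered = [
        (a, b)
        for i, a in enumerate(shared)
        for b in shared[i + 1:]
        if new.index(a) > new.index(b)
    ]
    return {
        "missing_in_old": [s for s in new if s not in old_set],
        "missing_in_new": [s for s in old if s not in new_set],
        "reordered": reordered,
    }


report = compare_status_sets(OLD_STATUSES, NEW_STATUSES)
for key, value in report.items():
    print(key, value)
```

A stage that is missing in the old set has no historical data to migrate from, and a reordered pair means old records can sit in a state the new workflow considers impossible. Both cases need a business decision, not just a lookup table.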
This was a very common problem in the early days of ecommerce, in the 00s. Companies came to realize that their long-standing and comfortable order-management schemes didn't work with ecommerce. They'd update their ERP systems to accommodate the new style of ordering, which usually involved moving to a newer version of the ERP system. The mechanical changes required to manage the higher order volumes of ecommerce often necessitated a change in the way order status was tracked, though, and so these organizations suddenly discovered they had a big conversion headache as well.
We see similar shifts today, as we move from the notion of a discrete purchasable product to a subscription to a more amorphous set of services and privileges. Some parts of the subscription concept can be retrofitted into the old "order form" paradigm but most can't. (Imagine, if you will, a server at a restaurant trying to write down on a single ticket what's involved in a monthly purchase of Adobe's Creative Cloud. You can adapt a lot of behavior by rethinking the concept of SKU, but the difference between a boxed product installed on a laptop that anyone can use and an on-demand product with variable levels of permissions, time limits on access, or even JIT checks on access makes that restaurant ticket a nightmare.) As more products and services move to a SAAS model - from "buy it all" to "rent pieces we need" - organizations will need to migrate old schemes with discrete stages for discrete products into a much more flexible order status schema.
I'll go into more detail about the other two types of debt in future blog posts. But the fourth is the sexy one. It's also the most complex to describe, although once you understand it you see it everywhere.
I initially called Type 4 debt "Epistemic Circularity," a term I picked up from epistemology. It's not appropriate for most uses because the actual concept of "epistemic circularity" is way too hard to understand. Many philosophers and professional epistemologists would venture that because it's hard to understand it's probably not coherent.
The term I want to use instead is "Epistemic bubble." An "epistemic bubble" is a form of confirmation bias, most commonly seen in politics. In fact, if you spend any time online you'll see some variation of the "epistemic bubble" term thrown around. (It's common to call someone's political epistemic bubble "epistemic closure," but the latter is simply an inaccurate use of the term.) The basic idea behind an epistemic bubble is that what a person can believe to be true is bounded by what they allow could be true.
In politics this often shows up as limits on what you can accept from your political polar opposites. In US politics, for example, Republicans won't listen to Democrats because what Democrats allow could be true is not allowed in the Republican epistemic bubble. In more specific terms, Democrats might believe that fraud and waste in social welfare programs is a very small percentage of the total amount of money disbursed, whereas Republicans believe it's a significant amount. Democrats would as a consequence budget very little money for accountability and compliance, while Republicans would tend to want to spend a lot of time and effort on accountability. In the Republican epistemic bubble social welfare programs are simply rife with fraud and abuse by their participants; in the Democratic epistemic bubble fraud and abuse are a very small percentage of social welfare programs, and so time and effort are better spent on expanding the reach of those programs. Where the two bubbles overlap there's a shared set of things that could be true, and so we can make sensible statements about them. Where the bubbles don't overlap, however, you can say things about what's true in one bubble that simply don't exist in the other.
Organizations get into epistemic bubbles too. There's an easy formula, from an analytics standpoint, that you can use to determine how much of an epistemic bubble your organization is in:
We can only report on what we can do, and we can't do anything but what we report on.
This may appear unnecessarily cryptic, almost a Zen koan.
Consider: If you want to expand your business, how would you go about it? Suppose you wanted to launch a line of super-fancy SAAS products. ("But we're in the packaged goods business, Dave..." I can hear you say. Volvo is talking about a subscription plan for cars. If a car company can think of cars in terms of subscriptions, you can think of your packaged goods as fungible products.) What changes would need to be made to your current systems to accommodate a subscription product?
For example, would you need a new order-management system? What about your billing cycles? How does Customer Service differentiate levels of service for customers with the old product versus the new ones? Interlaced through all the systems collecting data about your organization are various semantic assets passing data from one system to another, and from reporting systems into the hands of users, whether that's Marketing, Finance or Customer Service. All of those semantic assets and systems need to be changed because the launch of the new SAAS product puts them all into semantic debt. You simply can't convey the information you need, because you don't have the models that can accommodate it. In very simple terms, you don't have the right fields in the originating systems, you don't have the fields in the downstream semantic assets, and you need to run a regression all the way through to ensure that adding anything at any point in the stream doesn't screw it all up.
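The "missing fields all the way down the stream" problem can be sketched in a few lines of Python. Everything here is hypothetical: the stage names, the field names, and the assumption that every stage must carry every subscription field (in practice each stage would need only a subset). The point is only that one new product concept produces a gap at each hop.

```python
# Hypothetical pipeline: each stage and the fields it can currently carry.
PIPELINE = [
    ("order_management", {"order_id", "sku", "quantity", "status"}),
    ("billing", {"order_id", "amount", "invoice_date", "billing_cycle"}),
    ("warehouse_export", {"order_id", "sku", "status"}),
    ("finance_report", {"order_id", "amount"}),
]

# Hypothetical fields a subscription product would need to convey.
SUBSCRIPTION_FIELDS = {"billing_cycle", "entitlement_level", "renewal_date"}


def find_semantic_gaps(pipeline, required_fields):
    """Return, per stage, the required fields that stage cannot carry."""
    gaps = {}
    for name, fields in pipeline:
        missing = required_fields - fields
        if missing:
            gaps[name] = sorted(missing)
    return gaps


gaps = find_semantic_gaps(PIPELINE, SUBSCRIPTION_FIELDS)
for stage, missing in gaps.items():
    print(f"{stage}: missing {', '.join(missing)}")
```

Running a check like this after every schema change is a crude stand-in for the end-to-end regression: it tells you where the information would be dropped, though not whether the fields mean the same thing at each stage.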
Now some degree of "epistemic bubble" is a good thing. We're usually in business to make money, so spending all of our time worrying about all possibilities is a good way to go bankrupt. We thus have to accept a certain set of limits on what our data models and the semantic assets that represent them can encompass.
But where our semantic debt becomes onerous is when we can't actually make the changes we need to make. In machine learning terms, our model has overfit the data: it performs perfectly on the training data we've given it, and terribly on the real-world data we need it to manage. In machine learning the simplest approach for dealing with an overfit model is to back off the training data, either by mixing the set up a little, adding in some incongruous data, or restricting the training set so it's not so perfect. This approach only works as a kind of kludge, though. The bigger problem with overfitting is that it's a sign our model doesn't really work.
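For readers who haven't met overfitting before, here is a toy illustration in plain Python. The data points are made up, and the "memorizer" (a nearest-neighbor lookup) is just a stand-in for an overfit model, compared against an ordinary least-squares line. It is a sketch of the idea, not a recipe.

```python
# Made-up data: the underlying relationship is roughly y = 2x, with noise
# in the training points. Held-out points follow y = 2x exactly.
TRAIN = [(0, 0.5), (1, 1.6), (2, 4.7), (3, 5.4), (4, 8.6)]
HELD_OUT = [(0.5, 1.0), (1.5, 3.0), (2.5, 5.0), (3.5, 7.0)]


def memorizer(x):
    """Overfit stand-in: return the y of the nearest training x."""
    nearest = min(TRAIN, key=lambda point: abs(point[0] - x))
    return nearest[1]


def fit_line(points):
    """Ordinary least-squares line through the points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def mean_abs_error(model, points):
    return sum(abs(model(x) - y) for x, y in points) / len(points)


line = fit_line(TRAIN)
memo_train = mean_abs_error(memorizer, TRAIN)
memo_new = mean_abs_error(memorizer, HELD_OUT)
line_train = mean_abs_error(line, TRAIN)
line_new = mean_abs_error(line, HELD_OUT)

print("memorizer: train", memo_train, "held-out", memo_new)
print("line:      train", line_train, "held-out", line_new)
```

The memorizer is flawless on the data it has already seen and noticeably worse than the simple line on anything new, which is the same trap a data model falls into when it fits today's business perfectly and nothing else.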
In enterprise data management we can't do quite the same things, but the strategy is often the same. We find ourselves restricting our strategy to fit our data. We have a data model that works, and we look at which business models can be true of that data model, and we choose those so we don't have to make any changes to our data model.
In a future post I'll go into more detail about how we can design data models to be more flexible. This part isn't rocket science, and it's a lesson most experienced data modelers have learned. But it's generally only applied piecemeal, to individual systems, and not to the big picture.