A couple of thoughts on how you can manage your portfolio of BI assets using the Semantic Debt framework.
In my last post I talked about calculating the cost of your semantic debt as the resource cost of bringing the asset up to code. That is, for any given report, ETL job/pipeline, table set or Data Warehouse, how much would it cost to modify the asset so it does what people want it to do? This isn't as hard as it sounds in practice: If you've got, say, an Attribution report that needs to merge data from two systems but currently only shows data from one, how much would it cost to get it to merge the two systems? The cost may include building new ETL jobs and tablesets, some open-ended exploration of the source systems to figure out the natural keys, whatever. But most people can get these estimates without much friction, often right out of their ticketing system. In fact the better-managed teams I've worked with (I'm looking at you, Shrikant) have their entire portfolio gamed out, often for quarters at a time.
(As recently as ten years ago, asking for estimates like this was difficult. That's partly because small guerrilla data warehousing teams, where everyone does everything, involve so much personality that estimates can depend on the day of the week. I've been on all three sides of this process: As a one-person team, as a manager and as a requestor. We're in a golden age of report-development estimation. Modern BI teams that run like software engineering teams have taken the measurement fetish from Agile methods and applied it quite usefully to BI development.)
So it's not hard to figure out costs. The total portfolio debt is then simply the sum of the individual costs. You can go a long way down a rathole on estimation here: Is the new tableset required for the Attribution report part of the final report's debt, or an independent debt? Etc. Don't worry too much about it: Set some rules and follow them. People with finance backgrounds will want to amortize the cost, spread it out among the final products or maybe even do some arbitrage. I'm not particularly concerned with how your local rules work or how much joy you take from accounting, as long as the rules are applied consistently.
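To make the bookkeeping concrete, here's a minimal Python sketch under some loud assumptions: costs are estimates in engineer-days, the asset names and numbers are hypothetical, and there's a second hypothetical report (`partner_dashboard`) that shares the Attribution report's new tableset. It applies two possible rules. Under the first, the shared tableset is its own independent debt and every item is counted once. Under the second, the tableset's cost is spread evenly across the final products that depend on it, roughly what a finance-minded team might do. Either rule, applied consistently, gives the same portfolio total; only the per-asset attribution changes.

```python
from collections import defaultdict

# Hypothetical cost estimates, in engineer-days. "attribution_tables" is the
# new tableset both reports need; it is recorded as its own debt item.
costs = {
    "attribution_report": 5.0,
    "partner_dashboard": 3.0,
    "attribution_tables": 8.0,
}

# Hypothetical dependencies: final asset -> shared prerequisites it needs.
prereqs = {
    "attribution_report": ["attribution_tables"],
    "partner_dashboard": ["attribution_tables"],
}


def total_debt(costs: dict[str, float]) -> float:
    """Rule 1: every debt item, shared or not, is counted exactly once."""
    return sum(costs.values())


def allocated_debt(costs: dict[str, float],
                   prereqs: dict[str, list[str]]) -> dict[str, float]:
    """Rule 2: spread each shared prerequisite's cost evenly over the final
    assets that depend on it. Assumes prerequisites have no prerequisites."""
    dependents = defaultdict(list)
    for asset, deps in prereqs.items():
        for dep in deps:
            dependents[dep].append(asset)

    # Start from the final assets' own costs, then add their shares.
    alloc = {name: cost for name, cost in costs.items() if name not in dependents}
    for dep, users in dependents.items():
        share = costs[dep] / len(users)
        for user in users:
            alloc[user] += share
    return alloc


print(total_debt(costs))             # 16.0
print(allocated_debt(costs, prereqs))
# {'attribution_report': 9.0, 'partner_dashboard': 7.0}
```

The allocation rule here only handles one level of shared prerequisites; a tableset that itself depends on another shared tableset would need a recursive version.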
Once you've got your portfolio figured out, though, it's time to start asking some management questions. Here's one. Take each asset, identify its failure mode, and drop it into the right bucket. So if the asset is in Immediate failure because you know its scope is insufficient for its actual use, then the asset goes into the "Immediate" column. If you discovered the asset was in Sudden failure last week it goes into the "Sudden" column. If you haven't changed an asset for a long time but you know the source system produces far more and different data than you've got in the asset, it goes into the "Gradual" column. And if there are no problems with the asset at all? It goes into the "No Failure" column, along with all the loose dynamite, oily rags, sharp sticks and unvented gasoline containers.
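Dropping assets into columns is really just a group-by. Here's a minimal Python sketch, assuming each asset already has a failure mode and a cost estimate attached; the asset names, the numbers, and the `Failure` and `portfolio_matrix` names are all hypothetical. It produces the four columns of the matrix, each with its assets and the total debt sitting in it.

```python
from collections import defaultdict
from enum import Enum


class Failure(Enum):
    """The four columns of the portfolio matrix."""
    IMMEDIATE = "Immediate"    # scope known to be insufficient for actual use
    SUDDEN = "Sudden"          # broke unexpectedly, e.g. discovered last week
    GRADUAL = "Gradual"        # source system has drifted away from the asset
    NO_FAILURE = "No Failure"  # no *known* problems (which isn't the same as none)


def portfolio_matrix(assets):
    """Group (name, failure_mode, cost) tuples into columns.

    Returns {Failure: (asset names, total cost)}, with every column present,
    in column order, even if empty.
    """
    columns = defaultdict(list)
    for name, mode, cost in assets:
        columns[mode].append((name, cost))
    return {
        mode: ([n for n, _ in columns[mode]], sum(c for _, c in columns[mode]))
        for mode in Failure
    }


# Hypothetical portfolio: (asset, failure mode, estimated cost in engineer-days).
assets = [
    ("attribution_report", Failure.IMMEDIATE, 9.0),
    ("exec_kpi_dashboard", Failure.SUDDEN, 2.0),
    ("web_analytics_implementation", Failure.GRADUAL, 20.0),
    ("finance_close_etl", Failure.NO_FAILURE, 1.0),
]

for mode, (names, total) in portfolio_matrix(assets).items():
    print(f"{mode.value:<11}{total:>6.1f}  {', '.join(names)}")
```

Keeping empty columns in the output is deliberate: an empty "Gradual" column usually says more about how much analysis has been done than about the health of the assets.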
My guess is that most teams will have most of their assets in the "No Failure" column, simply because they haven't thought through failure modes. But suppose you've got a few of these figured out. Your first portfolio is likely to look like the matrix at the top. You've got some assets with various costs in each column.
Why did I caption the image at the top with "Oh No!"? Because there's no program in place to manage that portfolio. Nobody is thinking about what to do with Immediate, Sudden or Gradual failure, and the assets are all over the place. There's no systematic, concerted thought put into dealing with what amounts to the physical plant of the organization's decision-making processes.
Consider that most of the time nobody works on paying down semantic debt they know they've got. Maybe they've got a SiteCatalyst/Analytics implementation that hasn't been updated for a very long time. There's a very large gap between where the asset is and where it needs to be. Is there any plan to pay down the gradual accumulation of semantic debt? What does that paydown look like? You might be able to cut it in half simply by upgrading the version or hiring a contractor, for example. You might be able to focus on Marketing and not worry about Finance, or have to deal with Finance because you've got an IPO coming, or whatever.
But most BI teams' highest priority is fixing Sudden failures, because that's the stuff the senior executives notice on a day-to-day basis.
In this scenario the data management team is on top of its Sudden debt: As soon as there's an issue with a report they fix it. Organizations that are young and fast-moving will often have a lot of Sudden debt. Many of the newer Unicorn app-driven companies, for example, have what can only charitably be described as "tentative" data models throughout their source systems. So when new features are added at the collection points, in the applications, the data they generate somehow needs to be fed into the reporting necessary to run the business. An Attribution report will need to know right away whether a new partnership is working, even though the BI team may not even be aware the partnership was made. In an org like that there's very little actual Gradual debt on reporting assets, although the source systems themselves, because they lack coherence, have accumulated lots of Gradual debt.
If you're in a situation where your asset portfolio looks like the Sudden Debt Management matrix, then a couple of things become obvious. First, if you're always moving assets from No Failure to Sudden you've really just got an analysis problem: Your assets actually belong in one of the other three columns, but you haven't looked at them so you don't know when they're going to fail. If you've got an attribution report that can't accommodate a new partner, for example, then the asset is actually in Immediate failure; you just didn't know it. Any halfway-experienced data management person would be able to tell you the asset is going to fail as soon as it needs to accommodate a new partner, so it's not really a surprise when you sign a new partner and can't do the reporting, is it? You've got a scope failure. You need to go to your management and explain that you can fix your Sudden failures, and the lack of confidence they inspire, by figuring out which assets belong in Immediate failure.
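The scope check in that argument can be written down. Here's a minimal Python sketch, assuming (hypothetically) that you record for each asset the set of partners or sources it currently handles, and compare that against the set it's going to be asked to report on, including partnerships you already know are coming. Any gap moves a No Failure asset into Immediate. The `Asset` type, the `reclassify` function and the partner names are illustrative only, and the check only catches scope failures; Sudden and Gradual failures need other evidence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    name: str
    column: str                # current column in the portfolio matrix
    covered: frozenset[str]    # hypothetical: partners/sources handled today


def reclassify(asset: Asset, required: frozenset[str]) -> tuple[str, frozenset[str]]:
    """Return (column, missing scope). A 'No Failure' asset whose scope can't
    cover what it will be asked to report on is really in 'Immediate' failure.
    Assets already in other columns keep their column."""
    missing = required - asset.covered
    if asset.column == "No Failure" and missing:
        return "Immediate", missing
    return asset.column, missing


report = Asset("attribution_report", "No Failure", frozenset({"search", "social"}))
print(reclassify(report, frozenset({"search", "social", "new_partner"})))
# ('Immediate', frozenset({'new_partner'}))
```

Returning the missing scope alongside the column gives you the start of the conversation with management: not just "this report will break" but "this report will break when these partners show up."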
What happens if you've got a matrix like this?
You've got some of your assets tightly scoped, some you know to be a problem, and some that have gotten out of hand. Other things being equal, this is a good spot to be in if your Immediate debt is all in new assets or concentrated in a particular org, like Marketing. You're working through your assets to systematically identify their utility and the debt that needs to be paid down. To get to that point, of course, you need to put some organizational machinery in place: An operating mechanism with the downstream users, for example, so you can identify when things are starting to get out of hand, and a prioritization process that keeps the stuff that works working, instead of waiting until it breaks and gets out of whack.
I want to return to the analogy I made a little earlier, because I'm a big fan of analogies. If an organization ignored its buildings and physical infrastructure (didn't paint the walls or replace the carpet, let the roof start to rot, let broken machinery accumulate on the floor, or watched while fewer and fewer of its robots were able to make its products and did nothing about it), we'd think that organization was managed pretty poorly. At the very least they wouldn't be treating their employees very well; no one wants to go to work and do their best when the heating doesn't work, the walls are peeling, and nobody thought ahead about buying the right equipment to do the job. More often they'd simply go out of business, either because they can't manage their future or because no one wants to work for them.
We spend virtually no time thinking about the debt we've accumulated in our portfolios of semantic assets. We hire people to do jobs and, especially here in Silicon Valley, promise they'll be productive and happy and that we want them to do their best work. And then we studiously ignore the flow of information through the org. We give them reports built ten years ago by people who no longer work at the company. We argue it costs too much to update an ETL job, or that doing so will throw a whole bunch of other reports out of whack: the BI equivalent of telling people not to use the stairwell because it might fall down. And we pretend that the incoherent, idiosyncratic, indeterminate or undocumented data sitting on our file systems or in our NoSQL databases is valuable, even though it isn't worth much more than newspapers piled on top of the aforementioned loose dynamite, stored next to the break room.
It's time to manage our BI portfolios like they're actually valuable.