Semantic Debt: 2.4 Most IT VPs are software people, not database people

Monday, January 29, 2024

This subsection is the closing essay of a section entitled Data Management is Hard, which explains why data people are the way they are. It is more sociological than what precedes it, and also more critical and controversial: it takes on a management problem.

2.4 Most IT VPs are software people, not database people

It is a strange historical accident that most VPs in the IT industry gain their experience in software development and not data management. One can imagine an alternate universe where this is not so: organizations go decades producing horrible software before they decide to hire an experienced software architect, but in the meantime their databases are all in fifth normal form and none of their data warehouses are crude copies of all the tables in the accounting module. We don’t live in this alternate universe.
That might be a sad comment on the triumph of PR over studious hard work. But it’s more likely due to the simple fact that software teams are often at least three times the size of the database teams they work with, if there is even a database team at all. Moreover, database teams are often said to “support” software teams, which puts DBAs into a role not much higher on the corporate totem pole than customer support or desktop support people. When you’re looking around for someone to run a large team, you look for people who’ve run large teams, not those who’ve spent their careers running small “support” teams. By design, that consideration ends up selecting the people who’ve run large software groups for VP positions.
Those VPs are often intuitively hostile to data management, for a variety of reasons. First and foremost, the data management people are usually in the position of saying “no” to the software team because, to put it charitably, software people often think in terms of objects while data management people think relationally. The difference is sharpest in questions of cardinality.
If a single entity might have more than one value of a given attribute, a data management practitioner will want to create three tables to store the relationship: a table for instances of the entity, a table for values of the attribute (say, an attribute like RACE, with the arbitrary American categories “White/Hispanic/Asian/etc.”), and a table to record each cross-reference between an instance of the entity and a value of the attribute. (See “The non-folk data model” example in section 2.3.) Advanced data management people will also add a good deal of metadata to those tables, allowing the database to indicate whether a given value of an attribute is the current value or the primary value, or is of a certain type such as “Home” or “Work”; when the cross-reference was created and when it was last updated; and whether the cross-reference is even considered active, or should be ignored for now.
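A minimal sketch of the three-table pattern just described, using Python’s built-in sqlite3 module with an in-memory database. The table and column names (person, race, person_race, is_primary, is_active, created_at, updated_at) and the sample rows are hypothetical, chosen only for illustration; a real model would likely carry more metadata, such as a type column for values like “Home” or “Work” on an address attribute. The point of the sketch is that a second value is just another cross-reference row, not a new column.

```python
import sqlite3

# Hypothetical schema: names are illustrative assumptions, not from a real system.
SCHEMA = """
CREATE TABLE person (
    person_id   INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);

-- One row per allowed value of the attribute.
CREATE TABLE race (
    race_id     INTEGER PRIMARY KEY,
    label       TEXT NOT NULL UNIQUE
);

-- Cross-reference: one row per (entity, value) pairing, plus metadata.
CREATE TABLE person_race (
    person_id   INTEGER NOT NULL REFERENCES person(person_id),
    race_id     INTEGER NOT NULL REFERENCES race(race_id),
    is_primary  INTEGER NOT NULL DEFAULT 0,  -- 1 marks the primary value
    is_active   INTEGER NOT NULL DEFAULT 1,  -- 0 means 'ignore for now'
    created_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (person_id, race_id)
);
"""


def build_db() -> sqlite3.Connection:
    """Create the schema in memory and load a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO person (person_id, name) VALUES (1, 'Ana')")
    conn.executemany(
        "INSERT INTO race (race_id, label) VALUES (?, ?)",
        [(1, "White"), (2, "Hispanic"), (3, "Asian")],
    )
    # A second value for the same person is one more row, not a schema change.
    conn.executemany(
        "INSERT INTO person_race (person_id, race_id, is_primary) VALUES (?, ?, ?)",
        [(1, 1, 1), (1, 2, 0)],
    )
    return conn


def active_races(conn: sqlite3.Connection, person_id: int) -> list[str]:
    """Return the person's active values, primary value first."""
    rows = conn.execute(
        """SELECT r.label
           FROM person_race pr
           JOIN race r ON r.race_id = pr.race_id
           WHERE pr.person_id = ? AND pr.is_active = 1
           ORDER BY pr.is_primary DESC, r.label""",
        (person_id,),
    )
    return [label for (label,) in rows]


if __name__ == "__main__":
    conn = build_db()
    print(active_races(conn, 1))  # ['White', 'Hispanic']
```

Deactivating a value is an UPDATE of the is_active flag on one cross-reference row, so the history of what was recorded survives rather than being overwritten.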
While most software people are now indistinguishably “agile” in their understanding of object-oriented modeling, even before “Agile,” software developers were generally loath to think of attributes this way, preferring to assume a 1:1 relationship between every attribute and every entity. In practice, this means objects designed as if every property of the object were a variant of EYE_COLOR (see section 2.3), where there is in fact only one identifiable value per instance of the entity that has the property. Then, when more than one value for the attribute is discovered, the developer simply adds another field/property/attribute to both the object and the table, which in the worst case results in a table with fields called HOME_ADDRESS1 and HOME_ADDRESS2, both declared as large varchar(1000) fields.
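For contrast, a minimal sketch of the repeating-column design described above, again using Python’s built-in sqlite3 module. The customer table, its columns, and the sample rows are hypothetical. It illustrates two costs of the pattern: every query has to name and OR together every numbered column, and adding another numbered column silently breaks queries written before it existed.

```python
import sqlite3

# Hypothetical "one column per value" table, as in the HOME_ADDRESS1/2 example.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE customer (
           customer_id   INTEGER PRIMARY KEY,
           name          TEXT,
           home_address1 VARCHAR(1000),
           home_address2 VARCHAR(1000)
       )"""
)
conn.execute("INSERT INTO customer VALUES (1, 'Ana', '1 Elm St', '9 Oak Ave')")
conn.execute("INSERT INTO customer VALUES (2, 'Ben', '9 Oak Ave', NULL)")


def customers_at(address: str) -> list[str]:
    """Find customers at an address; must OR together every numbered column."""
    rows = conn.execute(
        """SELECT name FROM customer
           WHERE home_address1 = ? OR home_address2 = ?
           ORDER BY customer_id""",
        (address, address),
    )
    return [name for (name,) in rows]


print(customers_at("9 Oak Ave"))  # ['Ana', 'Ben']

# A third address forces a schema change, and the query above never
# learns about the new column until someone rewrites its WHERE clause.
conn.execute("ALTER TABLE customer ADD COLUMN home_address3 VARCHAR(1000)")
conn.execute("UPDATE customer SET home_address3 = '5 Pine Rd' WHERE customer_id = 2")
print(customers_at("5 Pine Rd"))  # [] even though Ben's record now has it
```

In a three-table design the same lookup would be a single join against the cross-reference table, and a third address would simply be one more row.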
This scenario was discussed extensively in the two prior sections. There we discussed the thinking that leads to this mistake; here we discuss why and how the mistake survives. Our immediate question is why database people say “no” all the time, and our overarching concern is why IT VPs are software people, generous with their yeses, and not database people.
These sorts of tables are extremely common. It’s not always the software developer who makes errors like this; there’s a depressingly common practice in databases of prefixing refactored tables with NEW_, so that a decade on, new employees look for data in both CUSTOMER and NEW_CUSTOMER and wonder what people were thinking back in the good old days. It’s the opinion of this author that people who create fields like HOME_ADDRESS1 and HOME_ADDRESS2 or tables named NEW_CUSTOMER are not actually data management practitioners. They’re software developers, either formerly or by preference, and they ought to stick to programming languages that let them do whatever they want. In software development, mistakes like this are generally hidden deep in the stack somewhere, found only when some poor offshore developer is tasked with translating the old language into something new. In data management, however, these mistakes are right out in the open, and people can curse your name for a very long time.
There is nothing inherent in object-oriented modeling that should lead to such cardinality errors, but they are depressingly common. What’s relevant to this subsection, however, is what happens when a data management practitioner tries to move the conversation from a set of fast-and-loose assumptions about every attribute being single-valued to the practical expectation that the development team should think ahead, implement appropriately, and exercise some logical skill. Software developers occupy a high status in this culture, one that assumes they think more clearly and logically than the rest of us, even when those talents are as rare among developers as they are in the general population. The data management practitioner who has obvious facts about the future on their side will not have an easier time of it than a business user trying to get a bug prioritized. If the developer has decided a property is single-valued, often no amount of argument will dissuade them from modeling the world that way. And so if the data management practitioner stands in the way of development, and prevents the implementation of single-valued properties when multi-valued properties are obviously the case, they’ve started a conflict that may make enemies. Who knows better than the developer, after all, how to build software?
Now imagine this same developer becomes the manager of a development team, and is placed in a position of responsibility where they decide who wins these conflicts. If the data management people are lucky, the development manager has come to realize that a database can never really recover from multi-valued attributes modeled as single-valued, and takes the side of the DBA. If they are unlucky, the development manager has so far avoided the consequences of this bad practice, and can’t see the future. Sometimes that avoidance takes the form of simply not being aware of how difficult it is for the rest of the organization to make use of the data their applications have created:
Marketing can’t differentiate people who share the same email address, or who may be recorded with more than one race (a problem specific to the US).
Finance can’t determine which products are consistently on backorder.
Product Management can’t figure out which features of a product are the most clicked.
So in the unlucky case, instead of looking to the data model as the root cause of these difficulties, the organization blames “the database,” or more often the data management team, for not being able to get the data they need to do their jobs. Or, equally unlucky but perhaps worse, the managing developer concludes that these modeling concerns are a matter of taste, and that sometimes they should take the DBA’s side and other times the working developer’s, depending on who last got the win, or who yells the loudest.
If development managers chose based on the Volume Theory of Truth (whoever yells the loudest), DBAs would usually win. Unfortunately, that isn’t how it works out. What has happened instead, over the last twenty years or so, is that the general fear of conflict one sees in the developer talent pool has progressed to the point where data management people are left off the data modeling team entirely. It’s not at all unusual for data architects to be hired into an organization to fix the database literally decades after it was ruined, and then given less than a year to make something of it. It’s not at all unusual for organizations to hire data warehouse teams tasked with building reporting systems that provide meaningful insights, only for those teams to fail inevitably because (a) the source systems are impossible to reconcile and (b) the organization isn’t willing to acknowledge that its technical and semantic debt hasn’t just bankrupted it, but left it bereft of usable assets. It’s the rare organization that lets its development teams run roughshod over basic data management principles, then realizes it needs to refactor its source systems and data movement processes, and then actually sees the refactoring plan through. Most organizations are happy to listen to another software executive promise a quick technical fix, or to hire an outside consultant to provide an expensive covering solution that lets everyone save face, or to just pretend they don’t have a problem. The agreed-upon solution usually ends up being that it’s just difficult to find good reporting people.
All of this happens because the developer who grew into the VP has most often not faced actual consequences. Now, the last twenty years have in general seen a dramatic loss of accountability among all sorts of senior executives. Even so, accountability is still something organizations exercise regularly, both in principle and in practice. Consider that executives who lead bad mergers are often fired. Executives who mishandle procurement portfolios that tie organizations to onerous and unfair contracts are terminated. Executives who fail to deliver promised products are asked to leave. But does anyone look askance at the VP of IT who’s been at the company for a dozen years when the data warehouse project fails? When the multimillion-dollar reporting project designed to facilitate the IPO by generating consistent financial numbers runs into difficulty, does anyone wonder who was responsible for the data model that’s proven so useless? It’s highly unlikely. In fact, the blame tends to fall on the team tasked with making sense of the model: the data management team. Organizations will churn through generation after generation of these teams, hoping to finally find the one group creative enough to bring order out of what is frankly logical chaos. The blame does not fall on the VP of IT who let their developers model multi-valued properties as single-valued, out of convenience, fear of conflict, or a conviction that no such properties exist.
This is not to say that there aren’t technology VPs aware of the need to get the database right first, last and always. But look around: count the number of organizations with full-time DBAs and data architects, and you’ll see the numbers have dropped dramatically in the last twenty years. Count the number of organizations that revel in the use of so-called NoSQL “schema-less” storage systems because they’re confident their developers can do both software engineering and data modeling. And then watch as these organizations follow an arc so consistent it could be a chapter in a Joseph Campbell book: first the product is built and everyone loves it; then they try to scale and can’t; then the Finance and Marketing hotshots start demanding actionable and detailed data; then the original people leave because it isn’t fun having Finance complain when they can’t produce reports; and finally the organization makes peace with its mediocrity, and either fails or limps along with “realistic” dreams.
Again, while this may appear to be just so much polemic, this dynamic plays out day to day in virtually every organization that creates data. Data management can fail because software developers don’t like the attitude of the data management people, and those software developers, by virtue of cultural cachet and sheer numbers, have far more access to hiring and firing authority than the data management people do. And so the concerns of the data management people are ignored, or at the very least deprioritized, and somehow the IT VPs slip the noose.
One of the reasons data management is hard, then, is purely sociological. Data management people often work in organizations that devalue their work, for people who don’t understand it. And that comes down to the simple fact that most IT VPs are software people, not database people.
