Why Do Data Integration Projects Fail?
Discussing data quality, faulty data integration technology, and other factors that contribute to the failure of many data integration projects.
You’ve heard it before. ‘70% of all data integration projects fail.’ But ask yourself this: Why are you surprised?
A simple glance at the Standish CHAOS reports from 2009 to 2015 tells us that only around 30% of IT projects of any kind are considered successful. Why do we imagine that data integration would be any different?
With that in mind, we’re taking a new tack when we look at the failure of data integration projects. Here are the rules:
General IT or corporate dysfunction won’t be covered: We all know that poor business analysis, failure to plan ahead, lack of buy-in from the top, procrastination, scope creep, lack of user education, and bad budget estimation can kill any IT project. And now that we’ve mentioned them here, we’ll do our best not to mention them again.
Nothing theoretical: If it hasn’t come up in real IT consultancy work, it won’t be mentioned. There’s plenty of room for the theoretical as new technologies emerge, but we’re going to concentrate on real-world situations.
Let’s Start With the Biggest One: Data Quality
Some companies are data hoarders. They keep every scrap of data from the last 20 years and guard it jealously. Because someday it might be useful. In the data integration arena, this is a headache, but in some ways, understandable.
The real problem comes when those same companies try to override the results of the data auditing and data cleaning processes. Suddenly, everything becomes an exception. They try to explain that this clearly duplicate data isn’t really duplicate data because of X, Y, and Z.
The company performing the data quality analysis needs to be firm. Challenges to the process must be carefully vetted and thoughtfully responded to. Attempts to circumvent the process must be brutally stomped out. No exceptions. To get buy-in from the people who matter, this has to be stressed from the very beginning.
A lot of legacy data is questionable to begin with. Untraceable collection methodologies, uncertain subject consent, and untracked internal manipulation are just a few issues we’ve seen over the years. Relying on the veracity of such data is dangerous. So make sure it gets subjected to the same rigorous auditing standards as everything else.
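To make that concrete, here is a minimal sketch of the kind of automated audit check we mean, written in Python with pandas. The column names (customer_id, email, consent_date, source_system) are hypothetical placeholders, not a reference to any real schema; the point is that duplicates, missing provenance, and uncertain consent get flagged by the process rather than argued away as exceptions.

```python
# Minimal data-audit sketch. Column names are hypothetical placeholders.
# It flags likely duplicates, records with no traceable source, and
# records with no verifiable consent, instead of silently excepting them.
import pandas as pd

def audit(df: pd.DataFrame) -> pd.DataFrame:
    report = df.copy()

    # Likely duplicates: the same normalized email appearing more than once.
    normalized_email = report["email"].str.strip().str.lower()
    report["flag_duplicate"] = normalized_email.duplicated(keep=False)

    # Untraceable provenance: no source system recorded.
    report["flag_no_provenance"] = report["source_system"].isna()

    # Uncertain consent: missing or unparseable consent date.
    consent = pd.to_datetime(report["consent_date"], errors="coerce")
    report["flag_no_consent"] = consent.isna()

    return report

if __name__ == "__main__":
    legacy = pd.DataFrame({
        "customer_id": [1, 2, 3],
        "email": ["a@example.com", "A@example.com ", None],
        "consent_date": ["2019-03-01", None, "not recorded"],
        "source_system": ["crm", None, "spreadsheet"],
    })
    print(audit(legacy).filter(like="flag_"))
```

A real audit would go far deeper, but even this level of automation makes the “it’s not really a duplicate” conversation an evidence-based one.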
The moment data quality standards are ignored, the moment certain silos are made exempt from the auditing and cleaning process, you’ve lost. At best, it will be a source of constant confusion and pain. At worst, it will be the poison seed that causes the entire project to fall apart.
Faulty Data Integration Technology
There is no ‘out of the box’ solution to data integration. Anyone who says they have one is selling snake oil.
If the solution isn’t tailored to your particular data sources and the target data environment isn’t specifically and meticulously defined… run. ‘One size fits all’ does not apply to your data.
One example of suitably specific data integration technology is a platform with a distributed architecture.
In such an architecture, each relay is designed to provide specificity, whether the integration is one-time or continuous. Because the relays scale both vertically and horizontally, they remain viable even when dealing with a large number of outside applications and resources. They carry their own ACLs, perform source-specific data translation, have their own journaling and error detection, and integrate with the main AMI system as federated entities. This assures data quality and integrity end to end, while identity management is respected.
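As a rough illustration of what a relay like that might look like in code, here is a heavily simplified sketch. The names (Relay, translate, journal, and so on) are assumptions made for this example, not any vendor’s API; a production relay would also persist its journal, handle retries, and federate with the central system.

```python
# Illustrative sketch of a single integration relay. All names here are
# assumptions for the example. Each relay owns its access control, its
# source-specific translation, and its own journaling and error capture.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable

@dataclass
class Relay:
    source_name: str
    allowed_roles: set[str]                      # relay-local ACL
    translate: Callable[[dict], dict]            # source-specific mapping
    journal: list[dict] = field(default_factory=list)

    def process(self, record: dict, caller_role: str) -> dict | None:
        entry = {
            "source": self.source_name,
            "caller": caller_role,
            "at": datetime.now(timezone.utc).isoformat(),
        }
        # Identity management: the relay enforces its own ACL.
        if caller_role not in self.allowed_roles:
            entry["status"] = "rejected: ACL"
            self.journal.append(entry)
            return None
        try:
            result = self.translate(record)
            entry["status"] = "ok"
            return result
        except Exception as exc:                 # relay-local error detection
            entry["status"] = f"error: {exc}"
            return None
        finally:
            self.journal.append(entry)
```

The design point is that every concern lives with the relay itself, not in one monolithic pipeline that has to understand every source at once.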
That is the level of tailoring you should expect from your data integration solution. ‘It just works’ is the result of hard work, not magic.
Lack of Systems Integration Testing
Testing is the unsung hero of both the pre- and post-integration process. A properly performed integration test looks at every level of the data transformation. Just because your sample data enters at point A and is stored properly in bin Z doesn’t mean you’re ready to go.
Integration testing looks at data handling, integrity, and transformation after each step of the process: A to B, B to C, and so on. By necessity, this involves some amount of unit testing and performance testing. Only then can the process’s logic and efficiency at volume be confirmed.
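For a sense of what that looks like in practice, here is a minimal sketch of stage-by-stage checks. The stage functions (extract, normalize, load_to_target) are hypothetical stand-ins for whatever your pipeline’s hops actually are; the point is that each hop gets its own assertion instead of only checking the final result.

```python
# Sketch of stage-by-stage integration checks. The stage functions are
# hypothetical stand-ins for a real pipeline's hops. Each hop is
# verified on its own, not just the end-to-end result.

def extract(raw: str) -> dict:
    name, amount = raw.split(",")
    return {"name": name, "amount": amount}

def normalize(record: dict) -> dict:
    return {"name": record["name"].strip().title(),
            "amount_cents": int(float(record["amount"]) * 100)}

def load_to_target(record: dict, target: list) -> None:
    target.append(record)

def test_each_stage():
    raw = " ada lovelace ,12.50"

    # A -> B: extraction preserves both fields.
    extracted = extract(raw)
    assert set(extracted) == {"name", "amount"}

    # B -> C: normalization fixes casing and converts units.
    normalized = normalize(extracted)
    assert normalized == {"name": "Ada Lovelace", "amount_cents": 1250}

    # C -> target: the record lands intact, not mutated on the way in.
    target: list = []
    load_to_target(normalized, target)
    assert target == [normalized]

if __name__ == "__main__":
    test_each_stage()
    print("all stage checks passed")
```

The same per-hop discipline applies to permissions, headers, and performance at volume, not just field values.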
Those strange edge cases that don’t transform correctly, take on incorrect headers, don’t retain permissions, or slow the system to a crawl under certain conditions… they can ruin weeks of work. Unless test data is checked along the way, the failures in logic and performance might not be spotted until it is too late.
Relying on Unqualified Internal Resources
This isn’t to cast aspersions. Simply put, big data integration projects don’t happen every day in most companies. Relying strictly on internal resources is an almost surefire way to fail.
Ego should never get in the way of data integration. Specialists should be retained as needed. Undoubtedly, internal IT and DevOps resources will play a vital role in the process. But oftentimes, they’re too close to the subject matter to be objective. Their evaluation of data quality is bound to be skewed, and their long-standing loyalties to certain vendors may impact the outcome of the project.
Avenues for retaining qualified external resources include consultancies and data integration and transformation specialists. Often, the former will hire the latter anyway and take on a managerial role rather than a technical one.
In Conclusion
Choosing the right tools and the right expertise, and then actually listening to the experts you hire, goes a long way. So often, it’s the early planning decisions that destroy a data integration project, not the ones made along the way.
Choose your allies and your technologies wisely. Stick to the process. Stay objective. Let the professionals do their jobs. That will give you the best chance for a positive outcome.