How to increase the success rate of business data projects

Mon, 15th Aug 2022

By Simon Burgoyne, strategic account executive, Talend

Amid changing economic conditions and uncertainties about supply chains and staff availability, it's never been more important for New Zealand organisations to be innovative.

In a post-COVID world, many of the strategies that used to work now need to be redesigned or replaced. Returning to 'business as usual' is not an option.

One of the key drivers of innovation is data. Data allows for informed decision-making and effective strategic planning.

Yet, frustratingly, experience shows us that the vast majority of business data initiatives are doomed to fail, and most never actually make it to production. Those that do are often so slow, clunky, and unreliable that they don't provide a return on the investment made in them or deliver what was promised to the sponsors.

One of the key reasons for this very high failure rate is the siloed systems and databases that exist within many organisations. These often hold duplicated or untrustworthy data, or data that is simply not visible to the staff who can understand and interpret it. In most cases, there tends to be a vast chasm between the staff who explore the data and the production teams that implement the initiatives based on that data.

This disconnect erodes confidence in the value of data, in the principles of sharing it, and in the very structures that support the business. Steps need to be taken to bridge this divide and create a culture of data literacy and maturity across the entire organisation.

Destined to fail from the outset

For many business data initiatives, the question is not why they failed but rather why they were undertaken in the first place.

Take the example of a data scientist who develops a model that will solve the problem of real-time customer recommendations. The scientist tests their model by running a Python script on a laptop, and everything works perfectly.

However, things start to fall apart when the data engineer tries to implement the same functionality using complex pipeline technology built on Spark and Scala. It turns out that the algorithm isn't fast, robust, or secure enough to handle the entire customer dataset in real-life conditions.

The reality is that in production situations, there will always be edge cases, regulation challenges, resource limits, and other factors that complicate analysis. What worked beautifully in a test environment often fails completely when run in the real world.

Many data projects are doomed from the outset because the people who plan them and the people who execute them don't have the same tools, the same access, or even the same goals. Data scientists tend to be really good at asking the right questions; however, most don't know how to make their models work at scale. At the same time, data engineers are experts at building data pipelines that scale, but many don't know how to find the needed insights.

Much of this challenge stems from the fact that many businesses are using data tools that require such a high level of specialist expertise that it's impossible for everyone involved to use them. Because data scientists only ever touch small subsets of the data, there's no way for them to extrapolate their models to function at scale.

Meanwhile, data engineers are being handed algorithms to implement with the barest context of the business problem they're trying to solve and why the data scientists have taken their chosen approach. There might be some discussion between the two groups, but usually not enough to overcome the challenges.

Four steps to overcome the data challenge

Data is only going to become more valuable for businesses, so it is vital to find a way to ensure projects succeed. Four key steps need to be taken:

1. Improve data access:

Not allowing all those involved in a data project to have access to the data is a recipe for disaster. Choosing a platform or a technology that is restrictive in terms of pricing, user evolution, or simply access to the organisation's data makes it impossible for business users and data scientists to scope solutions that work at scale. Data must be freely available to the people who need it while maintaining end-to-end control and compliance, without restrictions on volume, data sources, or users.

2. Create a more level playing field:

Unfortunately, many of the people who plan data projects don't understand the constraints posed by production environments. This is because they typically don't have access to the specialist tools that data engineers use to build pipelines. It's important to have a consistent set of user-friendly, self-service data approaches within a unified platform to nurture a common language across the organisation.

3. Put everything into a wider context:

Those within an organisation who use data every day need to enrich that data with context and commentary. This will help other users understand what data they can trust and how to use it best. In addition to rating data for trustworthiness and commenting on it, users should have visibility into the data's provenance and lineage.

4. Maintain security and governance:

Unregulated data poses a significant risk for many organisations. There needs to be a balance between the need for innovation and the way security and governance are maintained. Establishing the roles, rules and permissions that ensure accountability will require data governance at every stage of the data lifecycle.
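
As a rough illustration of steps 3 and 4, the sketch below shows one way a shared catalogue entry could carry trust ratings, comments and lineage, with a simple role check deciding who may rate a dataset. It is a minimal sketch, not a description of any particular product: the role names, the permission sets, the 1 to 5 rating scale, and the names `DatasetEntry`, `is_allowed` and `rate` are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Optional

# Hypothetical roles and permissions, invented purely for illustration.
ROLE_PERMISSIONS = {
    "data_steward": {"read", "comment", "rate", "edit_lineage"},
    "data_scientist": {"read", "comment", "rate"},
    "business_user": {"read", "comment"},
}


@dataclass
class DatasetEntry:
    """One catalogue record: a dataset plus the context people add to it."""
    name: str
    lineage: List[str] = field(default_factory=list)   # upstream sources, oldest first
    ratings: List[int] = field(default_factory=list)   # trust scores (assumed 1 to 5)
    comments: List[str] = field(default_factory=list)

    def trust_score(self) -> Optional[float]:
        """Average trust rating, or None if nobody has rated the dataset yet."""
        return mean(self.ratings) if self.ratings else None


def is_allowed(role: str, action: str) -> bool:
    """Return True if the role's permission set includes the action."""
    return action in ROLE_PERMISSIONS.get(role, set())


def rate(entry: DatasetEntry, role: str, score: int) -> None:
    """Record a trust rating after checking the permission and the score range."""
    if not is_allowed(role, "rate"):
        raise PermissionError(f"role {role!r} may not rate datasets")
    if not 1 <= score <= 5:
        raise ValueError("score must be between 1 and 5")
    entry.ratings.append(score)


if __name__ == "__main__":
    customers = DatasetEntry(name="customers", lineage=["crm_export", "dedupe_job"])
    rate(customers, "data_scientist", 4)
    rate(customers, "data_steward", 5)
    print(customers.name, customers.lineage, customers.trust_score())
```

Keeping the permissions in one small table means the rules can be reviewed and audited in a single place, which is the kind of accountability step 4 describes. A real deployment would rely on the organisation's own governance platform rather than a hand-rolled dictionary.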

By taking these steps, organisations can significantly increase the chances that future data projects will succeed. Today, enterprises need to establish an agile data environment where everyone achieves the data literacy needed to share a data culture, and everyone can participate in building data trust, driving organisation-wide data health. And when data across the organisation is healthy, it is easier to pursue business objectives with data, increase levels of innovation, and support planning that can guide future development and growth.
