Deciphering the true cost of your data investment
Companies invest heavily in data to improve their decision-making. It’s a massive growth area and brings numerous potential benefits to both the revenue and cost sides of the business.
Businesses generally measure the benefits via a return on investment (ROI) metric. A positive ROI means the benefits of a data platform outweigh the resources invested in it. Ideally tangible benefits, measurable as improvements to the company’s bottom line.
However, if the ROI is negative, should you be investing in a new data platform or system? Or are you better off as you are? If you spend $1m on a data platform to get a benefit of $500k, the answer is likely no. Right?
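To make the arithmetic concrete, here is a minimal sketch of the standard ROI calculation, using the hypothetical $1m/$500k figures from the example above:

```python
# Illustrative ROI check using the hypothetical figures above:
# a $1m platform spend that returns $500k in measurable benefits.
cost = 1_000_000     # total spend on the data platform
benefit = 500_000    # measurable benefit to the bottom line

roi = (benefit - cost) / cost   # standard ROI formula
print(f"ROI: {roi:.0%}")        # -> ROI: -50%
```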
Yet, a lot of companies still invest! Not always consciously, mind.
Determining the true ROI of a data project is not always straightforward. On the one hand, it’s not easy to quantify the benefit of better-quality or more timely data in monetary terms. On the cost side (which you’d think would be easier to measure), it’s also hard to pin down the figures. The headline costs of buying/building a new data platform or system are often underestimated, causing budget overruns.
Paid and OSS
There has been an increase in industry chatter lately about sky-high data integration bills and budget overruns. Big-name players have raised prices for their SaaS offerings, catching many businesses off guard. In just the past month, we ourselves encountered two examples of companies suddenly facing higher costs.
Price increases aren’t surprising, considering inflation and the need to make better returns. Prices always go up over time. But sudden, steep increases can have a big impact on budgets.
The bigger surprises are coming from the companies that relied heavily on open-source software (OSS) to underpin their data platforms. Their expenditures have gone up too. Significantly. One company told us they went with OSS because it was free and had broad community support. But a year later, the platform had turned into a money pit. How did that happen?
In their case, the headline estimates from the business case underestimated the integration effort by a mile. They had to onboard ten additional developers. Hiring, training and high churn drove up their costs and pushed delivery timescales back by months. Then there was a stream of scope changes and new feature requests. These had to be accommodated, but the work was manual. Then the PM got fed up and left in the middle of the project. It got ugly. Money was flowing out, but the platform was not delivered.
Cutting corners is expensive
There is a lot that needs to be accounted for when scoping any data project. It’s not just how much you pay for a platform/system license, storage and compute. Labour is a big-ticket item that often goes underestimated. Data integration is another area that often bites you.
Then you need to cover other things like project scoping and management, architecture, infrastructure, data modelling, platform scaling/decommissioning costs, ongoing maintenance costs, opportunity costs, etc. Cover as much as you practically can, so the estimate is representative of the “total cost” of owning the new system/platform.
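As a rough illustration, a TOC estimate is little more than a disciplined sum over every cost category you can reasonably quantify. The line items and figures below are hypothetical placeholders, not benchmarks:

```python
# A minimal, hypothetical TOC sketch: sum every cost category you can
# reasonably estimate, not just the headline licence fee.
# All figures are illustrative placeholders, not benchmarks.
toc_items = {
    "licence_and_subscription": 200_000,
    "storage_and_compute": 150_000,
    "labour_and_training": 400_000,      # often the biggest, most underestimated item
    "integration_effort": 250_000,
    "project_scoping_and_management": 80_000,
    "ongoing_maintenance": 120_000,
    "scaling_and_decommissioning": 60_000,
}

total_cost_of_ownership = sum(toc_items.values())
print(f"Estimated TOC: ${total_cost_of_ownership:,}")   # -> Estimated TOC: $1,260,000
```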
Companies (including large enterprises) can overlook the total ownership cost (TOC) when deciding to buy/build data systems and platforms. They do so for various reasons (we’ll look at some of them below). Then they get stung. The original business case ROI goes from positive to negative. People start pointing fingers and looking for scapegoats. Many data projects get scrapped because the perceived (or actual) benefits stop outweighing the cost.
Understanding the TOC
It is imperative to put time upfront into careful planning before committing to a new data system/platform. Understanding the TOC helps you determine the true ROI and avoid painful unforeseen expenses.
Awareness: Many data teams are not fully aware of the concept of TOC or its importance. They focus primarily on upfront costs, such as purchase and subscription fees, and don’t always consider the broader spectrum of ongoing costs, including maintenance, upgrades, integration, and training. Getting your head around these from the beginning will save you a massive headache later.
Complexity of estimation: Estimating the TOC of data tools can be complex. It requires a deep understanding of how the platform will be used, how it will scale, and how it will integrate with existing systems. It’s very important not to focus solely on the immediate business needs. This complexity can deter companies from attempting a comprehensive analysis. “Everyone swears by this tech. Let’s not overthink it.”
Try to get outside assistance in cases like these. Get a pair of fresh eyes. Somebody who has experience delivering a project of similar complexity.
And, at the very least, have a senior finance person go over your project plan.
Optimism bias: Decision-makers can fall victim to optimism bias, underestimating the time, cost, and challenges associated with implementing new data systems/platforms.
To keep up with industry trends or competitor actions, companies might rush into investments without fully considering their unique contexts and challenges. It’s easy to get swayed by the success stories of companies leveraging new technologies to achieve remarkable results. Unfortunately, stories like these don’t highlight the realities of the implementation challenges.
Indirect costs: Indirect costs, such as those related to data migration, employee training, downtime, and decreased productivity during the transition phase, are frequently overlooked.
One company had a BI tool sitting on the shelf for six months before they could start working with it. They couldn’t get the SMEs to free up time for it due to other business priorities. That part wasn’t considered in planning.
Cultural and organisational factors: In some organisational cultures, there is a strong emphasis on innovation and staying ahead of the curve. This can result in a rush to acquire the latest technologies without fully considering their utilisation and impact on resources. Many companies only use a fraction of the capabilities offered by modern platforms (the export-to-Excel button is still king).
The other extreme is insistence on building platforms in-house using everything OSS. Data engineers love tinkering and creating new things. That’s their nature. They’ll jump at the challenge. But from a company perspective, you end up with bespoke solutions that demand specialised expertise. They become a major pain when it comes to upgrading or replacing them later. Unless you are a tech company, steer away from bespoke as much as possible. A data platform is not the source of your competitive advantage.
Companies often have a misconception that data engineering time is free. “Well, we have already paid for them, haven’t we?” But their time is by no means free. You are spending a fortune on engineers re-inventing the wheel and solving problems that have readily available solutions on the market at a fraction of the TOC. Every minute they spend on menial things stops them from doing value-add work. It’s like chartering a private jet to deliver your pizza.
Vendor pitch and market hype: The overreliance on vendor pitches is a common trap. Vendors naturally highlight the potential benefits and efficiencies of their tools.
They are unlikely to disclose the true complexities, costs, and challenges associated with, say, migration and integration efforts. Not necessarily because they are hiding things, but because they have no idea of your particular circumstances.
Do proper due diligence. Always validate the headline costs against your constraints to derive a better view of the TOC.
Opportunity costs: I often get puzzled looks when I ask about opportunity costs. What will the new platform implementation stop you from doing? There’s a tendency to overlook the opportunity costs associated with investing in data platforms and systems.
For example, a company invests in a complex data analytics tool that eats up all of its engineering time and budget. It will have little capacity to do anything else. The opportunity cost might be the inability to invest in CRM software that would enhance customer satisfaction and loyalty, or to allocate resources towards R&D for a new product.
Mitigating the TOC impact
It goes against the nature of business to waste money. Yet companies often do, especially when it comes to big data projects. To avoid massive bills and disappointments, costs and benefits must be forecast and measured as accurately as possible.
- Invest effort in upfront planning: The more attention you pay to project scoping and planning, the clearer your project’s true ROI will be. Bring in internal and external experts from the business, technical and finance sides. Let them stress-test every assumption and consider alternatives.
- Invest in data literacy: Ensure the stakeholders understand the value and costs associated with the data. This understanding will lead to more informed decision-making and optimised investments, as the team will be working towards the same purpose.
- Keep abreast of the latest technologies: There is a reason we have a thriving tech industry: technology is a massive enabler of progress and efficiencies. Dedicate some time to learn and experiment with new tools, systems and platforms.
- Regularly review data initiatives: Continuously evaluate the performance and costs of your data systems and platforms. Adjust your approach as necessary before the TOC spirals out of control.
- Avoid relying solely on a single platform: Build your data estate in a modular manner. Overreliance on a single platform or system can make things difficult when vendor costs go up or community support wanes.
By carefully managing the TOC, businesses will avoid making costly wrong decisions when buying or building new data systems or platforms. Remember, it’s cheaper to invest effort upfront than to pay for the mistakes later.
At IOblend, we often have to come in and unpick costly past data investments. Mainly in the integration layer, where we naturally excel. But it always pains us to see the aftermath of what should have been a straightforward project that turned into a nightmare.
If this article rings a familiar bell and you are potentially staring down the rabbit hole, get in touch. We can save you a lot of hassle in the integration layer, which tends to be the biggest element of the TOC.
IOblend presents a ground-breaking approach to IoT and data integration, revolutionizing the way businesses handle their data. It’s an all-in-one data integration accelerator, boasting real-time, production-grade, managed Apache Spark™ data pipelines that can be set up in mere minutes. This facilitates a massive acceleration in data migration projects, whether from on-prem to cloud or between clouds, thanks to its low code/no code development and automated data management and governance.
IOblend also simplifies the integration of streaming and batch data through Kappa architecture, significantly boosting the efficiency of operational analytics and MLOps. Its system enables the robust and cost-effective delivery of both centralized and federated data architectures, with low latency and massively parallelized data processing, capable of handling over 10 million transactions per second. Additionally, IOblend integrates seamlessly with leading cloud services like Snowflake and Microsoft Azure, underscoring its versatility and broad applicability in various data environments.
At its core, IOblend is an end-to-end enterprise data integration solution built with DataOps capability. It stands out as a versatile ETL product for building and managing data estates with high-grade data flows. The platform powers operational analytics and AI initiatives, drastically reducing the costs and development efforts associated with data projects and data science ventures. It’s engineered to connect to any source, perform in-memory transformations of streaming and batch data, and direct the results to any destination with minimal effort.
IOblend’s use cases are diverse and impactful. It streams live data from factories to automated forecasting models and channels data from IoT sensors to real-time monitoring applications, enabling automated decision-making based on live inputs and historical statistics. Additionally, it handles the movement of production-grade streaming and batch data to and from cloud data warehouses and lakes, powers data exchanges, and feeds applications with data that adheres to complex business rules and governance policies.
The platform comprises two core components: the IOblend Designer and the IOblend Engine. The IOblend Designer is a desktop GUI used for designing, building, and testing data pipeline DAGs, producing metadata that describes the data pipelines. The IOblend Engine, the heart of the system, converts this metadata into Spark streaming jobs executed on any Spark cluster. Available in Developer and Enterprise suites, IOblend supports both local and remote engine operations, catering to a wide range of development and operational needs. It also facilitates collaborative development and pipeline versioning, making it a robust tool for modern data management and analytics.