Here is something I find fascinating lately.
The more data professionals I talk to and the more data integration projects we do, the more I realise just how archaic data and analytics (D&A) is in most established organisations. And I mean, wow!
- Key analytical reports living on someone’s PC (not even a power backup!) – talk about single points of failure
- Shared spreadsheets used to enter crucial client data without any quality checks or governance – cells get overwritten regularly
- Paper forms that are then manually entered into a database (a full-time job, mind) – data gets into the wrong fields
- Data pipelines built for a singular custom purpose and optimised to run on-prem. They are then dumped into the cloud without understanding the implications (cost and performance end up all over the place)
- Businesses are terrified to touch legacy logic. So, they build new rules on top, creating monstrous pipelines that few people understand
- Maintaining multiple redundant systems because the business is not sure what would happen if it switched off the old one
I can go on and on.
What baffles me most is that, as an industry, we are at the top of our game in terms of tools and capabilities. Today, we can solve pretty much any data challenge. We have the knowledge, tools and experience to make the transition to modern ways of working like never before. Yet most organisations continue to cling to their antiquated data systems, processes and analytics. Why?
Lost in the data maze
I find it extremely curious. IOblend’s core focus is on data migrations, building new pipelines or replacing old ones with modern ETL, and synchronising data among multiple systems (on-prem and cloud-based). The majority of our hands-on experience naturally stems from working on those types of projects. But I know the issues span the entirety of the digital transformation landscape.
We have encountered mind-boggling complexity when upgrading legacy data pipelines. Legacy systems often have highly customised configurations, deeply embedded within an organisation’s operations. These systems were developed over years, decades even. They are tailored to specific business needs and intricately linked with other enterprise processes. The shift to modern architectures means disentangling these connections and re-establishing them in a new, fundamentally different environment. That’s very scary to most data teams.
Legacy systems always contain inconsistencies, data quality issues and undocumented data-handling practices, which lead to challenges when aligning them with modern cloud-based systems. What looks like a straightforward migration job on the surface quickly turns into a nightmare. It’s often simpler to just build another ETL pipeline on top of the existing one: take the existing feed and iterate from that. So what businesses end up with is a spaghetti of pipelines of various vintages and dubious quality, all interdependent on one another. The sprawl keeps growing over time. Sound familiar?
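To make that concrete, here is the kind of quick profiling pass that tends to surface these issues before the migration even starts. It is a generic PySpark sketch, not IOblend-specific; the feed path, keys and column names are made-up assumptions for illustration only:

```python
# Minimal data-quality profiling sketch for a legacy extract (PySpark).
# The path and the business-key columns below are hypothetical examples.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("legacy_feed_profile").getOrCreate()

# Assume the legacy feed lands as CSV somewhere on shared storage
df = spark.read.option("header", True).csv("/data/legacy/client_feed.csv")

total = df.count()

# Nulls in columns the downstream reports assume are always populated
null_counts = df.select(
    [F.sum(F.col(c).isNull().cast("int")).alias(c) for c in df.columns]
)

# Duplicate business keys - a common symptom of overwritten spreadsheet rows
dupes = (
    df.groupBy("client_id", "reporting_date")
      .count()
      .filter("count > 1")
)

print(f"rows: {total}, duplicate business keys: {dupes.count()}")
null_counts.show(truncate=False)
```

Even a crude pass like this usually tells you whether you are looking at a tidy migration or a remediation project in disguise.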
Legacy ETL to the cloud
One of the most formidable challenges is the migration of legacy ETL processes. The business often doesn’t realise what’s involved. They just want what they consider a “lift and shift” job. Just move it to the cloud. Everyone does it. Shouldn’t take long, right? Well, no.
Cloud architectures are fundamentally different from on-prem ones. To take full advantage of the performance gains and lower operating costs, the business must rebuild its ETL and associated processes to work in the cloud. You must optimise these processes for a new environment that operates on different principles of data storage, computation and scalability. So no, “lift and shift” won’t cut it. A proper migration is required.
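To give a flavour of what “rebuild, don’t lift” means in practice, here is a hedged PySpark sketch: the same daily aggregate reworked from a row-by-row on-prem pattern into a set-based job over partitioned files in object storage. The bucket, columns and date are illustrative assumptions, not a prescription:

```python
# Illustrative only: a cloud-oriented rework of an on-prem batch job (PySpark).
# On-prem, this logic often runs as a row-by-row cursor against a local
# database; in the cloud the cheaper pattern is set-based work over
# partitioned files in object storage. Bucket and column names are made up.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_daily_rebuild").getOrCreate()

orders = (
    spark.read.parquet("s3a://example-bucket/raw/orders/")  # partitioned by order_date
         .filter(F.col("order_date") == "2024-01-31")       # partition pruning, not a full scan
)

daily = (
    orders.groupBy("region")
          .agg(F.sum("net_amount").alias("net_sales"),
               F.countDistinct("order_id").alias("order_count"))
)

# Write back partitioned so downstream consumers only read what they need
daily.write.mode("overwrite").partitionBy("region").parquet(
    "s3a://example-bucket/curated/daily_sales/"
)
```

The business logic is the same; only the shape of the work changes, and that is where the cost and performance differences come from.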
The reluctance to alter business logic
Then, if a thorough rebuild is required, it means getting deep under the skin of the existing pipelines and systems. However, data engineers dread updating the business logic embedded deep within legacy systems. Their fear is rooted in the risk of disrupting established data processing flows, which could lead to data inaccuracies, reporting errors, or even system failures. The latter is often a sackable offence.
What is very unhelpful is that legacy systems tend to lack clear documentation, especially around custom modifications. The business users who were involved in delivering the system and its associated analytics suites have long since retired. This makes the task of accurately replicating or updating business logic in a new environment painful, to say the least. It’s very easy to open a can of worms. Hence, engineers steer away.
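One common way to de-risk this kind of change (entirely generic, not tied to any particular tool) is a parity check: run the legacy output and the rebuilt pipeline side by side and compare the results before anyone trusts the new logic. A minimal sketch, with hypothetical paths, keys and tolerance:

```python
# Hedged sketch of a parity (characterisation) check between a legacy feed
# and its rebuilt replacement. Paths, join keys and tolerance are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("legacy_parity_check").getOrCreate()

legacy = spark.read.parquet("/warehouse/legacy/monthly_positions/")
rebuilt = spark.read.parquet("/warehouse/new/monthly_positions/")

keys = ["account_id", "period"]

joined = legacy.alias("l").join(rebuilt.alias("r"), on=keys, how="full_outer")

# Rows present on one side only indicate missed or extra records
missing = joined.filter(F.col("l.balance").isNull() | F.col("r.balance").isNull())

# Rows where the business number drifts beyond a small tolerance
drift = joined.filter(F.abs(F.col("l.balance") - F.col("r.balance")) > 0.01)

print(f"unmatched rows: {missing.count()}, value drift rows: {drift.count()}")
```

If those counts come back non-zero, you investigate before anything gets switched off, rather than after.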
Migrating ETL takes forever
If you’ve ever been involved in an ETL migration project, you know it always takes longer than planned. The time required for a complete and fully supported ETL migration depends significantly on the complexity of the existing systems, the volume of data, the tools used for implementation, and the specific requirements of the new architecture. Typically, such migrations take anywhere from several months to well over a year. And that’s for a modest migration (a single system to the cloud).
One of the projects we witnessed a few years back was an attempted migration from an on-prem system to a modern, cloud-based architecture. But the company could not bring itself to rebuild and decommission the core engine, which had been developed a few decades earlier. They tried to splice the new cloud tech on top of it and replicate the legacy logic exactly in the new system, even when it didn’t make sense to do so. The business just didn’t have the necessary understanding of its own system and feared disruption.
They ran out of money trying to splice together a Frankenstein monster, scrapping the years (years!) of hard work that had gone into it.
Failure rate is high
The cost of migrating a legacy ETL process to a modern architecture can be substantial. It encompasses not only the direct costs of cloud services and tools, but also indirect costs such as training, potential downtime, and the resources involved in planning and executing the migration. Such migrations often run into hundreds of thousands or even millions of pounds, depending on the scale and complexity of the operation. The dev work alone can cost a small fortune.
The challenges cut across all industries equally. We have seen similar cases in all sorts of organisations: banking, healthcare, manufacturing, aerospace, retail, telecoms, utilities, you name it. Everywhere, they encounter the same issues when undertaking digital transformation.
Gartner estimated that over three quarters of all digital transformations fail. The main reasons: overrun budgets and busted timescales. You can see why.
Businesses are thus understandably sceptical when they consider upgrading older systems and processes. Most have been burnt in the past and have the scars to prove it.
Fear and lack of incentives
This is why, despite having a treasure trove of sophisticated tools and capabilities at our fingertips, there’s a stubborn hesitance to break free from these antiquated systems. This isn’t merely a case of grappling with technical challenges. No, it’s more deep-seated than that.
It’s a blend of trepidation towards the unknown, apprehension about potential pitfalls, and past failures. Also, surprisingly, there is a lack of clarity on the benefits that modern technology brings to the table. Yes, modern technology makes our business better off. But why do I see my cost line swelling faster than my revenue after we moved to the cloud?
Then there is little incentive for the devs to untangle the web of business logic, system designs and ancient ETL. They will spend time unpicking the puzzles but won’t get any additional reward for it. And if they accidentally bring down the system in the process, they face losing their jobs. Let someone else do it.
Light at the end of the tunnel?
So the way I see it, the key here is to minimise the risk, improve incentives and empower people inside organisations to drive change. Organisations should not shy away from bringing in external expertise and tooling to make the journey faster and more cost effective. Do not get hung up on particular technologies and fads: understand what suits your use case best and stick with that. Do not build for the distant future (it never arrives, by the way), over-spec for every conceivable eventuality, or complicate the design to the point where it never works. Focus on delivering value fast today while preserving the flexibility to scale when needed.
I believe that if we can land this message with businesses, the journey to modern data analytics will accelerate.
Visit us at IOblend.com for all your data integration and migration needs. We specialise in de-risking and simplifying digital transformations, helping you successfully navigate your data journeys. Drop us a note and let’s chat.
IOblend presents a ground-breaking approach to IoT and data integration, revolutionizing the way businesses handle their data. It’s an all-in-one data integration accelerator, boasting real-time, production-grade, managed Apache Spark™ data pipelines that can be set up in mere minutes. This facilitates a massive acceleration in data migration projects, whether from on-prem to cloud or between clouds, thanks to its low code/no code development and automated data management and governance.
IOblend also simplifies the integration of streaming and batch data through Kappa architecture, significantly boosting the efficiency of operational analytics and MLOps. Its system enables the robust and cost-effective delivery of both centralized and federated data architectures, with low latency and massively parallelized data processing, capable of handling over 10 million transactions per second. Additionally, IOblend integrates seamlessly with leading cloud services like Snowflake and Microsoft Azure, underscoring its versatility and broad applicability in various data environments.
At its core, IOblend is an end-to-end enterprise data integration solution built with DataOps capability. It stands out as a versatile ETL product for building and managing data estates with high-grade data flows. The platform powers operational analytics and AI initiatives, drastically reducing the costs and development efforts associated with data projects and data science ventures. It’s engineered to connect to any source, perform in-memory transformations of streaming and batch data, and direct the results to any destination with minimal effort.
IOblend’s use cases are diverse and impactful. It streams live data from factories to automated forecasting models and channels data from IoT sensors to real-time monitoring applications, enabling automated decision-making based on live inputs and historical statistics. Additionally, it handles the movement of production-grade streaming and batch data to and from cloud data warehouses and lakes, powers data exchanges, and feeds applications with data that adheres to complex business rules and governance policies.
The platform comprises two core components: the IOblend Designer and the IOblend Engine. The IOblend Designer is a desktop GUI used for designing, building, and testing data pipeline DAGs, producing metadata that describes the data pipelines. The IOblend Engine, the heart of the system, converts this metadata into Spark streaming jobs executed on any Spark cluster. Available in Developer and Enterprise suites, IOblend supports both local and remote engine operations, catering to a wide range of development and operational needs. It also facilitates collaborative development and pipeline versioning, making it a robust tool for modern data management and analytics.
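For readers who have not worked with Spark streaming, the sketch below shows what a generic Spark Structured Streaming job of that kind looks like. To be clear, this is not IOblend’s generated code or API; it is a plain, illustrative example with made-up topic names, schema and paths:

```python
# Generic Spark Structured Streaming sketch - illustrative only, not IOblend's
# actual generated code or API. Topic, schema and paths are hypothetical.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("sensor_stream_example").getOrCreate()

schema = StructType([
    StructField("sensor_id", StringType()),
    StructField("reading", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Read a live event stream from Kafka
raw = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")
         .option("subscribe", "factory.sensors")
         .load()
)

events = (
    raw.select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
       .select("e.*")
)

# Simple windowed aggregate feeding a monitoring table
agg = (
    events.withWatermark("event_time", "10 minutes")
          .groupBy(F.window("event_time", "1 minute"), "sensor_id")
          .agg(F.avg("reading").alias("avg_reading"))
)

query = (
    agg.writeStream.outputMode("append")
       .format("parquet")
       .option("path", "/lake/curated/sensor_minute_avg/")
       .option("checkpointLocation", "/lake/checkpoints/sensor_minute_avg/")
       .start()
)
```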