IOblend Data Mesh – power to the data people! Analyst engineering made simple
Hello folks, IOblend here. Hope you are all keeping well.
Companies are increasingly leaning towards self-service data authoring. Why, you ask?
It is because the prevailing monolithic data architecture (no matter how advanced) does not offer an easy way to manage the growing data needs of your organisation. Centralised data processing and management make it difficult to meet business demand – you need a sizeable engineering team to handle requests and manage the entire ETL cycle. On top of that, the engineering team does not necessarily possess sufficient knowledge of the data inputs/outputs beyond the prescribed SLAs – they do not “own” the data, but merely process it for the owner teams.
Today’s data lakes, lakehouses and data warehouses still represent a centralised architecture pattern. Although very powerful, they are predominantly deployed to support the “centralised” thinking – a complex ecosystem operated by a team of highly skilled engineers who strenuously try to meet the data demands of the business.
However, in a world of ever-growing data sources, data products and user needs, the monolithic architecture is not proving ideal. As demand grows, new use cases require different types of transformations and associated management policies, putting an increasingly heavy load on the platform engineering resources. This eventually leads to delivery bottlenecks, with unserved, frustrated data consumers and over-utilised, disillusioned data platform engineering teams.
Below is just one of many examples of the centralised architecture pattern we have encountered on our journey. This organisation was attempting to develop a complex data analytics product to help them improve efficiency and profitability. The project required a large number of distinct “hardened” data pipelines scripted by two dozen skilled engineers and was going to take six months to deliver. Unfortunately, as they costed up the project, it became painfully clear that it was prohibitively expensive to implement using the centralised approach. The costs were outweighing the product benefits, so the management were not prepared to sign it off.
It is highly likely that your organisation is using some form of a centralised architecture. After all, it is the norm. You are working very hard at reducing the bottlenecks by growing/upskilling your engineering teams and buying various tools to help you in the effort.
But what you are essentially doing is applying brute force to overcome the issues of an inherently inefficient architecture – it will never be ideal, no matter how much resource you throw at it. You can have the sleekest infrastructure on the planet, but the centralised approach to data will still be your biggest bottleneck.
You can scale your infrastructure, but you will always struggle to scale your central pipeline.
This is where a federated, domain-driven data architecture pattern, known as a Data Mesh, gives the business a better option: robust, quality data with business areas handling their own dataflows. Originally coined by Zhamak Dehghani, a ThoughtWorks Director of Emerging Technologies, the Data Mesh architecture is now starting to attract a lot of attention. Data Meshes address the shortcomings of the centralised architecture by shifting data ownership to the subject matter experts (aka data domains). Data Meshing unlocks much greater data experimentation and innovation by the business while significantly lessening the technical burden on the central data engineering teams.
The benefit is clear: let the subject matter experts (SME) process and manage their own data (to an agreed set of enterprise standards) and supply it to the rest of the data consumers as a “product”. They know their data better than anyone else and are best placed to be the custodians of it. At the same time, they can quickly experiment, scale and augment their data as the business demand evolves. There is no longer a need to load up your central data team with a multitude of requests, no more bottlenecks.
This means that the data domains themselves now do the data engineering, management and governance for their respective data domains. The location where the data physically resides is a technical decision and can be central or federated.
The key outcome is that the data SMEs now “own” their data domain. They themselves will write their dataflows, apply standardised governance policies of the business, and manage data quality end to end. There is no longer a central backlog to add your new jobs to and no need to wait for someone else to execute your dataflows.
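To make the idea concrete, here is a minimal conceptual sketch (not IOblend's implementation – all names and thresholds are hypothetical) of what a domain-owned dataflow might look like: the domain team writes its own transformation and publishes the result as a data product, but only after it passes a governance policy standardised across the enterprise.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical enterprise-wide governance policy: every domain applies
# the same standard checks before publishing its data as a product.
@dataclass
class GovernancePolicy:
    required_columns: list
    max_null_fraction: float = 0.05

    def validate(self, rows: list) -> bool:
        """Reject empty outputs or outputs with too many missing values."""
        if not rows:
            return False
        for col in self.required_columns:
            nulls = sum(1 for r in rows if r.get(col) is None)
            if nulls / len(rows) > self.max_null_fraction:
                return False
        return True

# A domain team (e.g. "sales") owns its dataflow end to end:
# transform, validate against the shared policy, then publish.
def publish_data_product(rows: list, transform: Callable, policy: GovernancePolicy) -> list:
    transformed = [transform(r) for r in rows]
    if not policy.validate(transformed):
        raise ValueError("data product failed enterprise governance checks")
    return transformed  # in practice: written to the domain's product endpoint

policy = GovernancePolicy(required_columns=["order_id", "amount"])
raw = [{"order_id": 1, "amount": "10.5"}, {"order_id": 2, "amount": "3.2"}]
product = publish_data_product(raw, lambda r: {**r, "amount": float(r["amount"])}, policy)
```

The point of the sketch is the division of responsibility: the transformation logic belongs to the domain, while the policy object is defined once, centrally, and reused by every domain.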
In a federated architecture, domains will interact with each other directly, easily sharing relevant data, insights and knowledge. This domain networking greatly increases organisational productivity, efficiency and reduces cost.
Back to our earlier example. For the data project to succeed, the company had to think “outside the box”. The only way they could pull it off was by allowing the analysts to work with the data directly and create hardened data pipelines themselves. In a way, this was a form of the Data Mesh.
The analysts knew what data was needed where and when better than any data engineer ever could. All they needed was the capability to do the engineering themselves. Since at the time they still lacked advanced DataOps tools like IOblend, they had to use a few data engineers to help develop and productionise the pipelines with conventional apps. Even then, the benefits were impressive: 50% reduction in resource demand, 45% reduction in project timelines and cost, and a much more engaged and collaborative project team.
In the end, this project was made possible by applying a federated approach to solve a complex and expensive data engineering challenge efficiently.
One of the challenges with the Data Mesh pattern is that data scientists and analysts are not engineers. Developing production-grade dataflows is a very specialised skillset. It is not normally possible to train up your analysts to do full-on data engineering, which puts proper Data Meshes in the realm of aspiration at the moment.
You can embed data engineers within the SME domains, but you will again have a single point of execution, just on a narrower scale. What you ideally need is the analysts fully creating and managing dataflows themselves – no middle layers, this is true data democratisation.
That is where IOblend comes in. We have created a powerful DataOps platform that naturally facilitates the creation of a Data Mesh, truly supporting data democratisation. IOblend enables data analysts and scientists to create production-grade data pipelines, data assets and governance through automation, without having to acquire data engineering skills or rely on data engineers.
IOblend also addresses another major Data Mesh complexity – the potential duplication of effort and skills needed to maintain dataflows and infrastructure in each SME domain. Federating data domains can lead to siloed teams if not carefully executed. To avoid this unfortunate outcome, we suggest utilising a central platform that handles the dataflow clusters, storage and streaming infrastructure for unified oversight and easier maintenance. At the same time, dataflow assets should also reside in a central repository so that all domains have access to prior work and shared knowledge.
An important tenet of a Data Mesh is collaboration amongst domains to ensure siloed development does not happen. IOblend can help here as well. Our platform automatically creates an easily searchable catalogue of all data assets created, whether that is a data pipeline, a table, a file, etc. This encourages reuse and avoids duplication. We avoid silos by providing full observability out of the box.
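To illustrate the idea of a shared catalogue (this is a toy sketch, not IOblend's actual catalogue – the class and field names are hypothetical), the essential ingredients are just a common registry that every domain writes to and a search that spans all of them:

```python
from dataclasses import dataclass

# Hypothetical catalogue entry for any data asset a domain publishes.
@dataclass
class DataAsset:
    name: str
    kind: str        # "pipeline", "table", "file", ...
    domain: str      # the SME domain that owns it
    tags: list

class AssetCatalogue:
    """Minimal central, searchable registry shared by all domains."""

    def __init__(self):
        self._assets = []

    def register(self, asset: DataAsset) -> None:
        self._assets.append(asset)

    def search(self, term: str) -> list:
        """Match the term against asset names and tags, case-insensitively."""
        term = term.lower()
        return [a for a in self._assets
                if term in a.name.lower()
                or any(term == t.lower() for t in a.tags)]

catalogue = AssetCatalogue()
catalogue.register(DataAsset("daily_bookings", "pipeline", "sales", ["bookings", "daily"]))
catalogue.register(DataAsset("crew_roster", "table", "ops", ["crew"]))
hits = [a.name for a in catalogue.search("bookings")]
```

Because every domain registers into the same catalogue, an analyst in one domain can discover and reuse a pipeline built in another, which is precisely the silo-avoidance the pattern calls for.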
IOblend gives your data teams a common, domain-agnostic, automated capability to achieve full data standardisation, data lineage, data quality, alerting and logging – all in one platform. These features, together with the simplicity of implementation and use, make it a “must try” product for any organisation considering the Data Mesh route.
IOblend – make your data estate state-of-the-art