Two years ago, we started adopting data mesh principles. We’d read the book. We’d watched the conference talks. We were convinced this was the right approach for a growing organisation struggling with a bottlenecked central data team. We weren’t wrong about the problem. We were wrong about some of the solutions.
This is an honest accounting of what happened. Not a sales pitch for data mesh, and not a hit piece against it. Just what we experienced.
The Problem We Were Trying to Solve
Our central data team was a bottleneck. Every data request, every new pipeline, every dashboard change went through a team of 12 people. The backlog was measured in months, not weeks. Domain teams were frustrated. The data team was burnt out. Something had to change.
Data mesh promised a path forward: push data ownership to the domains. Let the people closest to the data manage it. Build a platform that makes self-service possible. Treat data as a product.
It sounded clean. The reality was messier.
What Actually Worked
Data product thinking was a genuine win. Before data mesh, data was treated as exhaust from operational systems. Nobody thought about who consumed the data or whether it met their needs. Introducing the concept of data products, with defined schemas, SLAs, and explicit consumers, changed how people thought about data.
We now have data products with named owners, documented contracts, and quality SLAs. When a pipeline breaks, there’s a person to call. When a downstream team needs a new field, there’s a process for requesting it. This sounds basic, but it was a significant improvement over the previous state of “email someone on the data team and hope.”
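To make "documented contracts" concrete, here is a rough sketch of what a data product contract can look like as a declarative record with an owner, a schema, and a freshness SLA. All names, fields, and thresholds here are hypothetical illustrations, not our actual standard:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a minimal data product contract.
# Field names and SLA values are illustrative, not our real schema.

@dataclass
class DataProductContract:
    name: str
    owner: str                      # the named person/team to call when it breaks
    schema: dict[str, str]          # column name -> type
    freshness_sla_hours: int        # maximum acceptable data age
    consumers: list[str] = field(default_factory=list)

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the contract is complete."""
        problems = []
        if not self.owner:
            problems.append("missing owner")
        if not self.schema:
            problems.append("empty schema")
        if self.freshness_sla_hours <= 0:
            problems.append("freshness SLA must be positive")
        return problems

churn = DataProductContract(
    name="customer_churn",
    owner="analytics-team@example.com",
    schema={"customer_id": "string", "churned_at": "timestamp"},
    freshness_sla_hours=24,
    consumers=["marketing", "finance"],
)
assert churn.validate() == []
```

The value is less in the code than in the convention: every product declares an owner and an SLA, and an incomplete declaration fails validation before it ships.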
Self-service infrastructure improved velocity. We invested heavily in a data platform that lets domain teams build and deploy pipelines without filing infrastructure tickets. Templated pipelines, pre-configured compute environments, automated testing. It took about nine months to build, but once it was running, teams could ship new data products in days instead of months.
The key insight was making the right thing the easy thing. The templates baked in our quality standards, monitoring, and governance requirements. Teams didn’t need to think about compliance because the platform handled it.
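A minimal sketch of that idea, with invented class and method names (our real templates are considerably more involved): the platform owns a base class that bakes in logging and quality checks, and a domain team implements only the transform.

```python
import logging
from abc import ABC, abstractmethod

# Illustrative sketch only: names are invented for this example, and real
# templates would add monitoring hooks, retries, and governance metadata.

class PipelineTemplate(ABC):
    """Platform-owned base class: logging and quality checks are baked in,
    so domain teams only write the transform logic."""

    def __init__(self, name: str):
        self.name = name
        self.log = logging.getLogger(name)

    @abstractmethod
    def transform(self, rows: list[dict]) -> list[dict]:
        """The only method a domain team has to implement."""

    def quality_checks(self, rows: list[dict]) -> None:
        # Platform-enforced standards: no empty output, no null values.
        if not rows:
            raise ValueError(f"{self.name}: transform produced no rows")
        if any(None in row.values() for row in rows):
            raise ValueError(f"{self.name}: null values in output")

    def run(self, rows: list[dict]) -> list[dict]:
        self.log.info("starting %s (%d input rows)", self.name, len(rows))
        out = self.transform(rows)
        self.quality_checks(out)  # compliance runs whether or not the team thinks about it
        self.log.info("finished %s (%d output rows)", self.name, len(out))
        return out

# A domain team's entire pipeline definition:
class ChurnPipeline(PipelineTemplate):
    def transform(self, rows: list[dict]) -> list[dict]:
        return [r for r in rows if r.get("active") is False]
```

The design choice is the same in any stack: the template owns `run`, so the quality and monitoring steps cannot be skipped or forgotten.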
Federated governance with central standards worked, eventually. We created a small central team that set standards for data quality, security, and documentation. Domain teams were responsible for meeting those standards. This model worked, but it took about six months of arguing about what the standards should be and another six months for teams to actually follow them consistently.
What Didn’t Work
Full domain ownership was unrealistic for most teams. The data mesh literature assumes domain teams have the skills and capacity to own their data. In practice, some of our domains had strong engineering teams that could handle this. Others were small teams of five or six people already stretched thin. Telling a team of business analysts to suddenly own a data pipeline is not a reasonable ask.
We ended up with a hybrid model: large, mature domains owned their data fully. Smaller domains got a “managed service” from a central team that operated their pipelines on their behalf, using the same platform and standards. The purists would say this isn’t data mesh. I’d say it’s pragmatism.
The platform investment was larger than expected. We budgeted for a platform team of four engineers. We needed eight. The platform needed to support multiple programming languages, handle various data storage patterns, provide monitoring and alerting, manage access controls, and offer a developer experience that non-infrastructure engineers could actually use.
If you’re considering data mesh, double your platform team estimate. Then add two more people for documentation and developer support. The platform is the product that enables all other products, and underinvesting in it defeats the entire purpose.
Discoverability remains an unsolved problem. We have a data catalogue. People still can’t find what they need. The catalogue has 400 entries and the search is mediocre. Teams end up asking on Slack “does anyone know where I can find customer churn data?” and getting three different answers pointing to three different datasets.
I’ve yet to see a data catalogue that truly solves discoverability at scale. The best solution we’ve found is designated data stewards in each domain who know what exists and can point people in the right direction. It’s low-tech, but it works better than any tool we’ve tried.
The model we landed on is a pragmatic hybrid rather than pure data mesh:
```mermaid
flowchart TB
    subgraph Central["Central Functions"]
        Standards["Standards & Governance"]
        Platform["Self-Service Platform\nTemplates, compute, monitoring"]
    end
    subgraph Domains
        D1["Large Domain A\nFull ownership"]
        D2["Large Domain B\nFull ownership"]
        D3["Small Domain C\nManaged service"]
        D4["Small Domain D\nManaged service"]
    end
    Standards -.->|Sets standards| D1
    Standards -.->|Sets standards| D2
    Platform -->|Tools & templates| D1
    Platform -->|Tools & templates| D2
    Platform -->|Operates on behalf| D3
    Platform -->|Operates on behalf| D4
```

What We’d Do Differently
Start with the platform, not the org change. We tried to do the organisational restructuring and the platform build simultaneously. This meant teams were being asked to own data products before the tools existed to support them. If I did it again, I’d spend the first six months building the platform, then gradually onboard domains.
Be more selective about which domains go first. We picked four domains to start with. Two were ready. Two weren’t. The two that weren’t ready struggled badly, and their experience poisoned the perception of data mesh across the broader organisation. Start with your strongest domains. Let them succeed visibly. That success story makes the next round of adoption much easier.
Invest more in training. We assumed domain engineers could pick up data engineering skills quickly. Some could. Many couldn’t. The gap between “I can write a Python script” and “I can build a reliable data pipeline with error handling, monitoring, and idempotent processing” is wider than we appreciated. We now run a structured training program that takes about three months.
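One way to make that gap tangible (a deliberately simplified, hypothetical example, with an in-memory dict standing in for a real data store): a naive script appends on every run, so a retry after a partial failure duplicates data, whereas an idempotent load overwrites a deterministic partition and is safe to re-run.

```python
# Hypothetical illustration of one training topic: idempotent writes.
# A dict keyed by run date stands in for a partitioned data store.

def naive_load(store: dict, run_date: str, rows: list[dict]) -> None:
    # Appends blindly: running twice for the same date doubles the data.
    store.setdefault(run_date, []).extend(rows)

def idempotent_load(store: dict, run_date: str, rows: list[dict]) -> None:
    # Overwrites the partition keyed by run_date: safe to retry any number
    # of times, because the final state depends only on the inputs.
    store[run_date] = list(rows)

store: dict[str, list[dict]] = {}
rows = [{"customer_id": "a"}, {"customer_id": "b"}]

idempotent_load(store, "2024-01-01", rows)
idempotent_load(store, "2024-01-01", rows)  # retry after a transient failure
assert len(store["2024-01-01"]) == 2        # still 2 rows, not 4
```

Patterns like this are obvious once seen, but they are exactly what separates a working script from a pipeline that survives retries, and they have to be taught.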
Set expectations about the timeline. We told leadership this would take 12 months. It took closer to 24. The organisational change, the platform build, the training, the cultural shift: all of it takes longer than you think. This isn’t a quarter-long initiative. It’s a multi-year evolution.
Is It Worth It?
Yes, with caveats. Our data team is no longer the bottleneck. Domain teams can ship data products independently. Data quality has improved because the people closest to the data are now responsible for it. Time to insight has dropped from months to weeks for most use cases.
But we didn’t implement data mesh as described in the book. We took the principles that made sense for our context and adapted them. We kept central governance. We maintained a managed service for smaller domains. We built a platform team that’s larger than the original central data team.
The honest summary: data mesh principles are sound. Data mesh as a prescriptive framework requires significant adaptation to work in most real organisations. Be pragmatic. Take what works. Leave what doesn’t. And plan for it to take twice as long as you think.
For a framework on classifying data products by organisational function rather than technology, see my research on A Practitioner’s Data Product Taxonomy. For how defensive governance and offensive analytics need to coexist, see Defensive and Offensive Data Strategy.