Data Mesh Approaches Using the Example of Planning

The Data Mesh approaches by Zhamak Dehghani are in many respects a turning point in dealing with analytical data. Admittedly, they are geared more towards innovative, data-driven, large corporations whose data volume and consumption are so extensive that the previous approaches from the BI and Big Data world no longer provide adequate solutions and often fail at the level of concrete, operational feasibility. It should also be added that Data Mesh is not a one-off invention but has evolved over time from project experience and common sense in dealing with analytical applications.

As the title already suggests, Data Mesh is about decentralized data architectures and thus contradicts the idea of an enterprise DWH. However, the target audience should be kept in mind, because decentralized architectures have always been a necessity for an international corporation with a hundred DWHs. Big Data approaches have borne fruit there, but the concept of the data lake as the sole replacement has not been able to gain acceptance.

For a smaller company, some of the Data Mesh approaches sound like overkill, though the idea that data is seen not just as an asset but as part of the product can apply anywhere. And this idea is outstanding. Outstanding especially if you have been around analytical data architectures for a while and see the evolution towards Agile, DevOps, product teams and finally Data Mesh for analytical applications as a series of steps that complement and build on one another.

For me, the best example of how far-reaching the Data Mesh approaches are is a planning environment for which I was responsible for nearly a decade. When my client introduced product teams as part of its company-wide DevOps rollout, we were the pilot project. A large group of applications consisting of various planning systems was divided into two functional products. With DevOps, the divide between operations and development was gradually resolved, simply because operational activities were no longer perceived as a disruption.

This involved an almost purely analytical system landscape consisting of many individual systems, which either prepared data in DWHs in elaborate ways or carried out a wide variety of planning types at company level. The main task was to break down sales and financial figures into markets and products and to carry out the medium- and long-term planning for the group. In addition to this strongly quantity-oriented planning, there was also a more milestone-oriented product planning, highly integrated with the former. The value of these planning activities became evident whenever they were missing or the planning results were considered untrustworthy: they are essential for corporate management.
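
To make the quantity-oriented side of this planning tangible, the following is a minimal sketch of such a top-down breakdown: a planned group volume is split across markets and products by assumed shares and then checked for consistency. All names, shares and figures are purely illustrative and not taken from the actual landscape.

```python
# Illustrative only: top-down breakdown of a planned group volume
# into markets and products using assumed shares.
group_plan_2025 = 1_200_000  # planned units at group level (hypothetical)

market_shares = {"EU": 0.45, "NA": 0.30, "APAC": 0.25}
product_shares = {"Model A": 0.6, "Model B": 0.4}

# Allocate the group volume to every market/product combination.
breakdown = {
    (market, product): round(group_plan_2025 * m_share * p_share)
    for market, m_share in market_shares.items()
    for product, p_share in product_shares.items()
}

for (market, product), units in sorted(breakdown.items()):
    print(f"{market:5} {product:8} {units:>9,}")

# The cells must add up to the group plan again (up to rounding),
# exactly the kind of consistency a planning landscape has to guarantee.
assert abs(sum(breakdown.values()) - group_plan_2025) <= len(breakdown)
```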

Even then, the introduction of product teams meant that we as the BI business unit, which had taken over the operation, maintenance and design of the landscape including the technical platform, were given more responsibility for results. Of course, the operational work, i.e. running and developing the system, was not completely different after the introduction of products, but the responsibility for the planning results, and not just for keeping the system running, grew steadily.

The work on the planning landscape was characterized by extensive technical debt and an extremely heterogeneous technology stack. The infrastructure was at its limits and, due to the high security requirements, a completely independent platform had to be operated, which in turn led to scaling problems. On top of this, the entire product team was under enormous pressure to deliver new features, because corporate planning is rarely granted a postponement.

We had always kept the principle of a domain-agnostic self-serve platform, as described in Data Mesh, in mind, not least because the planning landscape is a stand-alone platform. The platform is highly integrated in itself, which avoids duplication of code and data. It must be noted, however, that modernization measures in particular led to components being duplicated and operated in parallel as old and new systems for many years. The product vendors for DWH, planning and reporting could hardly be expected to provide effective support; the requirements were too specific.

At that time, as is often still the case today, analytical applications were seen as data sinks. The idea of integration, or more generally the idea that results and intermediate steps in the aggregation of data should be shared, is not self-evident to product vendors. Overall, the entire planning solution including the platform lost its connection to the BI product over time, and integrating the planning landscape internally and externally was a challenge.

This inevitably led to the discussion of federated computational governance, which is not easy to illustrate with my chosen example, because a separate platform had been chosen due to the security requirements. The platform and the two products on it were independent and decoupled and therefore had their own defined policies.
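
To illustrate what "computational" governance can mean in such a decoupled setup, here is a small, hypothetical sketch: platform policies expressed as executable checks that a data product descriptor must pass before deployment. The fields and rules are assumptions made for this example, not the actual policies of the platform.

```python
# Hypothetical sketch: platform policies as executable checks that every
# data product must pass before deployment. Rules and field names are
# illustrative, not the actual governance of the planning platform.
from dataclasses import dataclass


@dataclass
class DataProductDescriptor:
    name: str
    owner_team: str
    classification: str   # e.g. "internal" or "confidential"
    retention_days: int
    has_output_contract: bool


def check_policies(dp: DataProductDescriptor) -> list[str]:
    """Return a list of policy violations; an empty list means compliant."""
    violations = []
    if dp.classification not in {"internal", "confidential"}:
        violations.append("unknown data classification")
    if dp.classification == "confidential" and dp.retention_days > 365:
        violations.append("confidential data retained longer than one year")
    if not dp.owner_team:
        violations.append("no owning product team declared")
    if not dp.has_output_contract:
        violations.append("output port has no published contract")
    return violations


volume_planning = DataProductDescriptor(
    name="volume-planning-results",
    owner_team="planning-product-team",
    classification="confidential",
    retention_days=365,
    has_output_contract=True,
)
print(check_policies(volume_planning))  # [] -> compliant
```

In practice, checks like these could run automatically in the platform's deployment pipeline, so that each product team keeps its own policies while the platform enforces the shared minimum.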

The central aspect of Data Mesh that I value most is to consider data as part of the product, not merely as an “asset”. My chosen example of enterprise planning is a forerunner in this regard, which follows from the nature of corporate planning: compared to an EDWH or a departmental DWH, which serve many different uses, a corporate planning solution has just one purpose, namely to deliver the various planning results. Domain ownership is therefore much more evident, and no other department attempts to do enterprise planning as well. Generic approaches such as data lakes have hardly any place in this environment and were not considered. What is innovative about Data Mesh is, above all, the architectural approach that the product team is responsible for its own data, from the operational system to the analytical data, and prepares that data in a domain-specific way.
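
As a rough illustration of data treated as part of the product, the sketch below shows a planning result published as a versioned output port with an explicit schema that consumers can rely on, instead of rows dumped into a data sink. The port name, fields and metadata are assumptions made for the purpose of the example.

```python
# Hypothetical illustration of "data as part of the product": the planning
# team publishes its result as a versioned output port with an explicit,
# documented schema rather than writing rows into an anonymous data sink.
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PlannedVolume:
    """One record of the assumed 'planned-volumes' output port."""
    plan_round: str   # e.g. "MTP-2025" for a medium-term planning round
    market: str
    product: str
    year: int
    units: int
    published_on: date


# Minimal metadata the owning team would publish alongside the data.
PLANNED_VOLUMES_PORT = {
    "name": "planned-volumes",
    "version": "2.0.0",
    "owner": "planning-product-team",
    "schema": PlannedVolume,
    "freshness": "updated after every planning round",
}

record = PlannedVolume("MTP-2025", "EU", "Model A", 2027, 324_000, date.today())
print(PLANNED_VOLUMES_PORT["name"], PLANNED_VOLUMES_PORT["version"], record)
```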

This inevitably leads me to a discussion about Cognitive Load, because it is one of the biggest challenges for Data Scientists and Data Engineers: on the one hand, bringing in the extensive methodological and technical knowledge, and on the other, understanding and interpreting the data. It is in the nature of data that it can carry multiple meanings, which adds to the Cognitive Load depending on how familiar a team member is with a domain. All approaches to abstraction and modeling help, but precise and accurate knowledge of the data is indispensable for decision making. The Data Mesh mindset addresses this challenge by encouraging the product team to work with more generalists. Our project staffing followed a similar pattern: a core team that brought in the domain expertise, supplemented by generalists, mostly for technical assignments.

Data Mesh approaches are state of the art for analytical scenarios, both in defining the solution and in directing and focusing the team. Data Mesh is based on principles that must be adapted to the company, its business model, architecture and project situation. It is more about establishing the ability within an organization to apply these principles.