If you’re reading this, you already know that data has become a strategic asset and that data teams are increasingly adopting the Data Mesh framework to get more value out of their data. The many user journeys shared on the Data Mesh Learning channel testify to the framework’s success and growing popularity. So far, organisations are making good progress on most pillars of the Data Mesh framework as described in Zhamak Dehghani’s latest work on the topic.
However, federated computational governance, a form of data governance, seems to be a much harder nut to crack. This shouldn’t come as a surprise: it has always been hard to demonstrate the value of data governance, which typically only starts accruing after a significant investment of time and money. Nevertheless, data governance is an important topic because, left unaddressed, it becomes a major bottleneck.
Lately, access management, a particular subset of data governance, has become a recurring topic of conversation among data engineers, because data teams struggle to scale the data mesh after the initial traction gained from implementing the other pillars. Many data teams we talk to use Terraform to configure access controls for their data products, but Terraform is designed to set up infrastructure, not to configure permissions. This prevents them from implementing access management in line with the philosophy of federated computational governance, which will eventually restrict scale while also opening up privacy and security vulnerabilities. (insert sad face here)
Let’s discuss the 4 ways that access management through Terraform scripts prevents you from implementing federated computational governance:
The Data Mesh framework requires you to continuously balance federation with centralisation: the former enables innovation, while the latter helps maintain trust. However, when you configure permissions with Terraform scripts, access management is heavily skewed towards federation, making it nearly impossible to strike that balance. In many cases there are no central policies at all, and when they do exist, your data engineers don’t know where to find them, let alone how to include them as code in their data products.
You can even ask how federated you really are. Terraform scripts lead to convoluted manual access request workflows centred around those scripts, time-consuming access control updates because Terraform has to run through all the configuration files, and hours wasted on building and maintaining connectors. As the number of data products grows, this creates undue reliance on the owners of the scripts, making it hard to reach the scale you were aiming for when you adopted a federated approach.
Because access controls are scattered across the Terraform scripts, the central governance team has no means of properly governing access to sensitive personal information, which exposes your organisation to significant privacy and security risks.
In the same way that feedback loops and leverage points are desirable for data products, they’re also essential to your organisation’s compliance efforts. If the central governance team has a good view of the organisation’s access controls and data usage, it can detect privacy and security issues with those access controls (feedback loops) and resolve them by adapting the central policies (leverage points). With Terraform scripts, the central governance team has no such overview, which hampers the feedback loops, and any change to the central policies has only a small and delayed effect on the access controls, which takes all the leverage out of the leverage points.
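To make the feedback loop and the leverage point concrete, here is a minimal Python sketch. It assumes the grants issued by every data product can be collected in one place; the `Grant` and `CentralPolicy` types, the dataset names, and the roles are all hypothetical and only serve as an illustration.

```python
from dataclasses import dataclass

# Hypothetical model for illustration only; none of these names is an existing API.

@dataclass(frozen=True)
class Grant:
    data_product: str  # data product that issued the grant
    dataset: str
    principal: str     # user or group receiving access
    role: str          # role of the principal, e.g. "analyst"

@dataclass(frozen=True)
class CentralPolicy:
    sensitive_datasets: frozenset  # datasets that hold personal data
    allowed_roles: frozenset       # roles allowed to read those datasets

def find_violations(grants, policy):
    """Feedback loop: list grants on sensitive data that the central policy does not allow."""
    return [
        g for g in grants
        if g.dataset in policy.sensitive_datasets
        and g.role not in policy.allowed_roles
    ]

grants = [
    Grant("sales", "customers_pii", "alice", "analyst"),
    Grant("sales", "orders", "bob", "analyst"),
    Grant("marketing", "customers_pii", "carol", "privacy_officer"),
]

policy = CentralPolicy(
    sensitive_datasets=frozenset({"customers_pii"}),
    allowed_roles=frozenset({"privacy_officer", "analyst"}),
)
before = find_violations(grants, policy)

# Leverage point: a single change to the central policy is evaluated against every grant.
stricter = CentralPolicy(policy.sensitive_datasets, frozenset({"privacy_officer"}))
after = find_violations(grants, stricter)
```

With the grants in one overview, tightening the central policy immediately surfaces the one grant that no longer complies. When the same grants are spread over many Terraform scripts, that check has to be repeated script by script.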
We strongly believe that one of the principles of productive and scalable access management is that you treat your policies as code. Although Terraform scripts do express access controls as code, they miss some of the key features of policy as code. In general, changes to permissions go live without the mandatory review and sign-off by reviewers, there is very little reuse of access controls, and access controls are neither documented nor tested.
This makes it very difficult to scale, because you become reliant on the few data engineers who know the scripts. It also creates significant key-person risk, something your CISO probably isn’t too keen on.
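As a rough illustration of what those features could look like, the Python sketch below models a policy that carries its own documentation and sign-off, can be reused for more datasets, and ships with a test. The `AccessPolicy` type, the `deployable` rule, and all names in the example are assumptions made for this sketch, not a description of any particular tool.

```python
from dataclasses import dataclass, replace

# Hypothetical policy-as-code model; all names are illustrative.

@dataclass(frozen=True)
class AccessPolicy:
    name: str
    description: str         # documentation lives next to the rule
    roles: frozenset         # roles that receive access
    datasets: frozenset      # datasets the policy covers
    approved_by: tuple = ()  # reviewers who signed off on this version

    def allows(self, role, dataset):
        return role in self.roles and dataset in self.datasets

def deployable(policy, required_approvals=1):
    """A policy may ship only when it is documented and signed off."""
    documented = bool(policy.description.strip())
    return documented and len(policy.approved_by) >= required_approvals

def with_datasets(policy, extra):
    """Reuse: derive a policy covering more datasets. The new version needs fresh sign-off."""
    return replace(policy, datasets=policy.datasets | frozenset(extra), approved_by=())

analyst_read = AccessPolicy(
    name="analyst-read-sales",
    description="Analysts may read non-sensitive sales data.",
    roles=frozenset({"analyst"}),
    datasets=frozenset({"orders"}),
    approved_by=("governance-lead",),
)
extended = with_datasets(analyst_read, {"invoices"})

def test_analyst_policy():
    # Tests pin down the intended behaviour so that a later change cannot silently widen access.
    assert analyst_read.allows("analyst", "orders")
    assert not analyst_read.allows("analyst", "customers_pii")
    assert deployable(analyst_read)
    assert not deployable(extended)  # changed policy, so it needs review again

test_analyst_policy()
```

The design choice worth noting is that any derived policy loses its approvals, so a change in permissions cannot reach production without going through review again.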
Another important aspect of Data Mesh is the bi-temporality of metadata, which includes access controls. This means that for each access control you have to store both the time at which it was decided and the period for which it is valid. In many cases, the Terraform scripts simply overwrite the access controls in your data source or BI tool without recording the valid and decision times, which makes it impossible to reconstruct who had access to what, and when. In other cases, the Terraform scripts only ever add permissions. Apart from the lack of bi-temporality, this leads to excessive permissions for your data analysts and data scientists, which brings significant security and privacy risks.
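A minimal sketch of bi-temporal access controls, assuming an append-only log in which every grant or revocation stores both its valid time and its decision time. The `AccessRecord` and `BitemporalAccessLog` names, and the rule that the most recent decision wins, are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical bi-temporal model; all names are illustrative.

@dataclass(frozen=True)
class AccessRecord:
    principal: str
    dataset: str
    granted: bool                 # True = grant, False = revocation
    valid_from: datetime          # valid time: when the decision starts to apply
    valid_to: Optional[datetime]  # None means open-ended
    decided_at: datetime          # decision time: when the decision was taken

class BitemporalAccessLog:
    """Append-only log: records are added, never overwritten."""

    def __init__(self):
        self._records = []

    def record(self, rec):
        self._records.append(rec)

    def has_access(self, principal, dataset, valid_at, known_at):
        """Access at `valid_at`, judged by the decisions taken up to `known_at`."""
        candidates = [
            r for r in self._records
            if r.principal == principal and r.dataset == dataset
            and r.decided_at <= known_at
            and r.valid_from <= valid_at
            and (r.valid_to is None or valid_at < r.valid_to)
        ]
        if not candidates:
            return False
        # Assumption: the most recent decision covering that moment wins.
        return max(candidates, key=lambda r: r.decided_at).granted

log = BitemporalAccessLog()
# Granted on 1 January, effective immediately.
log.record(AccessRecord("alice", "customers", True,
                        datetime(2023, 1, 1), None, datetime(2023, 1, 1)))
# Revoked on 1 March, retroactively from 1 February.
log.record(AccessRecord("alice", "customers", False,
                        datetime(2023, 2, 1), None, datetime(2023, 3, 1)))

mid_feb = datetime(2023, 2, 15)
as_known_in_feb = log.has_access("alice", "customers", mid_feb, datetime(2023, 2, 20))
as_known_in_mar = log.has_access("alice", "customers", mid_feb, datetime(2023, 3, 5))
```

Here `as_known_in_feb` is `True` and `as_known_in_mar` is `False`: the same moment in time gets a different answer depending on which decisions were known, which is exactly the history that is lost when a script overwrites permissions in place.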
We believe that configuring access controls through Terraform scripts is a good way to get started and build traction, but that relying on it will eventually prevent you from scaling your data mesh further. We’re working on a solution to this problem. If you’re a data engineer or data leader who’s struggling with access management, reach out to [email protected] or find us on LinkedIn and Twitter.