Data contracts

Data contracts are here to stay. One can no longer think of software development without APIs, and the same will become true for data engineering and data contracts. But as we are still very early in the rise of data contracts, solutions to assist data engineers will pop up and disappear again.

Data contracts are the talk of the town. Numerous webinars and blogs have been dedicated to the topic, but its definition is still murky, partly because the discussion is largely semantic and partly because the concept is fairly new. A more in-depth thesis about data contracts can be found here. Vendors are moving into the space from multiple angles, each claiming to be the ultimate data contract solution. When the smoke clears, the industry will come to a consensus on the definition of data contracts, at the risk of that definition becoming solution-based.

The smoke will clear to reveal the true definition of data contracts

Photo by Filip Bunkens on Unsplash

Let us therefore be very clear: Raito is not a data contract solution. A data contract is an up-front agreement between a data producer and a data consumer, governing the format of the dataset created by the data producer. Depending on the tool, a user can merely monitor compliance with the data contract, or even enforce it. Raito offers a data access management solution, governing data usage by the data consumer. In this capacity, Raito offers data access management and usage analytics, combined with basic data discoverability. In short, Raito governs the consumer’s interactions with data objects subject to the data contract, in contrast to data contract tools, which govern the behavior of data producers.

Raito and data contract tools

Image by author
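
To make the producer side of that distinction concrete, here is a minimal sketch of a data contract check, assuming a contract is nothing more than a set of expected column names and types. The names `DataContract`, `violations` and `publish` are hypothetical and do not belong to any particular data contract tool; the `enforce` flag illustrates the difference between merely monitoring a contract and enforcing it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataContract:
    """Hypothetical contract: expected column names mapped to Python types."""
    columns: dict


def violations(contract, rows):
    """Return human-readable contract violations found in the rows."""
    problems = []
    for i, row in enumerate(rows):
        # Columns promised by the contract but absent from this row.
        for name in sorted(contract.columns.keys() - row.keys()):
            problems.append(f"row {i}: missing column '{name}'")
        # Columns that are present but carry the wrong type.
        for name, expected in contract.columns.items():
            if name in row and not isinstance(row[name], expected):
                problems.append(f"row {i}: column '{name}' is not {expected.__name__}")
    return problems


def publish(contract, rows, enforce=False):
    """Monitor mode returns the violations; enforce mode rejects the dataset."""
    problems = violations(contract, rows)
    if enforce and problems:
        raise ValueError("; ".join(problems))
    return problems
```

Note that nothing in this sketch says who may read the published rows. That is the consumer side, which is where data access management comes in.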

Data access management and data contracts

Still, data access management can feel closely related to data contracts. A data contract might have multiple endpoints that a data consumer can use to access the data covered by the contract. An endpoint can expose masked or unmasked data, a subset of the data or the whole dataset, each serving a different purpose.

The concept of endpoints implies that access policies are not in scope of a data contract, even when an access policy affects or even defines how data is consumed through the endpoint, for example via column masking, row filtering, or other policies. In fact, global access policies overlay these endpoints orthogonally and might conflict with the locally defined endpoints.

Global access policies overlay locally defined endpoints orthogonally

Image by author

Where endpoints are defined locally at the technology level as a handover point between the data producer and consumer, access policies can be defined globally at the logical level, where a single policy can affect the behavior of multiple endpoints at once. For instance, if you have an access policy that says that employees can only access data of customers within their geographical region, that policy will affect any endpoint exposing customer data, whether it is an API, a data warehouse, or a BI report.
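
The regional example can be sketched in a few lines, assuming the global policy is a plain row filter that every endpoint applies before returning data. All names here (`User`, `CUSTOMERS`, `region_policy`, `api_endpoint`, `report_endpoint`) are hypothetical and only illustrate the idea; this is not how Raito or any specific platform implements policies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    region: str


# Hypothetical customer records shared by every endpoint.
CUSTOMERS = [
    {"id": 1, "name": "Alice", "region": "EU"},
    {"id": 2, "name": "Bob", "region": "US"},
]


def region_policy(user, rows):
    """Global policy: keep only customers in the user's own region."""
    return [row for row in rows if row["region"] == user.region]


def api_endpoint(user):
    """Local endpoint 1: full customer records, as an API might return them."""
    return region_policy(user, CUSTOMERS)


def report_endpoint(user):
    """Local endpoint 2: customer count per region, as a BI report might show."""
    counts = {}
    for row in region_policy(user, CUSTOMERS):
        counts[row["region"]] = counts.get(row["region"], 0) + 1
    return counts
```

The policy is written once and changes what both endpoints return, even though neither endpoint definition mentions regions.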

The data owner also owns the data access controls

If not owned by the data contract, who owns data access management? A data contract is an agreement between a data producer and a data consumer. In modern data management, the concept of a business data steward or a business data owner is becoming standard practice. Additionally, within data mesh, data ownership is being pushed to the data producer. Hence, when a data contract is an agreement between the data producer and the data consumer, the data producer becomes the data owner, who is responsible for managing access to the data.

When applying this to data contracts, this makes a lot of sense: the one who provides the goods of the contract should hand out the key as well. A data producer should be able to decide who can access which version of their data. Even when their endpoints are influenced by global policies, the data owner should be able to determine which users can interact with their data and in which manner. More on this here.

A data owner is still in control to offer unmasked access to authorized users, even though regular users obtain masked access by default

Image by author

The different ways that global policies defined by the central data governance team can interact with a data producer’s endpoints can become quite complex. For example, you could have a central policy that requires customer data to always be masked for employees from other geographic regions, independent of the endpoint used to access that data. Still, the data producer should be able to provide access to all the unmasked data for valid purposes. Understanding and resolving this interplay between policies and their exemptions can become a very time-consuming activity, affecting your time to market.
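
A minimal sketch of that interplay, assuming the central policy masks customer names for users from other regions and the data owner keeps a list of exempted users. The function names and the `exemptions` set are hypothetical, and real policy engines resolve such conflicts in far richer ways.

```python
def mask(value):
    """Replace all but the first character with asterisks."""
    return value[:1] + "*" * (len(value) - 1)


def read_customer(user, customer, exemptions):
    """Apply the central masking policy unless the data owner granted an exemption.

    `exemptions` is a hypothetical set of user names that the data owner
    has approved for unmasked access.
    """
    same_region = user["region"] == customer["region"]
    if same_region or user["name"] in exemptions:
        return dict(customer)
    # Central policy: mask the name for users from other regions.
    return {**customer, "name": mask(customer["name"])}
```

Even in this toy version, the result depends on two independently managed inputs: the central rule and the owner’s exemption list. With many policies, endpoints and exemptions, keeping track of who actually sees what quickly becomes hard to reason about by hand.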

Among its many other benefits, it is this complexity that Raito resolves. If you want to know more, reach out to us for a free trial!