Data Security for Data Products

What is a Data Product?

A Data Product is a self-contained, curated combination of the underlying physical data, its technical metadata, its semantic meaning, and the output ports used to consume it. Modern data teams use this packaging to improve the user experience and trustworthiness of the data and to better serve their counterparts in the business. Or as Sanjeev Mohan puts it:

“Put simply, a data product conveys trust and the product features meant to solve business problems. A data product has measurable value. It has an owner who is responsible for delivering value throughout the product’s lifecycle from design to retirement.”

Data Products represent a fundamental change in how data teams extract value from data. Instead of treating data as a passive resource, they view it as a product that has to be actively managed and continuously improved. By applying DevOps best practices, data teams deliver data-driven insights much faster. However, data access and data security are still managed using outdated processes, resulting in fault lines that create productivity costs and security blind spots. Below, we look at those fault lines and how Raito helps solve them.

A lack of abstraction

Data Products provide a layer of abstraction over the underlying data storage that improves usability and trustworthiness. You can find them in Data Product Catalogs, which also list their owner, semantic meaning, and data quality SLAs. This abstraction shields data consumers from the intricate technical details of provisioning and managing the data, and lowers the threshold for business users to access data products. However, this abstraction layer is missing for access controls, which makes for a poor developer experience (DevEx):

Users are expected to know which permissions they need to access a data product, which proves to be very challenging. The fact that these permissions often have highly technical, meaningless names doesn’t make it any easier.
Data Products can consist of tables, folders, Kafka topics, ML models, and so forth. Each technology comes with a different way of managing access, which frustrates data engineers and creates security blind spots for the security team.
Cloud data providers such as Databricks and BigQuery come with very low-level access controls, which makes managing permissions to data products tedious, especially when those data products consist of multiple tables.

Raito lets you decouple access management from the underlying technologies by giving you a single place to orchestrate access management and to monitor data access and usage for data products across cloud data providers. With Raito, users request access to a Data Product, and once the Data Owner approves the request, Raito updates the permissions on the underlying data sets. All this happens without exposing the technicalities of the underlying access controls.
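To make the idea concrete, here is a minimal sketch of such an abstraction layer. It is not Raito's API: the names Asset, DataProduct, AccessRequest and plan_grants are hypothetical, and the example assumes a data product is simply a named list of physical assets with one owner. It shows a request made against the product being expanded into per-asset grants only after the owner has approved it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Asset:
    """One physical object behind a data product (hypothetical model)."""
    platform: str  # e.g. "bigquery", "databricks", "kafka"
    name: str      # e.g. "sales.orders"


@dataclass
class DataProduct:
    name: str
    owner: str
    assets: List[Asset]


@dataclass
class AccessRequest:
    user: str
    product: DataProduct
    approved_by: Optional[str] = None

    def approve(self, approver: str) -> None:
        # Only the data owner may approve access to their own product.
        if approver != self.product.owner:
            raise PermissionError(f"{approver} does not own {self.product.name}")
        self.approved_by = approver


def plan_grants(request: AccessRequest) -> List[Tuple[str, str, str]]:
    """Expand an approved request into (platform, asset, user) grants."""
    if request.approved_by is None:
        return []  # nothing is granted before the owner approves
    return [(a.platform, a.name, request.user) for a in request.product.assets]
```

The requester only ever names the data product. The translation into table-level or topic-level permissions stays behind the abstraction, which is where platform-specific grant logic would plug in.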

Rigid user groups

One of the biggest challenges with current data access management processes is their reliance on group management in Active Directory (AD) or another Identity Provider. As a result, with each new data product, access request, or other change in access, data engineers have to log a ticket with the security team to update the security groups. This process is typically slow, creating significant opportunity and productivity costs.

Raito lets you create your own logical user groups from AD users and groups, decoupling data development processes from group management in AD. Because this approach also makes it easier to federate data access management to the data owners, data teams experience significant efficiency improvements in their data access management processes.
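A logical group of this kind can be pictured as a thin layer on top of the directory. The sketch below is illustrative only and assumes a read-only snapshot of Identity Provider groups passed in as a plain dictionary; LogicalGroup and its fields are hypothetical names, not Raito's data model.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class LogicalGroup:
    """A group defined by the data team, built from IdP users and groups."""
    name: str
    users: Set[str] = field(default_factory=set)       # individual IdP users
    idp_groups: Set[str] = field(default_factory=set)  # referenced IdP groups

    def members(self, idp_snapshot: Dict[str, Set[str]]) -> Set[str]:
        """Resolve the effective members against a snapshot of IdP groups."""
        resolved = set(self.users)
        for group_name in self.idp_groups:
            # Unknown groups resolve to nobody rather than raising.
            resolved |= idp_snapshot.get(group_name, set())
        return resolved
```

Changing who can consume a data product then means editing the logical group, with no change to the directory itself and no ticket to the team that manages it.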

Ownership is missing

While Data Owners are responsible for managing access to their data products, the data governance team remains responsible for compliance with privacy and security regulations. The governance team therefore needs to keep tight control over the data masking and row filtering rules and have sole ownership of these controls. Unfortunately, data security ownership is very hard to manage with current access management technology.

Raito makes it easy to assign owners to access and data security controls, making it possible to federate data access management to data owners while letting data governance provide guardrails against privacy and security breaches through data masking and row filters.

Stuck at the right

When we asked a data engineer at one of our customers why he likes Raito, he said:

“Data Governance set policies for data security based on data classification, but until Raito we didn’t have the tools to enforce those policies”

We’re experiencing a massive disconnect between Data Governance and the Data Development Processes. Consequently, access can only be granted after the Data Product has been committed to production, leading to time-consuming and error-prone work that frustrates the data engineer. To keep pace with today’s data development, Data Security should be integrated into the Data Product’s development process.

Raito lets data engineers manage data security as code in their CI/CD pipelines in two ways:

Data Engineers can define which roles/groups can access a data product, and which data masking rules or row-level filters have to be applied, in the same code base as their data products.
Raito can dynamically manage access to a data product and apply the appropriate data masking rules and row filters based on how the data engineer has tagged the Data Product.
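The sketch below illustrates how these two ideas could fit together. It does not use Raito's configuration format: PRODUCT_SPEC, MASKING_BY_TAG and apply_policy are hypothetical names, the tag values are made up, and only masking is shown (row filters would follow the same pattern). The assumption is that the data engineer declares allowed groups and column tags next to the product code, while the mapping from tag to masking rule is owned by data governance.

```python
from typing import Any, Callable, Dict, Set


def mask_email(value: str) -> str:
    """Hide the local part of an e-mail address but keep the domain."""
    _, _, domain = value.partition("@")
    return "***@" + domain


# Owned by data governance (hypothetical): classification tag -> masking rule.
MASKING_BY_TAG: Dict[str, Callable[[Any], Any]] = {
    "pii.email": mask_email,
    "pii.name": lambda _value: "REDACTED",
}

# Declared by the data engineer in the data product's code base (hypothetical).
PRODUCT_SPEC = {
    "name": "customer_orders",
    "allowed_groups": {"analysts"},
    "column_tags": {"email": "pii.email", "customer_name": "pii.name"},
}


def apply_policy(spec: dict, row: Dict[str, Any], user_groups: Set[str]) -> Dict[str, Any]:
    """Check group access, then mask tagged columns according to governance rules."""
    if not spec["allowed_groups"] & set(user_groups):
        raise PermissionError(f"no access to data product {spec['name']}")
    masked = dict(row)  # never mutate the caller's row
    for column, tag in spec["column_tags"].items():
        rule = MASKING_BY_TAG.get(tag)
        if rule is not None and column in masked:
            masked[column] = rule(masked[column])
    return masked
```

Because the spec lives in the same repository as the data product, access and masking changes can be reviewed and deployed through the same pipeline as the product itself, instead of being requested after the product is already in production.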

To summarise

As demand for data increases, fault lines in data security workflows appear. Traditional access management architecture is built for managing access to applications, not to data products. This used to work fine, but as the number of data sources, users, and use cases has grown exponentially, a reliance on outdated data security workflows and brittle workarounds produces considerable productivity costs and blind spots. There is a sense of urgency in organizations that want to drive innovation and increase their competitive advantage. The current approach to data security management leaves data teams constrained and unable to deliver at the speed the business expects.

Reach out to bart@raito.io or plan some time with me to learn more about how Raito can help you streamline data security management for data products.
