Why current workflows don’t scale
It’s day 1 at your new job. You recently decided to leave your previous employer, signed with a new one, and are eager to start. Your new boss welcomes you at 9am, introduces you to the team and runs through your schedule for the first few days. At 10am a colleague guides you through your digital toolbox.
Well… Through his toolbox.
Unfortunately, your access has not been set up yet. Your colleague promises he will chase someone, and your manager assures you that he will pull some strings if it takes too long. Another colleague warns you that this can easily take a few days. Your enthusiasm quickly turns into disappointment. If this happens on day 1, what else is to follow?
Sound familiar? Some companies have understood that this is one of the greatest disappointments for newcomers and therefore focus on avoiding it. Yet in the meantime, similar requests from long-term employees are stuck in the same process, which is just as frustrating. Current data access management workflows don’t scale well for multiple reasons. I will touch upon a few of them in this blog post, along with some possible solutions.
A data access management workflow always starts with someone requiring access to some data. That means someone will raise a request: either the person themselves, or someone on their behalf. But what kind of request?
Asking for access to a tool can be done from within typical workflow management tools like Jira ServiceDesk, ServiceNow, TopDesk, … You can set this up once at the start, as you have a limited, slowly evolving set of tools in your landscape. In most companies, however, the list of data objects is far too long to maintain within such a tool. This means a requestor must first work out how to describe what he wants access to, without having the access needed to search for it.
To start the workflow comfortably, a user should be able to identify what he is looking for. Just as service desk applications might provide a complete overview of the tools available, a data catalog typically provides a complete overview of the data available. A fully-fledged data catalog with high-quality metadata can be a great help during this search, but at the very least you need the complete tree of data objects.
Imagine the request has been raised. Now someone has to form an opinion about it. In many companies, the data landscape is seen as something owned by the data department. This results in a single team having to decide who can access which data. Even in mid-sized organizations, spanning a few hundred up to a few thousand employees, you cannot expect a single team to know all data objects and all employees. No single person can assess all these requests.
Over the years, data governance initiatives have attempted to solve part of this problem. By shifting governance left and enabling true data owners, approval has been placed with the people who own, and hence actually know, the data being requested. Unfortunately, this only solves part of the puzzle. In the same mid-sized company, you cannot expect every data owner to know every employee within the organisation.
Purpose-based access control might be a solution here. In this case, the data owner is responsible for approving the purposes for which the data he owns can be accessed, while someone else, possibly the requester’s manager, approves an employee’s access for those predetermined purposes. By splitting the workflow this way, each approval comes from someone who can actually judge it, and a request does not go unanswered because the approver lacks the knowledge to decide.
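The split approval described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the `PurposeRegistry` class, its method names, and the example object, purpose and employee names are all hypothetical and do not refer to any specific product. Access is allowed only when a purpose approved by the data owner for the data object matches a purpose approved by a manager for the employee.

```python
from dataclasses import dataclass, field


@dataclass
class PurposeRegistry:
    """Hypothetical in-memory store for the two separate approval steps."""

    # (data_object, purpose) pairs approved by the data owner
    owner_approved: set = field(default_factory=set)
    # (employee, purpose) pairs approved by the employee's manager
    manager_approved: set = field(default_factory=set)

    def approve_purpose(self, data_object: str, purpose: str) -> None:
        """Data owner step: this purpose is a valid reason to read the object."""
        self.owner_approved.add((data_object, purpose))

    def approve_employee(self, employee: str, purpose: str) -> None:
        """Manager step: this employee works on this purpose."""
        self.manager_approved.add((employee, purpose))

    def can_access(self, employee: str, data_object: str) -> bool:
        """Access requires at least one purpose approved on both sides."""
        object_purposes = {p for (obj, p) in self.owner_approved if obj == data_object}
        employee_purposes = {p for (emp, p) in self.manager_approved if emp == employee}
        return bool(object_purposes & employee_purposes)


registry = PurposeRegistry()
registry.approve_purpose("sales.orders", "churn-analysis")  # owner knows the data
registry.approve_employee("alice", "churn-analysis")        # manager knows the person

print(registry.can_access("alice", "sales.orders"))  # True
print(registry.can_access("bob", "sales.orders"))    # False
```

Neither approver needs to know both the data and the employee: the data owner never sees "alice", and the manager never has to judge "sales.orders".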
The request has been approved, so now comes the implementation. Even when companies have set up an organisation with data owners approving access requests, actually granting access is often too technical and too complex for them. Managing your company’s Active Directory is typically the responsibility of your service desk, writing Terraform might sit with a cloud infra team, and managing access within multiple tools may still reside with your data team. Whichever solution you have in place, they all depend on a single team for implementation, which leaves you with a bottleneck. The more requests you have, the fewer of them are resolved within an acceptable time.
Just as with pushing approval left to data owners, pushing implementation left might speed up the process. By allowing data owners to grant access to the data they own, the effort is spread, which decreases the time to a solution. This, however, implies that you need to open up your access management system and lower the technical barrier. Typically you won’t open up your Active Directory, and you cannot expect data owners to start writing Terraform code.
Once the request is known, its implementation can be automated. This automation can range from always introducing new roles, with role explosion as a consequence, to more complex implementations. From the moment you automate this part of the process, the responsibility truly ends up in the hands of the data owner.
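As a rough sketch of what such automation could look like, the Python below turns approved requests into a list of grant statements. Everything here is an assumption made for illustration: the `AccessRequest` type, the role naming scheme, and the SQL-like statement strings, which are not tied to any particular database dialect. To limit role explosion, it reuses one role per data object and permission instead of creating a new role for every request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    """Hypothetical approved request: who may do what on which object."""

    user: str
    data_object: str
    permission: str  # e.g. "SELECT"


def role_name(data_object: str, permission: str) -> str:
    """One shared role per (object, permission), not one per request."""
    return f"{data_object.replace('.', '_')}_{permission.lower()}"


def plan_statements(requests, existing_roles):
    """Return illustrative, dialect-agnostic statements for approved requests."""
    known = set(existing_roles)
    statements = []
    for req in requests:
        role = role_name(req.data_object, req.permission)
        if role not in known:
            # Create the role and attach the permission only once
            statements.append(f"CREATE ROLE {role}")
            statements.append(
                f"GRANT {req.permission} ON {req.data_object} TO ROLE {role}"
            )
            known.add(role)
        statements.append(f"GRANT ROLE {role} TO USER {req.user}")
    return statements


approved = [
    AccessRequest("alice", "sales.orders", "SELECT"),
    AccessRequest("bob", "sales.orders", "SELECT"),
]
for statement in plan_statements(approved, existing_roles=[]):
    print(statement)
```

Two requests for the same table result in a single role that both users receive, so the number of roles grows with the number of data objects rather than with the number of requests.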
Hurray, you can access the data you requested. However, this does not mean you will keep that access: in a monthly review, the data governance team in your organisation might decide to take it away again. They may even be stricter than required. You might have been given access to a table with certain PII columns, for example, which could result in your access to the whole table being revoked.
The biggest problem with this is that it’s a reactive process, after the fact. If you can ensure compliance with certain policies up front, you will not end up in this situation. This can be achieved by putting governance policies in place. A good example is restricting all access to PII data by default by masking PII columns.
In our example, this might lead to a situation where you are granted access to the table but see a few masked columns, which will probably be sufficient. Defining such governance policies up front guarantees more consistent data access.
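A default masking policy of this kind can be illustrated with a small Python sketch. The column names, the `PII_COLUMNS` set and the `mask_row` helper are hypothetical; in practice such masking is usually enforced by the data platform itself rather than in application code.

```python
# Hypothetical set of columns tagged as PII by the governance policy
PII_COLUMNS = {"email", "phone"}


def mask_row(row, unmasked_columns=frozenset()):
    """Mask every PII column unless it was explicitly unmasked for this user."""
    masked = {}
    for column, value in row.items():
        if column in PII_COLUMNS and column not in unmasked_columns:
            masked[column] = "***"
        else:
            masked[column] = value
    return masked


row = {"customer_id": 42, "email": "jane@example.com", "country": "BE"}

# Default: table access is granted, PII stays masked
print(mask_row(row))
# Explicit exception for a user who is allowed to see email addresses
print(mask_row(row, unmasked_columns={"email"}))
```

Because masking is the default, granting access to the table never exposes PII by accident, so there is nothing to revoke in a later review.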
Data access request workflows are under pressure in every dimension. They don’t scale with the number of data objects, and they don’t scale with the number of employees. Additionally, the small group of technical experts who implement access and finalise the workflow will never grow at a similar rate.
As discoverability is key for the requestor and self-service is key for the data owner, a drastic move, a push left of governance, is advised. When data owners take on their role of describing their data products and are able to manage access to them, most hiccups in current workflows are solved. When you can reach this goal in combination with a central data governance policy, you will obtain a truly scalable workflow.
This is exactly what Raito is all about. If you want to test out our solution to your long data access request cycle, do not hesitate to reach out.
Photo by Jungwoo Hong on Unsplash