Cloud Native Live: Simplifying microservice authorization | Cerbos

Published by Alex Olivier on March 07, 2024

In an engaging session on Cloud Native Live, Cerbos' CPO and co-founder, Alex Olivier, alongside CNCF's representative, Taylor Thomas, delved into the complexities of microservice authorization in modern tech stacks. The episode, "Heterogeneous Microservice Authorization - Less Scary Than It Sounds," aimed to demystify the challenges and present a streamlined approach to managing application permissions across diverse frameworks, languages, and deployment models.

Read on for a summary of the episode, as well as a full transcript.

Main takeaways

The session highlighted the intricacies of authorization in an environment characterized by a variety of programming languages, frameworks, and deployment strategies. Alex outlined the current state of microservice architectures, where individual services, potentially written in different languages, interact within a Kubernetes cluster or at the edge, each requiring distinct authorization logic.

The primary focus was on the deployment of a scalable authorization solution that centralizes the authorization logic, moving it away from individual codebases into a highly available service. This approach not only simplifies the management of permissions but also enhances the security and maintainability of the application stack.

Centralizing authorization logic

A key point discussed was the concept of decoupling authorization logic from microservices and centralizing it into a single, highly available service. This strategy enables developers to define roles and permissions once and enforce them across all services, regardless of the programming language or framework used. It ensures consistency in authorization decisions and significantly reduces the complexity and fragility associated with managing permissions.

Implementing scalable authorization

The demo showcased an architecture consisting of Node, Go, and Python-based services, illustrating the process of extracting and centralizing authorization logic. Alex walked through the implementation of Cerbos, demonstrating how to decouple the authorization logic from the microservices and enforce permissions uniformly.

The approach leverages Cerbos policies, written in YAML, allowing for easy management and understanding of authorization rules by both developers and non-developers alike. This flexibility is crucial for organizations with complex authorization requirements that evolve over time.

Benefits of decoupled authorization

The discussion emphasized the advantages of this decoupled approach, including:

Scalability: By centralizing authorization logic, it becomes easier to manage and scale the system as new services are added or existing ones are modified.

Flexibility: Changes to authorization rules can be made in one place, without needing to update individual service codebases.

Clarity and accessibility: Using YAML for policy definition makes the authorization logic accessible to a broader audience, including non-technical stakeholders, enhancing transparency and collaboration.

Conclusion

The live demo by Cerbos at Cloud Native Live, featuring Alex Olivier and Taylor Thomas, provided invaluable insights into managing microservice authorization in a cloud-native ecosystem. By adopting a centralized, decoupled approach to authorization logic, developers can achieve a more scalable, flexible, and maintainable system. Cerbos offers a powerful solution to streamline this process, supporting a wide range of programming languages and deployment environments.

For a deeper dive into the specifics of implementing scalable authorization solutions with Cerbos, visit Cerbos.dev and explore the resources available.

Transcript

Taylor: Hi, all. Welcome to Cloud Native Live for today, Wednesday, March 6th. I'm Taylor Thomas, and I'll be the host today. At Cloud Native Live, we take the opportunity to bring in people and technologies from the cloud native space and give them the chance to show how cloud native works and how you can use it in your setups and in the problem solving you have to do.

As a normal reminder, this is a CNCF-sponsored event, and as such it is governed by the CNCF code of conduct, which boils down to: be nice and respectful to people. So please do not put anything in the comments, or in any other form, that would be disrespectful to other people.

So with that said, I'm going to go ahead and hand things off to Alex, who is from Cerbos, and he's going to talk about a bunch of really interesting things around authorization and authentication and all those kinds of things. So Alex, I'll let you go ahead and introduce yourself and share everything.

Alex: Great, thanks Taylor. Hey everyone, I'm Alex Olivier. I'm one of the founders of the company behind the open source project Cerbos, and we are in the authorization space, which I will go into in much more detail. Today we're really going to be talking about the nitty gritty of what it takes to actually implement a scalable authorization solution inside of your environment, be it cloud native or otherwise: how you can define roles and permissions and actually enforce those in all your different microservices. So just to kick it off, let's make sure we're all on the same page.

So unfortunately, authentication and authorization are two words that sound extremely similar, especially when they're reduced down to authn and authz, as you may have seen. So just to make sure we're all aligned: authentication is the bit of the system you're doing inside of your application.

A request comes in, you bounce the user off to go and authenticate, where they have to present some sort of credential. So username and password, a single sign-on type workflow, a login with Google, a login with GitHub, those kinds of flows, and then you get back some sort of verified credential.

You now know who that user is. And that flow is sometimes very much coupled with your user directory information, so this person will have a role or a group or a scope, those typical ideas around identity and access management. Authorization is then the next step. Once you know who someone is, what group they belong to, what team they're on, what role they have, how do you actually go down to your application level and authorize a particular request?

Can this user, with this role, call this particular method or this particular API endpoint? And just to be clear, we're talking about the application level here. You're probably used to RBAC inside of your Kubernetes cluster, which controls access to resources and namespaces, those kinds of things. We're now talking about the next level up: the applications you build and deploy in those clusters, and in all your different environments, also need a level of permissioning.

If you look at a typical Cerbos user, you're building an application. It's made up of a number of microservices, you're running it in Kubernetes, you've got some sort of gateway in front of it, and you've got some end user application, be it a web app or mobile app, that's hitting the API and making requests. And now, inside of your web of microservices, be it in a service mesh or not, you need to actually decide whether particular requests should be allowed, should be authorized, or not. What we're going to be looking at today is how to scalably manage that, particularly when you've got a disparate web of microservices, where those services can be in different languages and frameworks and binaries. It's a very heterogeneous environment where you need to scalably implement and define this logic across the application architecture.

So to kick things off, don't worry, I'm going to go into code now, I'm not going to show you any more content. I have on my machine here a small Kubernetes cluster running using minikube, if you're interested. And basically the architecture of this is: we have the end users, and they make requests to our cluster.

Inside of our cluster we're running Istio, so we've got an Envoy proxy running, and behind that gateway we have three services: we have a report service, we have an expense service, and we have a user service. What is kind of typical of transformation projects or evolving application architectures is that each of these services can actually be in different languages. So you might have one in Python, one in JavaScript, one in Go, one in Ruby, you might have some .NET thing, et cetera, et cetera. And because we're now in a nice world of containers, with Kubernetes to orchestrate them, it's very easy to actually spin up these services, have them stand alone, and have each one manage everything itself.

And it's kind of irrelevant what language they're all written in if they all just expose a REST API or gRPC API, for example. And that's exactly what we've got here. So just to give you a quick idea, if we go and look at our Python service: each of these services basically exposes a REST API. This one, in this demo, has basically one endpoint where we can go and request some resource. It's a typical FastAPI application: we've got some data, you make a request, we go and look up that particular record by ID from our dummy database, and then return that as JSON.
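To make that concrete, a minimal sketch of the kind of endpoint being described might look like this. This is illustrative only; the route, dummy data, and field names are hypothetical, not taken from the demo repository:

```python
from fastapi import FastAPI, HTTPException

app = FastAPI()

# Dummy in-memory "database" of reports, keyed by ID.
REPORTS = {
    "report-1": {"id": "report-1", "ownerId": "sally", "region": "EMEA", "title": "Q1 travel"},
}

@app.get("/reports/{report_id}")
def get_report(report_id: str):
    # Look the record up by ID and return it as JSON, or 404 if it doesn't exist.
    report = REPORTS.get(report_id)
    if report is None:
        raise HTTPException(status_code=404, detail="Report not found")
    return report
```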

So if I actually go and hit that endpoint in my dummy application here, I've got a report and I'm getting a response back. This request is coming into my cluster, Istio is managing where to route it, and it's ultimately hitting a pod that's running, in this case, my Python application and returning that data.

And then the same thing for my expense tracking application, except this one is in Node. So I have an Express application with an endpoint where you give it an ID, it goes and queries the database for the record with that ID, and either returns a 404 or returns the actual expense data. So, funnily enough, I can now do that same call and we get that kind of response back. And the third one we have is our user service, which in this case is in Go. We have our dummy database here, which is just a map for now, we're using Gorilla as the HTTP server, and we've defined some endpoints, and again we have this endpoint where you give it an ID.

And in this case it's our user database. This is a very common kind of service: you might have an application where you want all your other services to go and fetch data about your users, to look up a user profile, for example, as a common service that your other services call. That's exactly what we've got set up here. So in this case we're going to query our database by ID and return that record back. At the moment these services have no kind of authorization in them. We're using dummy authentication, where we're just passing in a user ID as a header; in reality you'd want this to be a JWT or some sort of actually verified credential, but in this dummy application it's passed as a header, and at the moment I can go and look up different records.

So here I'm looking at expenses. I'm making this request as an admin, and I can get the response back for this particular record here that's owned by Sally. And if I look again, I've got expense ID number two, again querying from our Node service, and this one's actually owned by Martin. I'm now going to make this request as a different user, and here I can actually get a result back for a record that I don't own. And this is where typically you'd start writing some authorization kind of logic.

So let's define our basic authorization requirements and then actually go and implement some basic checks in our application. We have three different resource types and two different user roles in this setup: we have admin and we have user. Your identity provider ultimately holds the role that a user has in your system.

Are they an admin or are they a user? Then we basically need to apply this matrix of permissions to decide whether an action should be allowed or denied. So in our expenses service we have some actions: create, read, update, delete. We're just going to focus on read for this example, and we're saying, okay, reading a record should be allowed if you have the role of admin.

But if you have the role of user, you're only allowed to read that record if you own that record. And we have the same kind of logic for our reports as well. So to implement this, let's go and do the simple check inside of our application. So we have our request.

The request comes in, and we have a middleware in here that does our authentication. So we can go and request the user's profile, which basically gives us back that user ID; that could be extracted from your JWT, for example. And now we want to do some permission checks. So where you might start is to define some check that says, okay, allowed by default should be false, and now let's go and implement that logic.

So in our request we have some middleware which is doing this authentication, and we're going to go and get the principal, so who's made the request, and we're going to write this logic which says: there's an array called roles, and we're going to see whether the roles of that principal contain admin. If that user is an admin, our business rules say that should be allowed, so great. Our second rule says, okay, if the principal in the request is a user, they should only be allowed if they are the owner of the resource.

So this is an example of where you kind of migrate from role-based access control to attribute-based access control, RBAC to ABAC. Now in this case, okay, we need to do the same check as before. Let's go check the roles and make sure this person is, this time, a user. But we also want to check that rule that says: is this person the owner of the resource?

So now we have to do another check in here. Now we know this person is a user, we want to check whether the ID of the person making the request, what we call the principal ID, is equal to the owner of this expense resource that we've already pulled out of the database. And if they are, the action should be allowed.

So now we've done a check for if our user is an admin and if our user is a user. And then what we want to say is, okay, we don't want to return any results if the action is not allowed. In this case we're just going to do something very basic and return an error message saying unauthorized.

And we can set the status as well to a 403. We would then go and set our response and return that, and that way our API is ultimately protected based on that business rule, where we're checking the roles of the person making the request.

So that's Node done, and we're happy. We've got the logic implemented, we've hard-coded all this, and we're returning a relevant error. That would be our first pass at enforcing these basic rules. Now we also want to have the same business logic in our report service, so I have to go into my Python application, go to my request handler, and basically re-implement the same logic to support this particular use case. So in the world of Python we're going to say, okay, we've got the request.

We've got, in this case, the report from the database. Now we basically re-implement the same logic. So we're going to create that boolean again; this time we're checking whether admin is in the principal's roles. It's doing the same logic as in our Node application, so we're saying: if you're an admin, great.

Now let's do the same logic if you're just a user. This time we're saying the user role is in the roles, but we also need to do that additional owner check we talked about: if the report's owner ID attribute is equal to the principal ID, then the action should be allowed. And if it's not allowed, it's the exact same kind of format as before.

In this case I'm just going to copy my snippet to save all the typing, and then we return, in this case, an exception if the check says the action should not be allowed. So now we've got our business logic set up as it should be in our Python application.
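Pulling those pieces together, a rough sketch of the kind of hard-coded check being described might look like this in the Python service. The field names such as ownerId and the exception details are illustrative, not lifted from the demo code:

```python
from fastapi import HTTPException

def check_report_access(principal: dict, report: dict) -> None:
    """Hard-coded authorization: admins can read anything, users only their own reports."""
    allowed = False

    # Rule 1: admins are always allowed to read.
    if "admin" in principal["roles"]:
        allowed = True

    # Rule 2: users are allowed only if they own the record.
    elif "user" in principal["roles"] and report["ownerId"] == principal["id"]:
        allowed = True

    if not allowed:
        raise HTTPException(status_code=403, detail="Unauthorized")
```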

And then, okay, now we want to do the same thing in our Go application. I'll spare you the copy and pasting, but we end up basically implementing the same thing: we set up these variables, we fetch the record, and then if the action is not allowed we return some sort of error. So now we've successfully implemented our logic based on the business requirements, our API is up to date, our application is deployed, and we can go and make requests against these different endpoints. And funnily enough, if I now make a request as before, where I'm pretending to be the user Sally trying to request a resource owned by Martin, I'm going to get that 401 unauthorized response with some sort of error message returned.

So, great, it's amazing: we've hit our requirements and we can go and live happy lives and never have to think about it again. The reality is that as you build applications, as you evolve application services, your business requirements change. And what commonly happens is: ah, okay, we now want to introduce a more fine-grained permissioning model.

So now we want to add a concept like a manager role. We need a more fine-grained role; we can no longer just divide people up between admins and users. We also now want to have this more granular role of a manager.

So now for a manager Our logic now says, okay, if the user's an admin, should always be allowed if a user is a, has the user role, they should only be allowed if they are the owner for reading. But if they're a manager, we say, okay, we're not, they don't have to be owner, but they have to only be able to access at the resource that's in the same region that they are a manager of.

So all of our expenses and all of our reports in our kind of database has a field called region, which dictates which region that that particular resource belongs to. And then each user that we have also um, has a region associated. And we can actually go and look at those user roles here. So here are different users.

We have an admin. They have the role of admin and user. They have attributes, name and department of region. Sally here has the role of user. Again, name, department, region. And then our resources also have a similar model where we have a region associated with it. So I've now kind of got my new requirements.
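To make the pain concrete, here is a rough sketch of what that hard-coded check now grows into in just one of the three services, again with hypothetical field names; the same branch would have to be re-implemented in the Node and Go services too:

```python
def is_read_allowed(principal: dict, resource: dict) -> bool:
    """Hard-coded rules: admin always, user if owner, manager if same region."""
    roles = principal["roles"]

    # Admins can read anything.
    if "admin" in roles:
        return True

    # Users can only read records they own.
    if "user" in roles and resource["ownerId"] == principal["id"]:
        return True

    # Managers can read records in the region they manage.
    if "manager" in roles and resource["region"] == principal["region"]:
        return True

    return False
```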

I now need to go back into all my application code. Again, three different services, three different languages. I may know enough JavaScript to go and implement the Node one, I might know just enough Python to hack my way through the Python implementation, and I'm just going to be blind guessing for Go because that's not my day-to-day job, it's not my first language, I'd say. So basically this is going to take time, you're going to have to do more testing, you're going to have to redeploy all the application services to update the logic, et cetera. And hopefully you can now get an idea of the pain that might be involved as these requirements keep changing: you have a new feature, you have a new product, you have a new user role.

You want to enable more access, more fine-grained access, and you're going to have to keep touching this application code over and over and over.

So there is a better approach here, which is implementing what we call decoupled authorization, to use the term that's used for this these days. I still want to have my single entry point: my user is authenticated, a request comes in, hits my service mesh, and that request then gets routed to the right application.

But now, rather than having each application hard-coded with this business logic, let's actually decouple that logic out into a standalone service. In formal specification terms this is referred to as a policy decision point. A policy decision point takes a request of "here's this user or principal trying to do this particular action on this particular resource", evaluates that request against some policy, and then simply returns an allow or deny decision back to the application. So let's go and do exactly that.

So here I have my Cerbos policies. These are, again, just YAML definitions. Hopefully, since we're talking CNCF and Kubernetes world here, everyone knows YAML; maybe not everyone's necessarily happy with YAML, but you're all familiar with it. We've all got tooling, we've got IDEs that give you nice autocomplete, et cetera.

So this is the policy that's going to sit behind our expenses system, and we have one for our reports and one for our user service as well. Each of these policies defines the different resource types in our application, and for each of those resource types we define the different actions.

So here we have create, read, update, delete, the ones we've been talking about, and we're going to focus on this read example. For each of these actions we have different rules that define when those actions should be allowed or denied. So for example, here we have a read rule that says the action should be allowed if the user has a role of user and this expression, this condition, is also true. This replicates the same logic we were doing in our application code, but in a much more agnostic, declarative way, as these policy files, rather than having to code our logic. We still essentially do the same thing.

We're looking at the request, we're looking at the resource, we're looking at the owner ID, and we're just seeing whether that ID is equal to the ID of the person making the request. And for this new manager role, we have the manager role defined, and then we're doing that check we were talking about, where we check that the region of the resource is equal to the region of the manager making that particular request.
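As a rough illustration of the shape of such a policy (a sketch based on the Cerbos resource policy format; the resource name, attribute names, and version are assumptions, not the exact demo files):

```yaml
# expense_policy.yaml - illustrative Cerbos resource policy
apiVersion: api.cerbos.dev/v1
resourcePolicy:
  version: "default"
  resource: "expense"
  rules:
    # Admins can always read.
    - actions: ["read"]
      effect: EFFECT_ALLOW
      roles: ["admin"]

    # Users can read only the records they own.
    - actions: ["read"]
      effect: EFFECT_ALLOW
      roles: ["user"]
      condition:
        match:
          expr: request.resource.attr.ownerId == request.principal.id

    # Managers can read records in the region they manage.
    - actions: ["read"]
      effect: EFFECT_ALLOW
      roles: ["manager"]
      condition:
        match:
          expr: request.resource.attr.region == request.principal.attr.region
```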

So here we're defining this logic centrally, and the same thing for the reports and for our users. Now we need to actually go and enforce this. Our first step is to deploy these policies into our environment. I'm using Helm here to deploy into our cluster: I have a deployment which deploys the Cerbos container, and that container has those policies loaded into it. That's now all deploying, and we can see down here that the Cerbos container is starting up.

Great, it's all up and running, and we could go and get the logs from it. So that's now up and running and the policies are already loaded into it; in this case we've baked the policies into the container itself. We do recommend much more of a GitOps-style approach, so you can actually point Cerbos to load those policies from a GitHub repo, or any Git repo, and you can point it at a cloud storage bucket or even a database. For this example, we're just loading them from disk.

But ultimately these policies are now loaded into our decision point. So this policy decision point is now up and running in our cluster, and now we need to go and update all our application services to point to it. So if we start off with Node, where before we had all this if/else hard-coded logic, we can get rid of all of it and connect up to our policy decision point instead.

Here we're using the Cerbos SDK, and the way the SDK works is you basically put it into your application and you point it to an endpoint. You've deployed the Cerbos container inside of your cluster, so it's all local, it's all in your environment, you can completely air gap it even, and it's all open source as well.

So you can go and fork the repo if you want to have your own copy of it, et cetera. But we're basically pointing our SDK to where that Cerbos service is running, and in this case we're just passing an environment variable, which points to, in this case, cerbos.demo.svc.cluster.local.

So it's all running inside the environment, and we're actually using the gRPC SDK in this case. Then in our application code, where before we were doing all this if/else logic, we're going to replace that with a call to Cerbos using the SDK. All of this disappears and we replace it with this call. We've got our expense object from the database, and we've got our principal, which is the same model we were checking before, and then rather than hard-coding all that if/else ourselves, we just call out to the Cerbos SDK. There's a method called isAllowed: we pass in the principal, we pass in the resource, and we pass in what action is required.

This is now in our code, and what we get back is a simple binary, well, boolean: is this action allowed or not? Behind the scenes, this method, the SDK, is making a request to that Cerbos instance inside of the cluster. It's saying: here's my principal, here's my resource, here's my action.

That decision point, that Cerbos container running alongside the application, is going to make a decision based purely on those inputs. Cerbos itself is completely stateless; the only thing in there is policy, and this kind of decoupled approach is very scalable because you can just horizontally scale your decision points. In the case of Cerbos we recommend running it actually as a sidecar.

There's one inside of every pod, so you're not going to do any off-node hops, et cetera. That request goes over with principal, resource, action, and what comes back is a boolean: allow or deny. And so our same logic as before, which was simply "if it's not allowed, return an error; otherwise return the actual data", still works. So that's now set.

In the world of Python, funnily enough, we do the exact same thing. So I'm going to take out all the logic we hard-coded.

We've already pulled in the SDK, and as before we have this client set up, and then we do the same call out. So here we're calling the Cerbos SDK, doing the exact same thing, where we pass through the principal and the resource we're currently accessing. If the result is not allowed, we raise an exception, and if it is allowed, we just go and return that data. And funnily enough, the exact same thing works in our Go application. We take out all the logic we did before and enable the Cerbos SDK, which I'll show you what that looks like: we have this create-client method where we grab the configuration, again from the environment variable, and we create a new connection to that Cerbos decision point. Then in our application code we go and do the API call out to check the permissions, passing in that principal, resource, and action, and what comes back is that same boolean. So that's now all deployed and running, and our application is authorizing requests as they come through. So if I go and make the same call, we're actually getting that access denied response again.
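For reference, the Python version of that check might look roughly like this. This is a sketch based on the Cerbos Python SDK; the endpoint, IDs, and attribute names are illustrative rather than taken from the demo code:

```python
from cerbos.sdk.client import CerbosClient
from cerbos.sdk.model import Principal, Resource

# Point the client at the Cerbos PDP running alongside the app (HTTP API, default port 3592).
with CerbosClient("http://cerbos.demo.svc.cluster.local:3592") as client:
    principal = Principal(
        "sally",
        roles={"user"},
        attr={"department": "Engineering", "region": "EMEA"},
    )
    resource = Resource(
        "expense-1",
        "expense",
        attr={"ownerId": "sally", "region": "EMEA"},
    )

    # A single boolean comes back: should this principal be allowed to read this resource?
    if not client.is_allowed("read", principal, resource):
        raise PermissionError("Unauthorized")
```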

But now if I go and pretend to be a manager for the correct region: here I've changed my identity, my principal, to a manager, and our business rule that says if the manager is from the same region as the resource then the action should be allowed is now implemented. This response now comes back successfully.

And if I go back to being a user, you can see we're getting the correct access denied message. So what this all now means is that all our authorization logic is abstracted out. We can completely evolve all of our authorization logic simply in these policies. From now on, when we get a new requirement from users or the business, where we want to add a new role, tweak the logic, et cetera, I no longer have to touch all my individual application services. I don't have to remember how to write JavaScript, I don't have to remember how to write Python, I don't have to remember how to write Go.

You know, I can only keep so much in my head, and three languages is more than enough. I don't have to worry about that: once the SDK implementation is done, once the service is running inside of the cluster, the only place where I have to touch my authorization logic is inside of these policy files. I can go and change this logic, make that change, and the container will pick up the new logic, and all my application code that's calling out to the Cerbos instances inside the cluster will start serving based on that new authorization logic with no extra work required. That is the core of decoupled authorization in these heterogeneous environments that we seem to be building more and more these days: decoupling all of this out into a standalone system, a standalone service, a standalone policy language.

It just opens up way more flexibility and scalability in terms of how you architect and build systems. I'm sure many of you are going through these kinds of big migrations, maybe monoliths into microservices, and one of the first things you have to worry about when you start breaking big monoliths down into separate services is how you handle common functionality like authentication, authorization, logging, metrics, all these sorts of things.

Cerbos is a solution for the authorization piece, and it is completely open source. We are a very proud CNCF member, and we've built all the tooling, again, all in the open. You can find out more at cerbos.dev, and hit us up on GitHub at github.com/cerbos. We're open for contribution and collaboration, and hopefully this has given you a bit of a flavor and an idea of how decoupled authorization works.

You can find out a lot more on our website and in our repos and documentation. We have things like full CI/CD integrations, and you can write unit tests for your authorization policy, which is something that's quite hard to do if everything's hard-coded into your application. By abstracting it out, you get all these nice benefits of testing in isolation and such.

And with that I would like to say thank you for your time. Please go and check us out and hit us up on Twitter, and we have a very active Slack community. We always love to speak to you: you can see right on our home page we have the "speak to an engineer" button, where you can book some time with myself or one of the rest of the team, and we can help you get up and running ASAP with your application and with your policies. We can do this through an interactive IDE we have available at play.cerbos.dev, so you can test things right in the browser without having to spin up a whole cluster as well.

Taylor: Awesome. That was great, Alex. So for those on the stream, if you have any questions, please feel free to submit those and I'll make sure they're displayed and sent over to Alex here. But I also wanted to ask a couple of questions that came up that I think might be interesting for a lot of people.

So you mentioned there are different ways of managing policy. What tools do you have to manage your different policies? I know you'd mentioned GitOps and databases, but what do you see your users doing most often? What do they use to manage those? Because that's a common problem as these things continue to expand and as you get more complex logic.

Alex: Absolutely. So we are big fans of the whole GitOps approach, as I'm sure many of you are as well: being able to actually write your policies statically and commit them into a repository, where they're versionable, auditable, and manageable. More than anything else, that is what we see.

Now, we have a very pluggable storage backend, but the Git repo ultimately is the one that we see the most usage of, because of all the usual benefits you get in terms of GitOps, et cetera. And as I mentioned earlier, we've also released, again as part of the open source project, the CI tooling as well.

So as well as defining your actual policies, you then define fixtures for your principals and your resources, and then you can define a full test suite of what your expected permissions should be. Then in your existing CI pipelines you just run the Cerbos test command, point it to where your policies and tests are, and it will evaluate everything that you've defined and give you the usual test output if things aren't as they should be, or give you an all clear to move on with your environment. Using very much a branch-based or tag-based approach, as you're probably already using for how you deploy your application, is also what's recommended. So when you deploy a Cerbos container, if you use the open source project, you can point it directly to your repo and say, okay, pull in this tag or this branch. We also have Cerbos Hub, which is our managed control plane that sits on top of the open source project, and which allows you to have a much more managed CI-style environment that coordinates rollouts as well.
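As a rough illustration of those test suites (a sketch based on the Cerbos policy-test format; the fixture names and attributes are hypothetical and follow the earlier examples rather than the demo repo):

```yaml
# expense_test.yaml - illustrative Cerbos policy test suite
name: ExpensePolicyTests
description: Expected permissions for the expense resource
principals:
  sally:
    id: sally
    roles: ["user"]
    attr:
      region: EMEA
resources:
  sallys_expense:
    id: expense-1
    kind: expense
    attr:
      ownerId: sally
      region: EMEA
tests:
  - name: Owner can read their own expense
    input:
      principals: ["sally"]
      resources: ["sallys_expense"]
      actions: ["read"]
    expected:
      - principal: sally
        resource: sallys_expense
        actions:
          read: EFFECT_ALLOW
```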

Taylor: Awesome, that's great.

We also have a question from YouTube, and I'll show that on the screen too: how does this integrate with JWT access tokens?

Alex: Great question. So Cerbos is completely agnostic to any particular authentication provider. Hopefully you saw in some of those code snippets that it's basically down to the application to pass the identity through, where you send the particular identity from your application. Now, given that most authentication providers today are basically issuing JSON Web Tokens, JWTs, "jots", pick your pronunciation of choice, Cerbos natively supports those.

So when you deploy a container you can also point it to your JSON key set, JWKS, whatever the standardised pronunciation is these days. You point it to your key set and then you basically have two options inside of your Cerbos decision point: you can say just take the token and pass its data into the policy, or you can actually set it to verify the token as well.

So it will actually, using your key set, validate the actual token before passing it into the policy, and you can set Cerbos to reject the request if the token is invalid, or just log a warning. What this now means is that any data that's inside of your token, be it scopes or any other kind of data you're setting inside of your token from your identity provider, can then be used inside of your authorization policy.

So checking a user's scope value, or maybe checking some custom attribute based on what department that user belongs to, all that information that's generally inside of your IdP system, you can just pass directly into Cerbos without having to do any kind of data manipulation. You can pass the full token through and have Cerbos consume that token and make it available.

What that means inside of your policies is that you can check any fields inside the token, get very specific, and reuse your existing authentication tooling and authentication mechanisms alongside authorization with Cerbos.
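As a rough illustration of that idea on the application side, here is a sketch of decoding a verified JWT and mapping its claims onto the principal sent to the PDP. This is illustrative only: the claim names and the PyJWT usage are assumptions, and it skips the native route Alex describes where the raw token is passed straight through to Cerbos.

```python
import jwt  # PyJWT, assumed here to be how the application verifies tokens
from cerbos.sdk.model import Principal

def principal_from_token(token: str, public_key: str) -> Principal:
    # Verify the token signature and decode its claims.
    claims = jwt.decode(token, public_key, algorithms=["RS256"], audience="expenses-app")

    # Map claims onto the principal so policies can check roles, department, region, etc.
    return Principal(
        claims["sub"],
        roles=set(claims.get("roles", [])),
        attr={
            "department": claims.get("department"),
            "region": claims.get("region"),
        },
    )
```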

Taylor: Thank you. One more question from YouTube as well, and that's: can you write complex policies with code?

Alex: One of the earlier design decisions we made with Cerbos, because we've been working on this project for about three years now, is that if you really look at where the requirements for authorization sit, it's not with you as the developer, it's not with you as the implementer. It's going to be somewhere inside the company, with the business mind, on the business side of things.

So it'll be the product manager, the product owner. We've seen examples where it's actually the sales team or the support team that dictate and control permissions for users, accounts, and those sorts of things. So what we decided with Cerbos early on is that we didn't want to have it as code, in the sense of Go, JavaScript, et cetera, because we wanted something a bit more agnostic, and maybe a bit more readable to someone that isn't a day-in, day-out dev. We landed on YAML just because there's a very rich ecosystem of tooling around it, there are nice visualization tools, and we do things like publish the schemas for those.

So you can light up your IDE to give you full IntelliSense and autocomplete-style feedback as well. And what we've actually seen through the adoption of the project over the years is we get feedback like: oh, I'm the product manager, I'm the CISO, and for the first time I've actually understood what the authorization logic is in my application, because all I'm looking at is these policy files.

I don't have to go and understand Java or Go or another language to really understand how things are implemented. We believe it's better for the person that actually holds the requirements to really understand what's going on, and using YAML, which has a really rich ecosystem of tooling around it, is a bit of a leveler for everyone. We have plans as part of our commercial offering on top to make that even easier to use, but the approach of decoupling into something that's agnostic means the schema is very simple.

Those conditions I was showing you, like checking the owner ID, et cetera, are hopefully very readable to you. They're using CEL, the Common Expression Language, which again is a Google open source project that's used inside of Kubernetes as well. So if you're familiar with that, you can get up and running very, very quickly. But yeah, the whole philosophy is that even the most complex policies should be in something that's readable without necessarily understanding or knowing a particular programming language.

Taylor: Great, thank you. So another question from my side, after watching all the things you're showing, is around whether it's possible to extend the policy engine. Can you extend it with plugins or whatever to be able to do something more complex? This is kind of related to the code question, but on a different level. Is there support for that, or is that more of a "no, we consider that out of scope"?

Alex: Yeah, so we're trying to keep the design of Cerbos as simple as possible.

And not because we're being lazy, but because when you really get down to authorization, this is in the blocking path of every single request to your application, so you need the authorization decisions to be as instantaneous as possible. This is why we made the very early decision to make the Cerbos policy decision point stateless. At no point did I mention anything about having to synchronize information about your users or your roles or your resources into some sort of authorization service. The only thing that's loaded into those Cerbos containers is the actual policy files.

All the other context is then provided at request time. The downside of a system that has to synchronize state, which is the other approach, is that you end up with eventual consistency, cache misses, and latency-type issues in getting updates about resources or users to your authorization layer. That's going to result in people making calls to your API and getting incorrect authorization results because the data hasn't been refreshed yet. The second thing around this is that your policies can change based on your business rules, independently of your application code.

If you were to have your authorization layer going out and fetching state, doing lookups via plugins that way, what could end up happening is that what may seem like a small change in your authorization logic could actually result in unexpected load in other parts of your system, because now it has to go out and fetch state from maybe a service or application or database that isn't ready for that load, or isn't ready for that scale. I come from a background of building high-throughput, low-latency systems, talking like 50-odd billion requests a day, and I've made every mistake possible at that kind of scale. One of those was trying to push state or do arbitrary lookups at request time for every single request; that's when you run into all sorts of these issues.

So, going back to authorization being in a blocking path, every design decision made so far with Cerbos is with that in mind, to make sure the lookups are as instantaneous as possible. That's why we say run it as a sidecar, use the gRPC API rather than the HTTP one, keep the state outside and pass it in at request time so you always use the latest data, et cetera. That keeps everything as performant as possible while still getting all the benefits of this decoupled approach.
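For illustration, the sidecar pattern being described might look roughly like this in a pod spec. This is a sketch, not the demo's manifests: the image tag, ports, flags, and config layout are assumptions based on common Cerbos defaults.

```yaml
# Illustrative pod spec: the application and a Cerbos PDP running side by side.
apiVersion: v1
kind: Pod
metadata:
  name: expense-service
spec:
  containers:
    - name: expense-service
      image: example.com/expense-service:latest   # hypothetical application image
      env:
        - name: CERBOS_ADDRESS
          value: "localhost:3593"                 # talk to the sidecar over gRPC, no off-node hop
    - name: cerbos
      image: ghcr.io/cerbos/cerbos:latest
      args: ["server", "--config=/config/config.yaml"]
      ports:
        - containerPort: 3592   # HTTP API
        - containerPort: 3593   # gRPC API
      volumeMounts:
        - name: cerbos-config
          mountPath: /config
  volumes:
    - name: cerbos-config
      configMap:
        name: cerbos-config
```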


Taylor: Thank you. The other question I have around this topic is: when we've seen people use this before, and I know from my personal experience working with people implementing these kinds of things, there's always a need for audit logs or policy decision logs. What kind of support does Cerbos have for that, and what does it look like to enable or to consume them?

Alex: Absolutely. So I hadn't touched much on the decision point side in this talk, but one of the big advantages is that in this example we had three services, one in Python, one in Go, one in Node, all ultimately doing permission checks.

If each service was doing its own checks itself, each service is going to have its own implementation of how it does logs. So one might log out a JSON line, one might just do cleartext, one might do no logging at all even. And if you were then to try to understand all the decisions and all the access controls that were happening inside of your application in that model, you're going to have to fetch logs from multiple different services and try to get them into a similar format, et cetera.

One of the big advantages of all decoupled authorization approaches, not just Cerbos, is that there is a central service where all these decisions are being made. All these requests are going through that central, horizontally scaled service, and thus every request can be captured, and every decision made by a Cerbos decision point produces an audit log. That audit log is a JSON structure, and it's timestamped: at this time, this principal, with this ID and these attributes, is trying to do this particular action on this particular resource with these attributes, et cetera.

And the resultant decision was either allow or deny, and that allow or deny was matched by this particular policy at this particular commit hash. That is a standardized log. So regardless of which service is doing permission checks, or where in your architecture that permission check is being done from, that log will always be generated before a decision is returned back to the application.

Now, you can configure where those audit logs are sent. By default, Cerbos will just log them to standard out, and then you'll generally plug in your existing log collection system. So if you've got Fluentd or something running inside of your cluster, you can pick up those logs and send them anywhere.

If you're running on cloud infrastructure, you might have Cloud Logging or CloudWatch and those kinds of things that will pick up those logs quite happily. You can also configure it to write to a file, and we actually had a big contribution from one of our community members who wrote a Kafka sink for it, so you can take those audit logs, point to a Kafka broker, and put them onto a topic, and then those are in a queue, et cetera.

But it's a pluggable backend and it's extremely flexible. By default, standard out gives you extreme flexibility to use any tooling to push the logs anywhere, be it something self-hosted like Loki or a commercial product like Datadog, those sorts of things. It's extremely flexible.

But the key thing here is that every decision that goes through a decision point gets that audit log, in a stable, consistent format, always guaranteed to be produced in the same way for every decision made inside of the stack.

Taylor: Great, thank you. I also saw some comments come in from Twitter about how audit logs are important.

I think you really tackle those things of being able to ship it wherever you want, to hot tiers, cold tiers, whatever you need to do; it exposes it for that, so that's great. Another one I think is a practical thing, given how many languages are used across different companies: what languages do you have SDK support for?

Alex: Yeah, so the core of Cerbos is written in Go. That's the open source project; go and have a poke around if you so wish. As for the interface it exposes, if you don't want to use an SDK, the Cerbos decision point exposes gRPC by default, but also an HTTP version that sits on top of the gRPC interface. And we've then gone and code-generated SDKs for pretty much every language from that, using the protobufs. So we have pretty much every language that's commonly used.

We now have an SDK for each, and if you're using some esoteric language, or one that maybe hasn't got the functionality you need just yet, you can just fall back to using the REST or the gRPC API directly. You can find all of these if you just go to the ecosystem page on cerbos.dev, and it takes you off to all the different SDKs that we have produced. And again, we've had a few community contributed SDKs as well, which we're very thankful for, and we're always open for more.

Taylor: Awesome. Well, with that, I'm out of my questions and I haven't seen any more come in from the live stream.

So I think we're probably at a pretty good stopping place. Were there any other calls to action or things you wanted to close with, Alex, before we close out here?

Alex: Yeah, I'll just say please go give it a try. It's a completely open source project; cerbos.dev has all the relevant links, and we are more than happy to jump on a call with any of you and walk you through getting up and running, and even help you write your first set of policies in our playground. But yeah, please don't reinvent the wheel of authorization would be my main takeaway. It takes too long, it's not worth the effort, and there are already solutions off the shelf that will do it for free as well.

Taylor: Awesome. Well, thank you so much, Alex, for the presentation, walking through everything and showing how this all works. If you're interested, please visit any of the links we've been sharing throughout the stream and we hope to all see you at the next Cloud Native Live.

Thanks, everyone. See ya.

Book a free Policy Workshop to discuss your requirements and get your first policy written by the Cerbos team