Amazic Podcast: Unveiling the Future of Authorization with Cerbos

Published by Alex Olivier on August 30, 2024

In a recent episode of the Amazic Podcast, Cerbos’ co-founder and Chief Product Officer, Alex Olivier, sat down with Twain Taylor to discuss the exciting advancements at Cerbos, especially following the recent General Availability launch of Cerbos Hub.

This episode is a must-listen for anyone involved in building, securing, or scaling software systems. Whether you’re a software engineer looking to deepen your understanding of authorization, a CTO exploring scalable security solutions, or a developer interested in the latest tools for managing policies at scale, Alex Olivier’s insights will provide you with valuable knowledge.

The Evolution of Authorization: Beyond Authentication

One of the key topics Alex addressed was the often-overlooked complexity of authorization compared to authentication. While authentication (AuthN) is the process of verifying a user’s identity, authorization (AuthZ) determines what that authenticated user can do within a system. Alex clarified how Cerbos operates at this critical juncture, ensuring that every action a user attempts within an application is subject to fine-grained, context-aware authorization checks.

For developers and engineers, understanding this distinction is crucial, especially in cloud-native environments where security and performance are paramount. Cerbos provides a robust, scalable solution that integrates seamlessly with existing infrastructures, enhancing security without sacrificing speed.

Managing Policies at Scale: The Cerbos Approach

Alex also delved into the best practices for managing authorization policies at scale—a challenge that grows as systems become more complex. With the recent launch of Cerbos Hub, a policy administration point, managing and distributing these policies across distributed systems has never been easier. This tool allows teams to edit, test, and deploy authorization policies in a controlled, coordinated manner, ensuring consistency and reliability across all instances.

For CTOs and CISOs, this means a more secure and compliant system, where policies can be updated and enforced in real-time without disrupting operations. Developers benefit from the simplified workflow, which abstracts the complexity of policy management and allows them to focus on building features rather than worrying about the underlying authorization logic.

Standardization in the Authorization Ecosystem

Another highlight from the discussion was the ongoing effort towards standardization in the authorization space. Alex spoke about Cerbos' involvement in the OpenID AuthZEN working group, which aims to create open standards for authorization interfaces. This initiative is crucial for preventing vendor lock-in and ensuring that organizations have the flexibility to switch or integrate different authorization systems as their needs evolve.

This is a significant development for those in leadership roles within tech companies, as it promises greater interoperability and future-proofing of their security architectures. Being aware of these efforts allows decision-makers to choose solutions that align with emerging standards, ensuring long-term viability and support.

Real-World Applications and Lessons from the Trenches

Alex’s journey as an entrepreneur also provided valuable lessons on product development and customer feedback. Cerbos' deliberate approach to product development—taking a full year from initial announcement to GA—was a testament to their commitment to security and reliability. Alex shared an insightful example of how customer feedback during the beta phase led to significant architectural improvements, ensuring that Cerbos Hub could meet the stringent demands of production environments.

Transcript

Twain: Hello, everyone. Thank you for tuning in to the Amazic podcast. It's great to have you with us. We cover all things Kubernetes and cloud native. That's what we enjoy talking about, and I think that's what you enjoy listening to as well. I'm Twain Taylor, editor at Amazic. And if you like what we share on this podcast, go and check out our other stuff.

I think we've been doing this now for two plus years, so we've covered so many organizations and open source projects that you'll really want to catch up on the previous episodes as well. So go and check them out at amazic.com. That is A-M-A-Z-I-C dot com.

Today the topic is authorization, and we have someone who we've had on in the past. About a year ago, I'd say, we featured the startup named Cerbos. They were pre-launch back then, pre-GA at least, and still refining their product. That's when we spoke to them, and they gave us a sneak peek, and yeah, it was quite exciting to talk to them back then.

But today they're really excited because they've just had their big launch two weeks ago. So we wanted to get them on the podcast and have them tell us all about it: what's new and what's been cooking since then. I have with me today Alex Olivier, who is the co-founder and Chief Product Officer at Cerbos.

Alex, welcome back. It's great to have you here.

Alex: Thank you for having me back. Always happy to chat with you, and it's been an exciting few weeks, so we've got lots to talk about.

Twain: So, yeah, tell us about the launch. How did you launch and what was special about the launch?

Alex: Yeah, absolutely.

So the very first time we spoke, I think, yeah, it was over a year ago now, probably. We had been working away on our open source project. Cerbos is an open core company, so the core authorization engine, which is Cerbos, is completely free and open source. It's been out there for nearly four years now, which is kind of scary.

And what we've been working on for the last year or so has actually been a management layer on top of the open source project. But before diving into that, I think it's probably worth framing authorization a bit: what are we actually talking about here? Because unfortunately authorization sounds very similar to another word, authentication, and it's sometimes very easy to mix those two up.

So, yeah, that is the problem, because it gets shortened to AuthZ and AuthN, and sometimes even I get muddled. What we're talking about here is authorization. When we talk about authentication, AuthN, that is the process by which some system will challenge you to provide a credential, username, password, and you get back a verified identity that says, okay, this person is who they say they are because of some challenge they passed.

Authorization is the next step after authentication, which is: okay, I now know who the person is. What are they allowed to do?

Should they be able to hit this endpoint? Should they be able to edit this resource? Should they be able to go and do X, Y, Z inside of a system? That's the challenge of authorization, and the challenge that Cerbos has been working on. And our open source project is what's called a policy decision point.

And there's all sorts of formal nomenclature around the space standardized by NIST a good number of years ago now. But when you talk about authorization, there's a few different components involved. You have a policy decision point, a policy information point, a policy enforcement point, and a policy administration point.

So to quickly rattle off what those things are: a policy decision point is a component inside of a system that gets asked, can this user do this action on this resource, and it decides yes or no. A policy enforcement point is the part of your system which actually takes that yes or no answer and enforces the decision.

In most cases this is typically inside of the application, or maybe at your gateway level. A policy information point is a system that provides the context about the user or the resource they're trying to access, to feed into the decision making. And then a policy administration point is where you actually go and manage, define, and control your policies.

And that's the product we've, we've just launched into GA.

Twain: Oh, okay. So it sounds like authentication is simpler because it's probably a one-time thing, or maybe even a couple of times, but then authorization seems more complex because it comes up at every turn and corner of the cloud native stack.

Would that be correct to say?

Alex: Yeah, exactly that. I would say, maybe up until fairly recently, you're right in saying that authentication is kind of a one-off process where you then get a token or a cookie or some sort of identifier which is valid for some time period. You know, if you're using JSON web tokens, JWTs, you'll be familiar with the expiry timestamp inside of those tokens, which tells you how long that token is valid for.

And that can be done at the start of a session or, you know, refreshed halfway through the session, et cetera. But you've got some sort of key, some sort of credential, and that's kind of valid. I would say there is actually much more of a movement recently around doing more continuous authentication as well, where you're much more aggressively verifying the credential, but I wouldn't say that's commonplace right now.

But yes, authentication generally is a process that you can do once, and then you essentially cache that credential or have a long-lived credential that goes around. Authorization, particularly fine-grained authorization, which is the space Cerbos plays in, is something that needs to be done at every single API call or request or method action that occurs inside of a system.

Because it's very contextual, based not just on the user's identity, but on the specific instance of a resource that, say, you, Twain, are trying to access. So your request comes in, attached to that request is your identity, your credential, and then based on the API call, you're trying to do a particular action on a particular resource.

If you just want to speak in HTTP verbs, you might be doing a POST or a PATCH or a DELETE, that's the action, and you're doing that action on a specific instance of something. So it could be, you know, you're posting an article on amazic.com, you're trying to interact with an article object inside of the system, and to make an authorization decision, it's going to be contextual based on who you are and the specific instance of the resource you're trying to do the action upon.

So maybe you have a rule inside of, say, a CMS that only the author of an article can delete an article. Now you need to know who you are, the specific article you're trying to access, some attributes about it, like is it published and who is the owner ID, and then make a decision based upon that.

And so authorization has to be done on every request. More importantly, it's in the blocking path of every request, so performance is key in terms of how quickly your system can generate those decisions inside of your application.
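To make that CMS example concrete, here's a rough sketch of how such a rule might be expressed as a Cerbos-style resource policy in YAML. The resource kind (article) and the attribute name (ownerId) are illustrative rather than from the episode, and the exact field names should be checked against the current Cerbos policy reference.

```yaml
# Illustrative sketch of an "only the author can delete" rule.
# Attribute names are hypothetical; verify the schema against the Cerbos docs.
apiVersion: api.cerbos.dev/v1
resourcePolicy:
  version: default
  resource: article
  rules:
    - actions: ["view"]
      effect: EFFECT_ALLOW
      roles: ["user"]        # any authenticated user can view

    - actions: ["delete"]
      effect: EFFECT_ALLOW
      roles: ["user"]
      condition:
        match:
          # only allow deletion when the caller is the article's owner
          expr: request.resource.attr.ownerId == request.principal.id
```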

Twain: Performance and even security is the ultimate thing that's at stake.

I'm curious to know about one thing, which is this perspective of humans versus non-humans. Does that come into play at all when talking about authorization? Does it change when you're authorizing human users versus, you know, things like components, different components, and non-human identities? Could you talk a bit about that?

Alex: Yes, this is actually a really interesting space that I think is gaining a lot more focus right now. It's always been a concept that's been, you know, in the background, but recently a lot more of the things we're being asked by our users are around how to best support non-human identities, and there are even whole companies spinning up to focus just on that particular area.

And at the actual decision level, when your system needs to decide whether this request, with this identity, can do this action, it's actually exactly the same, because a lot of the formal literature talks about a subject or a principal, and that doesn't necessarily dictate that this is a human.

It's just an identity. That identity has an ID, it's got some attributes associated with it, and you can make decisions the exact same way, whether it's a human or not. I'd say the bit that is different when it is a non-human entity is that those identities, or those credentials, are typically longer lived, because you can't really get a microservice to go through an authentication flow.

You generally issue that service an identity and give it some credential which is valid for a longer time period. And you can't just go and reauthenticate it, because a service can't necessarily log into a UI and enter a username and password. So obviously it's a very different kind of credential, but at the core of it, when you're trying to make an authorization decision, whether that is a human or a non-human identity trying to do a call, it's kind of irrelevant.

It's just an identity or a subject trying to do something in a system. With Cerbos, we have a case study with a company called Utility Warehouse here in the UK. They're one of the largest utility providers, part of the Telecom Plus group, a multi-billion dollar publicly listed business here. They have four and a half thousand microservices and they're using Cerbos to authorize requests between them.

So they're doing service-to-service authorization and also sort of delegated access, where the end identity is actually passed down the chain of requests. And to Cerbos, and really to any policy decision point, whether that identity is human or not is kind of irrelevant. It's just: here's an identity, here are their attributes.

They're trying to do this action, should it be allowed or not?

Twain: So it's just a much bigger scale when you're talking about non-human identities.

Alex: Yeah. And there are some great other open source projects out there if you look at how to issue identities to services, so SPIFFE and SPIRE. The kind of Google white papers that came out around how they do identity inside of Google have spurred a number of businesses off the back of them.

And we see an intersection between Cerbos and those, around consuming those identities in whatever form they are. Typically, it's just a JWT.
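As a small illustration of the point that the decision input looks the same for workloads as for people, here is a sketch of a check request where the principal is a service identity (for example, one derived from a SPIFFE ID or a service-issued JWT). The shape loosely follows a Cerbos check request; every identifier and attribute below is hypothetical.

```yaml
# Sketch only: the "principal" is a workload, not a person, but the
# decision input has the same shape. All names below are hypothetical.
principal:
  id: "spiffe://example.org/ns/prod/sa/billing-service"
  roles: ["service"]
  attr:
    team: payments
resources:
  - resource:
      kind: customer_account
      id: acc_123
      attr:
        region: uk
    actions: ["read"]
```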

Twain: Yeah, we have a lot to talk about, especially with what's been happening recently, both with Cerbos and in the industry. I want to start with just Cerbos itself, since there's a bunch of new features. One of the interesting ones that I saw, you had written a blog post on the Dagger module.

What is this new feature? What does it do and why is it important?

Alex: Sure. Yeah. Cerbos, as I mentioned at the start, is at its core an open source project, and we've been really embracing different workflows and different ways developers interact, build their policies, manage their pipelines, and those sorts of things.

So the core of Cerbos really is a GitOps style workflow where you have your policies as a static asset in a GitHub repo, let's say. You work on those policies, which define your principals, your resources, your actions, and under which conditions those actions should be allowed. That can be a simple role-based check, or you can do a more fine-grained attribute-based check. And fitting with the GitOps principles as well, we also allow you to write tests against those policies.

So as well as defining, here's my resource policy, here are the actions, and here's when things should be allowed, you can also write your test cases and test suites. You include in your policy set a set of example principals or subjects and example resources, and then define which actions are expected to be allowed.
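As a rough sketch of what such a test suite might look like, following the general shape of Cerbos policy tests with hypothetical principals and resources (check the docs for the exact schema), this assumes the article policy sketched earlier:

```yaml
# Illustrative policy test suite: named example principals and resources,
# plus the decisions we expect the policy to produce for them.
name: ArticleDeletionTests
principals:
  alice:
    id: alice
    roles: ["user"]
  bob:
    id: bob
    roles: ["user"]
resources:
  alices_article:
    kind: article
    id: art_1
    attr:
      ownerId: alice
tests:
  - name: Only the author can delete
    input:
      principals: ["alice", "bob"]
      resources: ["alices_article"]
      actions: ["delete"]
    expected:
      - principal: alice
        resource: alices_article
        actions:
          delete: EFFECT_ALLOW
      - principal: bob
        resource: alices_article
        actions:
          delete: EFFECT_DENY
```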

And we have CI tooling and such to do it. Now Dagger, if anyone's familiar with it, is another great open source project, with a company behind it as well, which is a way of orchestrating and defining those very rich pipelines for your CI processes. So rather than writing those big GitHub Actions type CI pipelines, which get bigger and bigger and bigger, Dagger allows you to actually do it through code.

They have Go bindings and JavaScript bindings now, I think, as well, and a few others. The idea is you're putting together a DAG, a directed acyclic graph, hence Dagger. And there's a rich ecosystem around it. So we worked with them to build out a Cerbos module.

What this means is, if your pipelines are using Dagger for defining the graph of CI jobs to run, you can drop Cerbos in there and run your Cerbos tests as a Dagger module step. But also we provide a mechanism where you can actually spin up a Cerbos instance within your Dagger pipeline, which answers one of the questions we get a lot, which is how do I do end-to-end testing when Cerbos is involved as well.

So what this allows you to do is spin up a Cerbos instance for the lifetime of your pipeline when it's executing, run requests through your application, and actually hit a real Cerbos instance behind the scenes to do that testing end to end.

So the Cerbos tooling allows you to run the actual tests of your policies, and then you can also use the module to run a real Cerbos instance and do end-to-end, integration-style testing of your application using a real Cerbos behind the scenes.

And the Dagger module is published in the Daggerverse, I believe, their marketplace or showcase, and you can pull that into your Dagger pipelines. It makes it much easier to work with Cerbos inside of that environment.

Twain: All right. Wow. So just easier management for policies at scale. That sounds interesting. Is there any other new feature or addition to Cerbos itself in the past few months that you'd like to mention or talk about?

Alex: Yeah, so in the core Cerbos open source project, before we come on to Cerbos Hub, we've been adding a few new capabilities that have really come up through working with our users.

The first one is actually around what we call policy outputs. With a typical policy decision point, you ask it the question, can this subject do this action on this resource? It evaluates your policies, comes up with a decision, and gives you back an allow or deny. You get a boolean and that's kind of it.

Now there's a whole class of authorization checks where just getting an allow or deny isn't really enough, you need a bit more context. In the classical architectures for these kinds of systems, this is sometimes referred to as advice or obligations, and it's a mechanism by which your decision point can return not just an allow or deny, but some other metadata or output to the application layer or the enforcement point for it to make some decision. What policy outputs allow you to do is, when you're writing your Cerbos policy, as well as defining the rules under which an allow or deny should be returned, you can also define an output alongside it. It's fairly free-form, for you as a policy author to define, but one of the first use cases that a user came to us with, and which we ultimately ended up building policy outputs around, is that they wanted to be able to tell the application layer when an action is denied but would be allowed if the person does a step-up authentication. The actual use case was a fintech type platform, and they wanted to put limits in place where transactions over a certain amount are only allowed if someone has done a 2FA check within the last 10 minutes. So the Cerbos policy says, if the transaction amount is over ten thousand dollars, let's say, and the last two-factor authentication step-up happened within the last 10 minutes, then the action should be allowed.

In the deny case, we have a policy output that replies with a deny, the reason being 2FA required. So it will say, okay, it passed the threshold test but it failed the 2FA check. What comes back to the application is a deny, and then a payload, encoded like an error message, which says that 2FA is required. The enforcement point, so the application layer at this point, knows how to interpret that response and redirect the user off to the authentication system to trigger that step-up authentication.

So do the 2FA, you know, one-time password, those kinds of things. The request is then repeated, and Cerbos will now allow it, because the attribute that says when the last 2FA occurred has been updated, so the policy allows that request to go through and the application proceeds at that point.

So it's a mechanism, while still writing policies, still doing version-controlled, testable policies, to return more than just an allow or deny: to return an output, or some advice, or an obligation back to the application layer or the calling system to then perform further actions, all driven by your actual policies.
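Here is a hedged sketch of how that step-up rule and its output might look in policy form. The output syntax has changed across Cerbos versions, so the field names and the CEL expression below are illustrative rather than authoritative, and the resource kind, attribute names, and the 2FA_REQUIRED marker are hypothetical.

```yaml
# Sketch of a step-up (2FA) rule with a policy output attached.
# Treat field names and the output block as illustrative; check the
# policy reference for the Cerbos version you run.
apiVersion: api.cerbos.dev/v1
resourcePolicy:
  version: default
  resource: transaction
  rules:
    - actions: ["approve"]
      effect: EFFECT_ALLOW
      roles: ["customer"]
      condition:
        match:
          # allow large transfers only if the last 2FA step-up was recent
          expr: >
            request.resource.attr.amount < 10000 ||
            (now() - timestamp(request.principal.attr.last2FA)) < duration("10m")
      output:
        when:
          # surfaced to the enforcement point when the condition fails,
          # so the app can redirect the user to step-up authentication
          conditionNotMet: '"2FA_REQUIRED"'
```

The application then treats a deny accompanied by that output as a signal to trigger the 2FA flow and retry, exactly as described above.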

Twain: Interesting. It sounds like it's a good complement to the previous thing we just spoke about. It seems like the Dagger module is more end to end top level management. But this one, Policy Output, seems to get into the weeds and give you more detail about each policy. And yeah, it just gives you, like, a closer view of what's happening.

Alex: Exactly.

Twain: Yeah. You're going to say something.

Alex: Yeah. Those use cases really vary. I mentioned the 2FA example. The other one we've seen in the wild as well is using it to do IP range restriction, so you must be on the corporate VPN range for actions to be allowed.

You can return back an error message saying it was denied because of IP restrictions. Another one we've seen, with a call center help desk type system, is only allowing certain actions within office hours based on that person's profile. So again, using that context and then returning a message explaining why, rather than just a flat-out deny.

And then this flows into the audit logs as well, which is something that's baked into the core of Cerbos. Every decision that a decision point makes gets logged: this principal tried to do this action on this resource, and it was allowed or denied by this particular policy and produced this particular output.

So it's all a kind of a holistic system in terms of policy authoring, testing, making decisions, getting outputs and then capturing those outputs as well inside of the audit logs and gives you that full visibility of exactly what's going on inside of your system.

Twain: There was another interesting blog post that I saw on your website talking about sidecars.

It's titled "What's so bad about sidecars?", quite a provocative title. And considering we're coming from times when sidecars were lauded as the best thing, because they offloaded things from the containers themselves and gave you a separate management plane with things like Istio and all of that, now we're talking about, you know, are sidecars really that good?

It sounds like there are some drawbacks to them. Would you tell us a bit more about that blog post and about sidecars?

Alex: Yeah. So with sidecars, we're talking here in the Kubernetes context, where you have your pods, and inside that pod you have your main service container, your application container, and then optionally you can have N number of additional containers that run alongside it.

And Kubernetes guarantees they're all running on the same node and they're all talking on the same interface, et cetera. Now, as you mentioned, with Istio there's been kind of a shift in how the service meshes approach this. Before, they used to inject sidecars into every instance. The downside with that is that every service inside of your cluster needed more compute resource, because every instance of each pod now had more compute going on inside of it, and thus the load on your overall cluster is going to increase, because you just have more containers running, and they did all the sidecar injection stuff.

Things have kind of moved away a bit from that, and sidecars, mainly for that kind of overhead and management complexity, have been pushed aside somewhat, with these ambient mesh type approaches and stuff like that around now. But for authorization, sidecars are actually a perfect use case.

We believe so anyway, because, going back to my other point, speed and security really are key, because authorization checks are in the blocking path of every single request. The way Cerbos is architected, the Cerbos decision points, the component you run inside of your system that you ask, can this user do this action on this resource, are fully stateless.

The only thing that's pulled into those are the actual policies, when they start up and then kept up to date, but there's no other state involved: you don't have to go to a database, you don't have to go to disk, you don't have to go to the network or anything like that. All the decisions are being done in memory and processed based on the request that comes in from the enforcement point to the decision point, along with any context, et cetera.

So what you really need to optimize for, because this is in the blocking path, is reducing as much as possible the network between your service and the decision point. And sidecars are a perfect use case for this, because Kubernetes is then running those in the same pod; they can talk over localhost, you can even use a Unix socket directly using our SDKs to talk to the Cerbos instance, and it's all about removing as much overhead as possible between your application and your decision point to come up with a decision. If you even look at how we built Cerbos itself, the primary interface it exposes is gRPC, which is much faster than regular old HTTP REST style approaches, and talking over Unix sockets skips a couple more layers as well. We always recommend anyone that is using Kubernetes for their application deployment to use Cerbos in a sidecar model as well, because that way you don't have to worry about scaling Cerbos.

As your application scales, because it's a sidecar, it will scale with your application, and each instance of your app will just talk over localhost inside the same pod, guaranteed it's not hopping to another node. And, you know, what you never ever, ever want to do is have to go over the internet for an authorization check.

That's just a terrible design, and it's going to add immediate latency to every single API call. With the Cerbos approach, Cerbos comes up with the decisions in sub-milliseconds; anything else is basically network. So using sidecars, you can ensure that your application and your decision point are co-located, and you're going to get the best performance possible when it comes to doing an authorization check and making sure that request goes through in a timely and responsive manner. And you just don't really have to think about, where am I running my PDPs? It's just always going to be co-located. So I will always fight the good fight for sidecars; I think it's a great pattern for this specific type of use case.

I think for the service mesh, that kind of ecosystem, maybe it wasn't the best pattern, hence things have moved away. But for these kinds of things it really does fit the mold very well, in terms of how you want to deploy something so key as an authorization decision point inside of your architecture.
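As a rough illustration of that deployment shape, here is what a Kubernetes manifest with Cerbos running as a sidecar might look like. The image tags, port, and config volume are illustrative; the Cerbos documentation covers the actual server configuration options.

```yaml
# Sketch of the sidecar pattern: app and PDP share a pod, so every
# authorization check stays on localhost inside the pod.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: app
          image: example.com/my-app:latest        # hypothetical application image
          env:
            - name: CERBOS_ADDR
              value: "localhost:3593"             # gRPC to the sidecar over localhost
        - name: cerbos                             # the PDP sidecar
          image: ghcr.io/cerbos/cerbos:latest
          args: ["server", "--config=/config/config.yaml"]
          volumeMounts:
            - name: cerbos-config
              mountPath: /config
      volumes:
        - name: cerbos-config
          configMap:
            name: cerbos-config                    # holds the Cerbos server config
```

Because the sidecar scales with the application's replica count, there is no separate Cerbos deployment to size or route to.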

Twain: All right. Okay. So kind of an "it depends" response I hear there. That's interesting. I want to take a step back and talk a bit about the ecosystem of authorization. With so many open source projects and so many vendors taking different approaches to authorization, the question of standardization comes up, and I wanted to ask you what some of the efforts in that direction are to standardize things, so that, you know, we eventually avoid lock-in, because that's something nobody wants.

So talk a bit about that, you know, what are some of the efforts being made, vendors working together, standardization being set up?

Alex: Yes. This is a really exciting time when it comes to the authorization space. So Cerbos is part of an OpenID working group called AuthZEN, which is an effort to do exactly what you were saying.

So, standardizing the interfaces by which authorization components talk to each other, and this was kicked off about a year ago now. When we look at the broader vision of where authorization needs to be: the authentication piece today is kind of solved. If we go and look at OpenID Connect, OAuth, those kinds of things, those are standardized, adopted systems, and you can go to any sort of platform of decent scale and plug in your own identity provider.

Take Salesforce, for example, you can go and do federated SSO type things. You as a business keep your identities inside of your own system and you federate those identities out to said system. What there isn't today is a mechanism to do essentially federated or single sign-on-esque type workflows, but for authorization.

So if you go and look at a Salesforce, you know, one of these types of systems, you can go and federate your identities and have single sign-on, where your authentication is brokered using OpenID Connect and these kinds of mechanisms. Your authentication is basically externalized from the system you're using, but you then have to go into the SAP, Salesforce, etc. to configure: okay, now you've got a federated identity in, how do you then go and assign those groups and roles and permissions?

You have to go and do that inside of the actual system. You know, there's SCIM and various other mechanisms for doing directory sync, but the actual assignment of application-level permissions and the access controls around those has to be done inside of each application. So looking more broadly, if you want to get to a world where you could go to, I keep using Salesforce as an example because everyone knows it.

If you go to a Salesforce and you're doing your provisioning setup, you would go and connect your authentication system, and we also want to allow you to basically go and connect your own authorization layer as well. And the only way we're going to get adoption by these large companies is to actually make sure there's an open standard for what that interface looks like. And that is the effort of the AuthZEN working group.

So it's been going about a year now. We're at an implementer's draft stage, so we have a first spec out there. You can go find it on the OpenID website, and Cerbos, as well as 11 other vendors, are some of the first implementing vendors for this specification. What that spec has done is formally define the interface by which a policy enforcement point and a policy decision point talk to each other.

So we've defined the interface where an application layer or an enforcement layer talks to a decision point. We've defined the spec that defines the principal or subject, action, resource schema, essentially a JSON payload type schema, and how that gets sent to the policy decision point, and we've also defined the response format back from it. So back at Identiverse, which is one of the big industry conferences around this, and also at EIC in Berlin the following week, myself and our friends at other vendors in the space all came together and demonstrated, through a single demo application, a simple to-do list type app. With that, we could then actually connect it to any of the 12 implementers that were out there, and mid-session we could actually change which decision point the application was using on the back end, across each one of these different vendors.

And the application kept working, and that's because we've all implemented this first working draft of the spec. So Cerbos is part of that. We are working with everyone else in this space around defining that specification. There's work going on now, you can find all the recordings and the notes on the OpenID website as well, look for the AuthZEN working group, around defining the next thing.

So we're looking at how to do batching and boxcarring, and then defining sort of the advice and obligations I was talking about earlier. These are all things that are very much in scope and coming up. And, you know, we as Cerbos are very proud to be taking part in that and working with our peers across various different vendors. If you're on the other side, if you're a systems builder, or you've got use cases around authorization, I really encourage you to come and join this working group and these sessions, because we're really looking for more use cases and real-world scenarios of the kinds of things that the spec needs to be able to support in order for us to get broader adoption.

You can read more about it on the Cerbos website, but I also encourage you to go and look at the OpenID AuthZEN working group pages. It's OpenID, so everything's open, funnily enough: the spec, the meetings, the calls, all those sorts of things.
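For a feel of the interface being standardized, here is a rough sketch of an AuthZEN-style evaluation request and response, based on the implementer's draft. It is shown as YAML for readability (the wire format is JSON), the field names should be checked against the published spec on the OpenID site, and the subject, resource, and action values are made up.

```yaml
# Approximate shape of an AuthZEN evaluation exchange (illustrative only).
request:
  subject:
    type: user
    id: "alice@example.com"
  action:
    name: "can_delete"
  resource:
    type: article
    id: "art_1"
  context:
    time: "2024-08-30T10:00:00Z"
response:
  decision: true
```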

Twain: Wow. Really cool. That's amazing, how all of the vendors are cooperating to find a standard where you can pretty much switch to any vendor at any time.

That takes a lot of confidence, and yeah, it's really come a long way in cloud native. What are some of the key open source projects? Obviously, you know, you talked a lot about Cerbos open source itself. Apart from that, what are some of the key open source projects that you think listeners should keep an eye on in terms of authorization?

Alex: Yeah. So there are really a few different camps around authorization and how to approach it. Cerbos, yeah, our open source project, is really focused around attribute-based access control using a policy-based approach. I would say the two projects, or I guess the two models, that people will have heard about: one is OPA, Open Policy Agent, which if you're using Kubernetes you'll be aware of, and the other is the Zanzibar style of system. Zanzibar was a white paper published by Google which basically explains how they do Google Drive, Google Docs style permissions, where you have millions and millions of objects and you're trying to have these arbitrary access models based on roles, groups, assignments, folders, et cetera, or direct assignments, ACLs inside of the model.

Those are really the two camps, and there are a number of open source projects that implement the Zanzibar approach, as well as the policy-based approach; Cerbos is one of those. Open Policy Agent was actually the foundation on which we started building Cerbos. It's, you know, obviously a great project.

It's adopted by Kubernetes, et cetera. But it's a very broad project, and what we did with Cerbos is we took the engine and built a simplified policy layer on top. So Cerbos policies you're writing just in plain YAML rather than Rego. And then from a performance perspective as well, Cerbos is really targeted at this application-level, fine-grained access control problem space. So we've really scoped down, and we ended up swapping out the engine underneath, so we no longer use Open Policy Agent. We've got our own in-house built engine, still open source.

You can find it on GitHub. That gives us up to a 12x performance improvement over the other model, because we're really focused and targeted around the application use case. And again, with the policy language itself, we've put in something that we believe is much more human readable and doesn't require developers to go and learn something like these other solutions.

So there are a number of great projects out there that fall into those camps, many of which are also part of this open standard working group as well. And it kind of varies: depending on your actual requirements, you may need to go and use something that's more Zanzibar-based, but if it's more policy-based, then, you know, Cerbos or one of the other OPA-based implementations might be more applicable to your use case.

Twain: Oh, interesting. I didn't know that. That's a really good thing to know. Someone who is really into researching all this stuff should really look into this; there's some really cool background there. So I wanted to ask if you could show us visually, this being a video podcast, some of the things you're talking about around sidecars and the new things that have been happening at Cerbos. Could you show us something along those lines?

Alex: Sure. So I'll just pull up my screen, and because we're talking more about architecture, I'm just going to use a couple of diagrams to explain typically how you'd use a policy decision point, and then also where Cerbos Hub, which is our policy administration point, fits alongside it.

So if you look at a typical request workflow, and this is applicable not just for Cerbos but for any sort of policy decision point system that you might use, you have some end user and they are interacting with your application. They're in a browser or a mobile app, or they're hitting your API if they're a non-human identity, maybe. That request comes in and it hits your environment.

It's your infrastructure; this being Kubernetes-based, this is your cluster. The request comes in, probably goes through some sort of gateway type system, and then at some point it's going to hit your actual instance of your application. So you've got that routing layer, and it's going to hit a specific pod inside of your cluster. And inside the application at request time, you essentially know two things: you know who the principal or the user is, and you know what resource they're trying to hit and the action they're trying to do against it.

So in your application layer, you are probably going to have a token of some sort from your identity provider. Maybe you go out to a directory type system, or go and enrich the token with some extra metadata, so what teams or groups someone's a member of, your application kind of knows. And then your application or your service, being responsible for the actual resource they're trying to access, knows how to go and query its own data store and fetch that record.

So to use, you know, the Amazic CMS for example, the service that handles editing blogs knows how to go and grab that from its own database. And now in your application layer, this is where you typically have that logic that says, okay, if the user is the owner, they can do the action; if they're the manager, they can do the action under X, Y, Z scenarios, et cetera.

And that logic is typically hard coded. What externalized authorization, which is what this whole space is called, does is extract or externalize all that logic out into a standalone policy decision point. So with Cerbos, as we talked about earlier, we recommend this running as a sidecar inside of your environment.

So here we're actually talking inside of an individual pod, and in the application layer you package up that information about who the principal is, what the resource is, and what action they're trying to do, and send that over to the decision point. In the sidecar model, that's just a localhost call.

That decision point has the policies loaded into it. In the Cerbos world, those policies are those YAML definition files. You store them in some sort of store; we support multiple backends, but the one most users go for is they just have a GitHub repo that holds their policies, or it's just another folder in their application repo.

And when that policy decision point starts up, you connect it to where those policies are stored. So in the case of GitHub, you give it a credential to your GitHub repo; it goes out, pulls down those policies, loads them in, and then it's ready to start serving. The repo that holds your policies can also, as we talked about earlier, have tests alongside it.

And we have other tooling and a whole policy editing playground available as well. But the core of it is these policy files are loaded in. Now, inside that decision point, when that request comes in, can this principal do this action on this resource, the policy decision point goes and evaluates those policies.

With Cerbos it's stateless, so it's not having to go and query a database or hit disk or the network or anything; it's making that decision based on all the context that you're giving it at request time from the enforcement point. The policy decision point evaluates the policies, comes up with a decision, an allow or deny, optionally also some of those outputs as well, and creates a log of that decision. So you have this permanent log of: this user tried to do this action on this resource, and it was either allowed or denied by this particular policy. Then what comes back to the application layer is that allow or deny with some metadata or outputs, or you can even do a kind of query criteria response, which we'll talk about in a second. What this now means in your application layer, whether you're directly calling the API or using one of our SDKs, is that what you get back is that allow or deny.

So in your application code, where before you might have had all this hard coded logic inspecting the user's roles or groups or attributes, it's now a single if statement: if the decision point says allow, do the action; if not, return some sort of error to the user. What this also means, because we've externalized all that logic out, is these policies over here can change and update, and eventually your policy decision points will get that change, and the next request that comes to your service will get a decision back based on the latest policies, without you having to touch or redeploy the application service over here.
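To ground that flow, here is a sketch of a single check exchange between the enforcement point and the decision point, roughly in the shape of the Cerbos check API. It is rendered as YAML (the wire format is JSON), and the attribute names are illustrative, so consult the API reference for the exact schema.

```yaml
# Sketch of one check call: the app packages up principal, resource and
# actions; the PDP replies with per-action effects. Names are illustrative.
request:
  requestId: req-001
  principal:
    id: alice
    roles: ["user"]
    attr:
      department: editorial
  resources:
    - resource:
        kind: article
        id: art_1
        attr:
          ownerId: alice
          status: published
      actions: ["edit", "delete"]
response:
  requestId: req-001
  results:
    - resource:
        kind: article
        id: art_1
      actions:
        edit: EFFECT_ALLOW
        delete: EFFECT_ALLOW   # alice owns the article, so the condition passes
```

In the application, that response collapses to the single if statement Alex describes: allow means perform the action, anything else means return an error.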

So it really has externalized that out now. There's actually a secondary endpoint as well. Generally, the use case of "can this subject do this action on this resource, yes or no" is the typical interface, but with Cerbos we also have what we call a plan resources call, which is for the use case where you want to filter your database query or show a list of resources someone has access to. What you don't want to do is have to go and query every single record from your database and then do a check for each individual one, because you could have, you know, hundreds, thousands, hundreds of thousands, millions, billions of records inside your data store, and that's going to be highly inefficient.

So Cerbos also has a second endpoint, which we call plan resources, where your application can say to Cerbos: I have this user with these attributes trying to do this particular action on this kind of resource. What Cerbos then does is a kind of partial evaluation, where it goes through your policies and works out which conditions it can evaluate and which ones it can't, and what comes back to your application is what we call a query plan. A query plan is an abstract syntax tree of conditions for you to then apply to your database lookup, or your Mongo filter, or your DynamoDB lookup, or whatever you're using for your data store.

And that query plan will say, okay, this attribute of a resource should be this particular value, or this attribute should be true, or this particular attribute should be in one of these three values, et cetera. What Cerbos has done is kind of inverted your policies and given you back a set of filters.

If you then apply those filters to your data lookup, the results you get back are just the records of that resource kind that the user has that particular permission on. And thus you're doing authorization-aware data filtering, pushed down to the database level, all driven by your policies.

So when those policies change, again, the next request to Cerbos, when it gets a query plan back, will get a new set of filters, and thus you're going to get the updated query, essentially the WHERE clause, that will return back just the records that the user has that particular permission to.
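And here is a sketch of that second, query-planning style of call: the application asks about a kind of resource rather than a specific instance, and gets back a filter it translates into its own query. The field names approximate the Cerbos plan-resources API and its condition AST; treat them as illustrative and see the query plan documentation for the real schema.

```yaml
# Sketch of a plan-resources exchange: the response is a condition tree
# the application turns into a WHERE clause. Field names are approximate.
request:
  action: delete
  principal:
    id: alice
    roles: ["user"]
  resource:
    kind: article
response:
  filter:
    kind: KIND_CONDITIONAL
    condition:
      expression:
        operator: eq
        operands:
          - variable: request.resource.attr.ownerId
          - value: alice
# which the application might translate into, for example:
#   SELECT * FROM articles WHERE owner_id = 'alice'
```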

Alex: That's kind of the typical policy decision point flow, and with Cerbos, everything I just explained is the open source part; you can see it on GitHub up here, you can go and grab it and off you go.

Twain: Well, okay. So that's, I think, a great overview of how it all works. And I think just abstracting and separating the policy management from the core infrastructure behind it, or from the system itself, makes a big difference. And I like that you can update your policies asynchronously; you don't need to change your system just to change your policies, and there's good separation of duties within the system.

And that's, that's really cool.

Alex: Yeah, and that's kind of the core of Cerbos. What we released two weeks ago is Cerbos Hub. What Cerbos Hub is, is an administration control plane; a policy administration point is kind of its formal name, if you were to look at the formal specification for this area.

It's an administration layer that sits on top of those decision points. So architecturally, you'll notice that this is all exactly the same: your service running, your instances, how it makes decisions, et cetera, all runs in your environment using the open source core.

What has changed is that Cerbos Hub sits on top and is much more of a managed workflow for working with your policies. Cerbos Hub at its core really does four things. Firstly, it's an environment for you to work on and edit your policies. It has a whole collaborative IDE, sort of a la Google Docs style, where you and colleagues can work together on policies, and it gives you real-time feedback on those changes and updates as you go.

Secondly, it's a CI/CD pipeline. Cerbos Hub connects to your Git store, and when you go through and update those policies, it gets that commit from GitHub, kicks off a build pipeline, and inside of Cerbos Hub we then go and compile, test, and validate your policies. So that's the CI part of policy updates for you.

The third thing it does is the distribution. The challenge that you get, particularly when you're running in a sidecar model, is you're going to have more than one Cerbos instance, sorry, policy decision point, running inside of your environment, and when you push out policy changes you're going to want all those decision points to be updated in a timely manner whenever there's a policy change.

With the open source projects out there, you kind of have to manage how you push new policies to those instances yourself. Typically it's just polling, so say every 5 minutes it checks for updates, but that means it's going to take up to 5 minutes for those instances to get those updates.

You end up with inconsistency type problems. So Cerbos Hub has a synchronization and coordination system built into it as part of the pipeline. Once your policies are built, because all your decision points are now connected to your policy administration point, your Cerbos Hub instance, Cerbos Hub then coordinates the rollout whenever there's a policy change.

So all those instances will get notified at the same time that there's an update to your policies, and they'll all go and pull down the policy update. That keeps everything synchronized and up to date across, you know, five, 10, hundreds of Cerbos instances inside of your environment.

Cerbos Hub manages the administration and the rollout of policy updates to them. And then the final thing Cerbos Hub does is actually enable what we call our embedded policy decision point. There's a whole class of use cases where you want to just show and hide menu buttons or items or navigation things inside of your apps based on the user's roles and permissions, and what you don't want to do is have to hard code all that logic into the application itself. So Cerbos Hub produces what we call an embedded policy decision point, which is built from the exact same policies that are running in your back end, but this produces an embedded version of our policy engine, and behind the scenes it uses WebAssembly, if you're familiar with that technology, so you can basically run it anywhere.

That can then be pulled down into, say, an edge function or a serverless function, or even directly onto the device, so in a browser, for example. And then you get the exact same API interface to say, can this user do this action on this resource, but this time it's all being evaluated locally without any sort of network or having to expose any backend services to your application. Cerbos Hub does the work again to keep that coordinated and updated as your policies change on the back end, so both your back end policy decision points and your embedded versions are always going to be in sync with whatever your policy store in GitHub holds.

Twain: That's really cool.

We're almost out of time, so I just have one more question for you, Alex, with you wearing your product hat, you know, just as an entrepreneur. I want to ask about this journey of going from idea to the product that you've launched. Could you share what you've learned through this phase, and maybe even an example of how you collected feedback from your customers during the beta phase and how you implemented that into the final product?

Any surprises along the way that you didn't expect? Could you talk a bit from the entrepreneur and product perspective of building Cerbos and launching it?

Alex: Yeah, absolutely. So because Cerbos is a security system at its core, we were very deliberate and careful with how we approached development, compared to, you know, my previous companies and roles.

We actually move at a slightly different pace, because we need to be rigorous and sure that Cerbos is working as it should, so we're being much more deliberate, and you can see that in what the release of Cerbos Hub has been. We actually first announced Cerbos Hub back at KubeCon North America, a year and a bit ago now, that it was available and it was going to be a closed phase.

Sort of a closed alpha, and we got a number of our users on that, so we had a very close group testing the very early and very rough versions of Cerbos Hub to get that kind of early feedback. We went through that phase for about six months, and then we did another six months of sort of an open beta, and that's when usage exploded.

We had 500 odd companies come on and actually be some of our beta testers in that phase. And that's where we kind of got a lot more sort of practical feedback very early on. We were kind of working through a lot of the early use cases and trying to understand more the operational side of things.

And then in the more open beta, again, much more prolonged than at previous, more typical businesses, because we had to be very deliberate from a security perspective, because this is not a system that can break, ultimately. A good example of the feedback we got very early on is that people loved the idea of this coordinated rollout and update system, where Cerbos Hub is the system which notifies all your different instances whenever there's a policy change and things need to be updated, because that was really one of the hard things we were hearing about from the open source usage.

So we built this system, and we thought it was great. It provided an API that the decision points connect back to, and that API is a bidirectional stream that provides the updates on what's going on. We had users using it, and they were running instances across their clusters, et cetera. And then, as we got closer to production ready and users were going to deploy, very rightly, one of our users challenged us and said: this is great, but what if your Cerbos Hub API goes down? What if Cerbos Hub falls over, does that mean my instances won't start anymore?

Does that mean my application is down because Cerbos Hub is down? And we're like, fair play. Yep. That is a very valid concern. And, you know, we'd obviously built redundancy into our systems in terms of high availability, et cetera. But this was a moment for us to say, okay, right, we're a security company building security-based software.

This thing needs to be, you know, as bulletproof as we can make it, and this was a point of failure which we had to go and design around. So we took a decent amount of time to rethink and re-architect how that distribution works. Now things are built in such a way that even if Cerbos Hub and our entire infrastructure is down, your instances will still work, and they will pull down the latest version of the policies that was last built by Cerbos Hub.

And that's a completely independent system from our deployment of Cerbos Hub. We're using CDN type caching, there's encryption of the actual bundles, so they can be publicly available but only decrypted by your local instances, et cetera. And that was one of the things that was a completely valid and hard requirement for one of our users for them to go to production with Cerbos Hub, because they needed to make sure that they were protected in the case that our systems are down.

That is something that, rightly, we were called out on. We went and designed and built an architecture in collaboration with them, and I would say they're now live with it. And it's one of those things that we had early thoughts about, but it was only really when we were going through this beta phase that we got to the nitty gritty of it and decided to pick one of those designs off the shelf and actually go through and implement it, based on a direct user ask. I'm happy to say that that is now a paying customer, and they're using Cerbos in production because of these kinds of failsafe mechanisms being in place.

So from my journey as well, we took a much longer and more deliberate approach to how we did our early user and beta testing, and took a whole year to go from first announcement to GA, because we wanted to get this absolutely right. Because of where Cerbos sits inside of your application, it has to work, it needs to be stable, and it needs to be up all the time that your application's up.

And so we've built around that, and that really is our philosophy with every feature we build. We take longer because we're doing it with the mindset of our end users, where this is a critical component in their entire stack. And that is the hat we wear for every single decision we make from a product perspective.

Twain: Wow, that's a great example. Amazing. It was a delight talking to you, Alex. Thanks so much for joining us today and sharing all of that insight. There was just so much to take away about authorization. And yeah, kudos to you guys for building something really amazing, and even explaining it in a way that, yeah, captures the interest, I'm sure, of all of the viewers and listeners.

So yeah, if you guys launch anything big in the coming months, definitely come back and tell us about it. We'd love to keep track of what's happening with Cerbos, and yeah, it'd be great to have you back soon for a part three. All right. And to all of our listeners and viewers, thank you for tuning in, and we'll see you on the next one.

Book a free Policy Workshop to discuss your requirements and get your first policy written by the Cerbos team