How to build an authorization system for your RAG applications with LangChain, Chroma DB and Cerbos

Published by Usman Malik on December 19, 2024

This article explains how to implement authorization systems for your Retrieval Augmented Generation apps.

Authorization systems allow controlled access to complete or partial software application features, limiting access to sensitive or specialized features based on user roles and permissions.

Authorization is particularly crucial in Retrieval Augmented Generation (RAG) applications as they involve ingesting data in vector databases. Effective authorization ensures that only authenticated and permitted users can ingest, retrieve, or manipulate data in vector databases.

In this article, you will explore the following concepts:

A hands-on example of RAG applications and how to develop them in Python using the LangChain framework and Chroma DB.
Various security concerns for RAG architecture.
Overview of various authorization techniques.
Importance of authorization for RAG applications.
Implementing a RAG authorization system using Cerbos, an open-source authorization layer.

Before we begin exploring authorization approaches in RAG, let me introduce myself. I have been working with NLP techniques for the last 10 years, focusing on RAG since 2022. I have developed industry-grade RAG-based chatbots, virtual assistants, and language generation applications. In this article, you will learn RAG authorization approaches from someone with first-hand industry experience.

So, let's begin without further ado.

What is RAG (Retrieval-Augmented Generation)?

Retrieval Augmented Generation is an approach that enhances the capabilities of LLMs by augmenting their default knowledge using external sources.

How does RAG work?

A RAG system typically consists of the following components:

Document Loader that imports data from various sources such as PDF documents, websites, databases, etc. You can write custom data loaders or use data loaders from frameworks such as LangChain.
Text Splitter that splits the documents imported via the document loader into smaller chunks.
Embedding Model that converts text chunks into a vector representation. You can use proprietary embedding models such as OpenAI Embeddings or open-source embedding models from Hugging Face.
Vector Store that stores the vectors generated via embedding models. Some commonly used vector stores are Pinecone, Chroma, Faiss, etc.
Retriever that matches user queries with the vectors in the vector store and retrieves the vectors with the highest similarity. LangChain provides different types of retrievers you can use to meet your requirements.
Language Model that takes in the user query and the text of the chunks retrieved from the vector store to generate the final response. Some of the best-known language models are OpenAI GPT-4o, Claude 3.5 Sonnet, and Qwen 2.5-72B.
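To make the text splitter step more concrete, here is a minimal sketch of fixed-size chunking with overlap in plain Python. The function name split_text and the chunk_size and overlap parameters are illustrative assumptions, not LangChain's API; real splitters usually also try to respect sentence or paragraph boundaries.

```python
def split_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into fixed-size character chunks that share `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "RAG systems split long documents into small chunks before embedding them."
chunks = split_text(sample, chunk_size=40, overlap=10)
```

The overlap means the end of one chunk is repeated at the start of the next, which helps keep a sentence that straddles a boundary retrievable from either chunk.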

The following figure shows a typical RAG system.

[Figure: A typical RAG system]

A RAG application stores vector embeddings of text chunks generated using embedding models in a vector store. This process is called data ingestion. Subsequently, RAG converts new user inputs into vectors using the same embedding model as the one used for generating vectors for the vector store.

RAG then compares query vectors with the vectors in the vector store and retrieves the vectors with the highest semantic similarity. The query and the text for the retrieved vectors are passed to the LLM to generate the final response. To make this concept more tangible, let's build a demo together.
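Before the full demo, the ingestion and retrieval steps described above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions: the embed function uses word counts as a stand-in for a real embedding model, and the three example chunks are made up for the sketch.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Ingestion: embed every chunk once and keep (text, vector) pairs.
chunks = [
    "smoking is a major cause of lung cancer",
    "meta reported quarterly revenue growth",
    "amazon web services revenue increased",
]
store = [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(query: str, k: int = 1) -> list[str]:
    """Embed the query with the same function and return the k most similar chunks."""
    query_vector = embed(query)
    ranked = sorted(store, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


top = retrieve("what causes lung cancer")
```

In a real system, the retrieved chunk text and the query would then be passed to the LLM.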

RAG demo with Python LangChain and Chroma DB

In this section, you will learn how to develop a simple RAG system using the Python LangChain framework and the Chroma DB vector database.

Importing and installing

You need to install the following Python libraries to run the scripts in this article:

!pip install -U langchain langchain-openai pypdf chromadb langchain_community python-dotenv

Subsequently, run the following script to import the required modules, classes, and functions into your Python application.

import os
from dotenv import load_dotenv

from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.document_loaders import PyPDFLoader

from langchain_community.vectorstores import Chroma  
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain
from langchain_core.documents import Document
from langchain_core.messages import HumanMessage, AIMessage

load_dotenv()

Creating an OpenAI client instance

We will use the OpenAI GPT-4o LLM in our RAG application. The following script creates a ChatOpenAI object as the LLM for our RAG application. You need an OpenAI API key to use OpenAI LLMs in LangChain.

openai_key = os.environ.get('OPENAI_API_KEY')

llm = ChatOpenAI(
    openai_api_key = openai_key ,
    model = 'gpt-4o',
    temperature = 0.7
)

Data ingestion into Chroma DB

We will create two Chroma DB vector store objects. The first vector store will store data from the Lung Cancer Awareness report, while the second vector store will store data from the Meta and Amazon earnings reports for Q3 2024.

We will ask our LLM questions related to these data sources. Since GPT-4o's knowledge cutoff date is October 2023, it cannot, by default, answer the questions related to Meta and Amazon's earnings report for 2024. However, with RAG, you will see that GPT-4o will respond to our queries related to these reports.

The following script imports data from these data sources using the PyPDFLoader class and loads and splits them into multiple chunks using the load_and_split() method.

data_url = "https://www.hse.ie/eng/services/list/5/cancer/pubs/reports/national-survey-on-lung-cancer-awareness-report-january-2020.pdf"
loader = PyPDFLoader(data_url)
lung_cancer_docs = loader.load_and_split()

data_url = "https://s21.q4cdn.com/399680738/files/doc_financials/2024/q3/META-Q3-2024-Earnings-Call-Transcript.pdf"
loader = PyPDFLoader(data_url)
meta_docs = loader.load_and_split()

data_url = "https://s2.q4cdn.com/299287126/files/doc_financials/2024/q3/AMZN-Q3-2024-Earnings-Release.pdf"
loader = PyPDFLoader(data_url)
amazon_docs = loader.load_and_split()

You can add information about a document using its metadata. In the following script, we add source and month information for all the documents.

def add_metadata(docs, source, month):
    for doc in docs:
        doc.metadata["source"] = source
        doc.metadata["month"] = month
    return docs

lung_cancer_docs = add_metadata(lung_cancer_docs, "lung_cancer_doc", "June")
meta_docs = add_metadata(meta_docs, "meta_doc", "October")
amazon_docs = add_metadata(amazon_docs, "amazon_doc", "November")

Metadata can help organize documents into different categories, which, as you will see in a later section, allows document filtering based on metadata.

Next, we will create two Chroma DB vector store objects. The first store will contain vectors related to the lung cancer awareness report, while the second store will contain vectors for the Meta and Amazon earnings reports.

embeddings = OpenAIEmbeddings(openai_api_key = openai_key)

lung_cancer_vectorstore= Chroma.from_documents(
    documents=lung_cancer_docs,
    embedding=embeddings,
    collection_name="lung_cancer_collection"
)

earning_calls_vectorstore = Chroma.from_documents(
    documents= meta_docs + amazon_docs,
    embedding=embeddings,
    collection_name="earning_calls_collection"
)

Generating RAG model response

To generate a response from an LLM, we will first define a prompt that tells the LLM to only return responses based on the provided context. The context will be supplied by a LangChain stuff documents chain, which stuffs the documents retrieved from a vector store into the prompt.

prompt = ChatPromptTemplate.from_template("""Answer the following question based only on the provided context:

Question: {input}

Context: {context}
"""
)

document_chain = create_stuff_documents_chain(llm, prompt)
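Conceptually, "stuffing" means concatenating the retrieved chunks and substituting them for the {context} placeholder. The plain-Python sketch below illustrates the idea; the stuff_documents helper is hypothetical and is a simplification, not LangChain's implementation.

```python
def stuff_documents(question: str, docs: list[str]) -> str:
    """Join the retrieved chunks and place them in the prompt's context slot."""
    context = "\n\n".join(docs)
    return (
        "Answer the following question based only on the provided context:\n\n"
        f"Question: {question}\n\n"
        f"Context: {context}\n"
    )


final_prompt = stuff_documents(
    "What are the major causes of lung cancer?",
    ["(first retrieved chunk)", "(second retrieved chunk)"],
)
```

Because every retrieved chunk ends up in a single prompt, this approach only works while the combined text fits in the model's context window.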

Next, we will create a retrieval chain that uses the vector store retriever and the stuff document chain to generate a final model response.

In the script below, we create a retrieval chain that uses the lung_cancer_retriever to generate responses using the lung cancer awareness report.

lung_cancer_retriever = lung_cancer_vectorstore.as_retriever()
lung_cancer_retrieval_chain = create_retrieval_chain(lung_cancer_retriever, document_chain)

Let's ask the retrieval chain we created a few questions.

query = "What is the revenue generated by Meta in Q3 2024?"
response = lung_cancer_retrieval_chain.invoke({"input": query})
print(response["answer"])

Output

The context does not provide information on the revenue generated by Meta in Q3 2024.

Since we are using the lung cancer retriever, the retrieval chain did not answer the question related to Meta's revenue.

Let's ask a different question, this time related to lung cancer.

query = "What are the major causes of Lung Cancer?"
response = lung_cancer_retrieval_chain.invoke({"input": query})
print(response["answer"])

Output

The major causes of lung cancer include smoking, working environment, hereditary or genetic factors, air pollution, toxic chemicals, asbestos, second hand smoke, and alcohol. Other risk factors include environmental factors, poor diet, inhaling dust, lifestyle choices, previous cancer or other illnesses, obesity, unspecified pollution, vaping, lack of exercise, drugs, stress, and radon gas.

This time, you can see that the retrieval chain provided a response using the lung cancer awareness report.

Let's create a retrieval chain using the earnings call retriever and ask a question about Meta's revenue.

earning_calls_retriever = earning_calls_vectorstore.as_retriever()
earning_calls_retrieval_chain = create_retrieval_chain(earning_calls_retriever, document_chain)

query = "What is the revenue generated by Meta in Q3 2024?"
response = earning_calls_retrieval_chain.invoke({"input": query})
print(response["answer"])

Output

The revenue generated by Meta in Q3 2024 was $40.6 billion.

Since the earnings call retriever also contains information about Amazon, we can ask questions related to Amazon's revenue.

query = "What is the revenue generated by Amazon?"
response = earning_calls_retrieval_chain.invoke({"input": query})
print(response["answer"])

Output

The revenue generated by Amazon in the third quarter of 2024 was $158.9 billion.

Finally, you can create a retriever using a subset of documents in the vector store. For example, the following script creates a retriever object using only the documents where the metadata attribute source is meta_doc.

If you ask questions about Amazon's revenue using this retriever, the model will not respond.

earning_calls_retriever = earning_calls_vectorstore.as_retriever(search_kwargs={"filter": {"source": "meta_doc"}})
earning_calls_retrieval_chain = create_retrieval_chain(earning_calls_retriever, document_chain)

query = "What is the revenue generated by Amazon in Q3 2024?"
response = earning_calls_retrieval_chain.invoke({"input": query})
print(response["answer"])

Output

The context does not provide information on the revenue generated by Amazon in Q3 2024.
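Conceptually, a metadata filter narrows the set of candidate chunks before similarity ranking takes place. The sketch below shows that idea with plain dictionaries. The matches helper is hypothetical and handles only exact equality, and the chunk texts are placeholders; it is not how Chroma is implemented.

```python
def matches(metadata: dict, where: dict) -> bool:
    """Return True when every key in the filter equals the chunk's metadata value."""
    return all(metadata.get(key) == value for key, value in where.items())


chunks = [
    {"text": "(chunk from the Meta transcript)",
     "metadata": {"source": "meta_doc", "month": "October"}},
    {"text": "(chunk from the Amazon release)",
     "metadata": {"source": "amazon_doc", "month": "November"}},
]

# Only chunks that pass the filter remain candidates for similarity search.
candidates = [c for c in chunks if matches(c["metadata"], {"source": "meta_doc"})]
```

Since the Amazon chunks never reach the LLM, it has no context from which to answer the Amazon question.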

With RAG, the possibilities are virtually endless. You can create all types of retrievers, add filters to the documents, and create retrieval chains suited to your needs. It's really powerful!

Advantages of RAG

I personally prefer to work with RAG since it helps avoid fine-tuning an LLM, which can be costly and time-consuming. With RAG, you pass the information needed to answer a user query directly to the LLM, whose job is then to formulate the final response from that information.

Following are some advantages of RAG applications.

They allow you to retrieve updated information by enabling LLMs to access current data beyond their training cutoff.
RAG allows you to specialize an LLM in a specific domain. Given enough high-quality domain data, an LLM can generate responses that approach those of human experts.
Context information retrieved from vector stores reduces the likelihood of models hallucinating.
By default, LLMs do not provide sources for their responses. With RAG, you can retrieve the document an LLM uses to generate responses, leading to high model transparency.

However, RAG does come with certain limitations. Below are the challenges I encountered while working on my RAG projects.

Limitations of RAG

Following are some of the main limitations of RAG:

RAG depends highly on the quality of data. Poor quality data leads to poor and sometimes inaccurate LLM responses.
RAG systems are often more challenging to implement and maintain than standalone LLMs since you must maintain various components such as retrievers, vector stores, document embeddings, etc. Also, the computational complexity of RAG systems is much higher than that of standalone LLMs.
RAG systems, by default, do not contain any authentication and authorization logic, which, if not handled, may lead to the leaking of sensitive data to unauthorized users.

Now that you know what RAG is and how it works, let's look at some of the security concerns for RAG applications.

Security concerns for RAG architecture

While the RAG approach is compelling and innovative, it also introduces several security vulnerabilities that need to be carefully addressed before releasing RAG applications into production.

Here are some of the security concerns for RAG applications:

[Figure: Security concerns for RAG applications]

Prompt injection

Prompt injection is a technique in which an attacker crafts a malicious prompt that causes an LLM to return sensitive or unauthorized information.

For example, a RAG system may retrieve confidential financial data based on user queries. An attacker could submit a prompt like, "Provide all confidential data on financial projections," potentially forcing the system to retrieve sensitive data even without authorization.

Prompt injections are particularly dangerous for RAG systems. They can allow a malicious user to access sensitive information and cause a model to behave in an undesired manner.

Some common measures to prevent prompt injections involve implementing strict input validation and sanitization, using role-based access control to limit the scope of user queries, and employing prompt engineering techniques to make the system more resilient.

Context injection

Context injection refers to an attacker inserting harmful data into a system's retrieval process, potentially causing RAG systems to produce responses based on corrupted information.

For example, an attacker may add a document containing malicious code that may be executed during retrieval.

Context injection results in two significant problems: misinformation propagation, where the model provides incorrect or misleading information due to the injected data, and malicious code execution, where unsanitized code runs within the system.

Preventive measures for context injection include strict validation and sanitization of content before ingestion and allowing only authorized users to add or modify context data.

Data poisoning

Data poisoning occurs when attackers intentionally alter the knowledge base, compromising the reliability and quality of RAG system responses. One prominent example of data poisoning in NLP is Tay, a chatbot introduced by Microsoft in 2016 that mimicked the speech patterns of a 19-year-old girl.

Malicious users bombarded Tay with offensive, abusive, and racist content, poisoning its learning process. Consequently, Tay began replicating racist and explicit messages, highlighting the vulnerability of AI systems to data-poisoning attacks.

For example, an attacker may insert fabricated documents with biased information into the knowledge base, causing the model to reflect this bias in responses. Data poisoning can involve corrupting model training data or manipulating data vector representation.

Mitigating these risks involves implementing robust data validation and cleaning processes and using anomaly detection systems to identify suspicious data patterns.

Sensitive data exfiltration

Sensitive data exfiltration involves leaking sensitive data to malicious users. Poor access controls may allow attackers to query sensitive data directly, while inference attacks could enable attackers to deduce confidential information indirectly through carefully structured queries.

For instance, in a healthcare RAG system, a user could issue indirect requests to retrieve a patient's private information.

Effective prevention strategies include implementing fine-grained access control, data masking, and differential privacy techniques to obscure sensitive values and maintaining continuous monitoring and audit logs to detect potential exfiltration attempts.
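As a small illustration of data masking, the sketch below redacts email addresses and phone-like numbers from text before it is ingested or returned. The two regular expressions and the mask_sensitive helper are illustrative assumptions and are far from exhaustive; production systems typically rely on dedicated PII detection tooling.

```python
import re

# Deliberately simple patterns, for illustration only.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s-]{7,}\d")


def mask_sensitive(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


masked = mask_sensitive("Contact jane.doe@example.com or +1 555-123-4567 for results.")
```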

Model inversion attacks

Model inversion attacks involve reverse-engineering of sensitive data through model responses.

Attackers might repeatedly query the system to extract confidential information, such as customer data, by probing the model's responses. This risk is especially prevalent when personal data is embedded within the training set, leading to unintended privacy leaks.

Mitigation techniques for model inversion include using federated learning to limit raw training data exposure, applying data anonymization to remove identifying details, and regularly updating the model to reduce the effectiveness of inversion techniques.

The aforementioned security concerns highlight the importance of access control for AI applications, particularly for RAG.

Access control for RAG applications

Access control is crucial for RAG applications, especially when dealing with private or sensitive data. An effective access control mechanism ensures that only authorized and authenticated users can access specific RAG application resources.

Authentication and authorization in AI apps

Let's first discuss what authentication and authorization mean for AI applications (AI bots, AI companions and agents).

What is authentication?

Authentication refers to verifying a user's identity to ensure that the person or system attempting to access the application is who they claim to be.

Some of the standard authentication techniques for user identification in AI applications include:

[Figure: What is authentication]

Password authentication, where a user's identity is verified via a user name and password. It is the most basic form of authentication but is risky if not robustly implemented.

Multi-factor authentication, which combines two or more authentication factors (e.g., password and biometric, text message and email, etc.).

OAuth/OpenID Connect, which delegates user authentication to third-party identity providers such as Google, typically on top of OAuth 2.0.

What is authorization?

Authorization is different from authentication in that authorization determines the actions that an authenticated user can perform. For RAG applications, authorization involves controlling access to features such as data ingestion, retrieval, and vector store manipulation.

Standard authorization approaches for AI and RAG applications include 3 key concepts:

[Figure: What is authorization]

Defining roles and permissions: Clearly specify user roles, e.g. admin, data-ingestor, data-viewer, data-retriever, and the associated resources they can access.

Implement fine-grained access control: Implement policies restricting access at a granular level. For example, a finance-retriever role can access only finance-related vectors from the vector store; a health-retriever role may access health-related documents from a vector database.

Use third-party authorization tools: Use third-party authorization tools such as Cerbos that offer out-of-the-box authorization functionality.
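One way to picture fine-grained access control in a RAG retriever is to map each role to a metadata filter and refuse retrieval for roles without one. The sketch below is a hypothetical illustration: the ROLE_FILTERS table, the department metadata key, and the filter_for_role helper are assumptions, not part of any library.

```python
# Hypothetical mapping from role to the metadata filter that role may use.
ROLE_FILTERS = {
    "finance-retriever": {"department": "finance"},
    "health-retriever": {"department": "health"},
}


def filter_for_role(role: str) -> dict:
    """Return a copy of the role's metadata filter, or refuse unknown roles."""
    try:
        return dict(ROLE_FILTERS[role])
    except KeyError:
        raise PermissionError(f"role {role!r} may not retrieve documents") from None


# The result has the same shape as the search_kwargs passed to as_retriever() earlier.
search_kwargs = {"filter": filter_for_role("finance-retriever")}
```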

The upcoming section discusses some common authorization designs for AI applications, which are also applicable to RAG applications.

Authorization designs for RAG

Depending on the complexity and requirements of your RAG system, you can adopt one or more of the following authorization designs.

Access Control Lists (ACL)

ACLs are one of the simplest application authorization designs. In ACLs, you define a list of users who can access one or multiple resources, independent of the users' role. Anyone on the list can access the specified resources. An example of an ACL in RAG can be a list of users who can ingest data into a health data store.
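A minimal sketch of an ACL check in Python might look like the following. The user names and the health_vectorstore resource id are hypothetical.

```python
# Hypothetical ACL: resource id -> set of user ids allowed to ingest into it.
INGEST_ACL = {
    "health_vectorstore": {"alice", "bob"},
}


def acl_allows(user_id: str, resource_id: str) -> bool:
    """A user is allowed only if they appear on the resource's list."""
    return user_id in INGEST_ACL.get(resource_id, set())
```

Note that roles play no part in the decision; access changes only when someone edits the list.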

Role-based Access Control (RBAC)

Role-based access control assigns permissions to roles rather than users. Users with the assigned roles can access resources. Users and roles can have a many-to-many relation, where a role can be assigned to multiple users, and a user can have one or multiple roles.

For example, all users with the data-ingestor role can ingest data into a vector database.
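The many-to-many relationship between users and roles can be sketched with two lookup tables. The user names and the exact permission sets below are hypothetical.

```python
# Hypothetical role -> permitted actions, and user -> assigned roles.
ROLE_PERMISSIONS = {
    "data-ingestor": {"ingest"},
    "data-retriever": {"retrieve"},
    "admin": {"ingest", "retrieve"},
}
USER_ROLES = {
    "alice": {"data-ingestor"},
    "bob": {"data-retriever"},
    "carol": {"data-ingestor", "data-retriever"},
}


def rbac_allows(user_id: str, action: str) -> bool:
    """Allow the action if any of the user's roles grants it."""
    return any(
        action in ROLE_PERMISSIONS.get(role, set())
        for role in USER_ROLES.get(user_id, set())
    )
```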

Attribute-based Access Control (ABAC)

ABAC allows access to resources based on user and resource attributes. For example, you can use ABAC to allow users with department=finance to access resources where vector-store=finance.

This fine-grained access control ensures that only the right people can access the right resources under the right conditions, making ABAC particularly suitable for complex systems like RAG.
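The department example can be sketched as a comparison between a user attribute and a resource attribute. The abac_allows helper and the dictionary layout are assumptions made for illustration.

```python
def abac_allows(user_attrs: dict, resource_attrs: dict) -> bool:
    """Allow access when the user's department equals the resource's vector-store attribute."""
    department = user_attrs.get("department")
    # A missing attribute never matches, so the default is to deny.
    return department is not None and department == resource_attrs.get("vector-store")
```

The decision depends on attribute values evaluated at request time, so no rule has to change when a user moves to another department.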

Relationship-based Access Control (ReBAC)

ReBAC allows access to resources based on the relationship between entities (users, resources, roles).

For example, a team lead can access all documents created by their team members but not those from other teams. In such a case, ReBAC will check who created the document, and if the user who created the document is part of a team leader's team, access will be granted to the team leader.

The ReBAC approach benefits collaborative RAG applications with dynamic data ownership and relationships.
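The team lead example can be sketched as a lookup over two relationships: who created a document, and who belongs to which lead's team. All ids below are hypothetical.

```python
# Hypothetical relationships: team lead -> team members, document -> creator.
TEAM_MEMBERS = {
    "lead_a": {"dev_1", "dev_2"},
    "lead_b": {"dev_3"},
}
DOC_CREATOR = {
    "doc_1": "dev_1",
    "doc_2": "dev_3",
}


def rebac_allows(lead_id: str, doc_id: str) -> bool:
    """A lead may access a document only if its creator is on the lead's team."""
    creator = DOC_CREATOR.get(doc_id)
    return creator is not None and creator in TEAM_MEMBERS.get(lead_id, set())
```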

Tools like Cerbos can help you seamlessly implement the access control approaches in your RAG applications. The following section will discuss why authorization is critical for RAG applications.

Why RAG authorization is critical

As discussed earlier, RAG systems deal with sensitive and private data ingestion, retrieval, and manipulation. Without a robust authorization system, RAG applications become vulnerable to unauthorized access, data breaches, and unintended misuse. Following are some factors that highlight the importance of authorization in RAG applications.

Risk of unauthorized data access

Unauthorized access to private and sensitive data in RAG applications may lead to data leakage, which a malicious user may exploit.

For example, unauthorized access to a company's private earnings data may help competitors develop strategies that can result in financial loss for the company. Implementing ACL or RBAC can help avoid unauthorized data access.

Ensuring regulatory compliance

Many industries, such as finance and health care, are governed by strict regulations, such as GDPR and HIPAA. Without secure authorization, RAG applications risk non-compliance with these regulations.

Maintaining data integrity

Data integrity is essential to ensuring correct responses in RAG applications. Unauthorized access to vector databases in RAG applications allows malicious users to inject factually wrong or biased information into the database. This results in incorrect and often biased responses from RAG applications.

Ensuring user trust

RAG applications with robust authorization foster user trust. Users are more likely to trust applications that demonstrate that user data is handled securely and robustly. For example, a collaborative RAG system for academic research that restricts data ingestion and retrieval based on user roles, e.g., students and faculty, fosters trust among researchers who know that data is ingested by faculty members rather than students.

Auditing and accountability

The authorization mechanism in RAG applications allows for better tracking and logging of user actions. This is critical for identifying potential data breaches and maintaining accountability.

With authorization, you will have a record of all the actions performed by various users, which will help you identify malicious users in case of data breaches and unauthorized access.
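A minimal sketch of this idea is to record every authorization decision at the point where it is made. The audited_check wrapper, the in-memory audit_log list, and the demo_policy function are hypothetical; a real system would write to durable, tamper-resistant storage.

```python
from datetime import datetime, timezone
from typing import Callable

audit_log: list[dict] = []  # in-memory stand-in for a real audit store


def audited_check(principal_id: str, action: str, resource_id: str,
                  decide: Callable[[str, str, str], bool]) -> bool:
    """Run an authorization decision and record who asked for what, and the outcome."""
    allowed = decide(principal_id, action, resource_id)
    audit_log.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "principal": principal_id,
        "action": action,
        "resource": resource_id,
        "allowed": allowed,
    })
    return allowed


def demo_policy(principal_id: str, action: str, resource_id: str) -> bool:
    """Hypothetical policy used only for this sketch: only 'admin' may act."""
    return principal_id == "admin"


audited_check("user1", "ingest", "lung_cancer_vectorstore", demo_policy)
audited_check("admin", "ingest", "lung_cancer_vectorstore", demo_policy)
denied = [entry for entry in audit_log if not entry["allowed"]]
```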

Fortunately, all of the aforementioned issues can be handled by using Cerbos, a robust authorization layer that you can use to implement access control in your RAG applications.

Implementing authorization in RAG application using Cerbos

Cerbos is an open-source, language-agnostic authorization layer that provides a powerful solution for implementing authorization in modern, distributed applications. It offers improved security, scalability, and ease of management for access control policies.

Cerbos implements authorization policies using a declarative language, decoupling the authorization logic from your application. It is highly scalable and efficiently handles high-volume authorization requests.

In this section, you will see how to use Cerbos to implement authorization designs such as RBAC and ABAC in the RAG application we developed in the first section.

Setting up the environment

To use Cerbos authorization, you need to install and run the Cerbos server. You can run the Cerbos server via Docker as explained in the official documentation.

You will also need to install the Cerbos Python SDK to call the Cerbos server from a Python application.

pip install cerbos

Next, import the following libraries into your Python application.

from cerbos.sdk.grpc.client import CerbosClient
from cerbos.engine.v1 import engine_pb2
from google.protobuf.struct_pb2 import Value

Note: You must also import the Python libraries used in the examples in the first section of this article.

The following script creates a Cerbos client.

The script also creates the OpenAI API client that we will use to generate vector embeddings and to call OpenAI LLMs in our RAG application.

Finally, we create an OpenAIEmbeddings object that we will use to create vector embeddings to store in vector stores.

load_dotenv()

# Cerbos Client Initialization
cerbos_client = CerbosClient("localhost:3593", tls_verify=False)

# OpenAI Client Initialization
openai_key = os.environ.get('OPENAI_API_KEY')
llm = ChatOpenAI(
    openai_api_key=openai_key,
    model="gpt-4o",
    temperature=0.7
)

# Initialize OpenAI embeddings
embeddings = OpenAIEmbeddings(openai_api_key=openai_key)

Data loading and metadata addition

We will create two vector stores: one for the lung cancer survey document and the other for the Meta and Amazon earnings reports.

data_url = "https://www.hse.ie/eng/services/list/5/cancer/pubs/reports/national-survey-on-lung-cancer-awareness-report-january-2020.pdf"
loader = PyPDFLoader(data_url)
lung_cancer_docs = loader.load_and_split()

data_url = "https://s21.q4cdn.com/399680738/files/doc_financials/2024/q3/META-Q3-2024-Earnings-Call-Transcript.pdf"
loader = PyPDFLoader(data_url)
meta_docs = loader.load_and_split()

data_url = "https://s2.q4cdn.com/299287126/files/doc_financials/2024/q3/AMZN-Q3-2024-Earnings-Release.pdf"
loader = PyPDFLoader(data_url)
amazon_docs = loader.load_and_split()

def add_metadata(docs, source, month):
    for doc in docs:
        doc.metadata["source"] = source
        doc.metadata["month"] = month
    return docs

lung_cancer_docs = add_metadata(lung_cancer_docs, "lung_cancer_doc", "June")
meta_docs = add_metadata(meta_docs, "meta_doc", "October")
amazon_docs = add_metadata(amazon_docs, "amazon_doc", "November")


# Create empty vector stores
lung_cancer_vectorstore = Chroma(
    collection_name="lung_cancer_collection",
    embedding_function=embeddings
)

earning_calls_vectorstore = Chroma(
    collection_name="earning_calls_collection",
    embedding_function=embeddings
)

RBAC authorization in RAG with Cerbos

Use the RBAC approach when you want to allow access to a RAG resource, such as a vector store, based on user roles. For example, you may want only users with the data_ingestor or admin role to ingest data into a vector store.

Let's see how to do this with Cerbos.

The first step in implementing Cerbos authorization is to create policies that define the resource, the roles of the principals who can access it, and the actions that can be performed on it. You can also specify additional conditions that further define the scope of the principals who can perform an action on a resource.

You need to define policies in a .yaml file and specify the directory containing that file when starting the Cerbos server. For example, if your policies are in the cerbos-quickstart/policies/resource.document.yaml file, you will start your Cerbos server with the following command:

docker run --name cerbos -d -v $(pwd)/cerbos-quickstart/policies:/policies -p 3592:3592 -p 3593:3593 ghcr.io/cerbos/cerbos:0.39.0

The following script defines a policy for RBAC in RAG. The policy rules specify that the principals (which can be users) with roles data_ingestor and admin can perform an ingest action on the vector_store type resources.

apiVersion: "api.cerbos.dev/v1"
resourcePolicy:
  resource: "vector_store"
  version: "default"
  rules:
    - actions: ["ingest"]
      effect: EFFECT_ALLOW
      roles: ["data_ingestor", "admin"]

Next, we will define the ingest_data_with_rbac() function that accepts the vector store in which the data will be ingested, the documents to ingest, and the principal and resource objects.

The function checks if the principal can access the resource and ingests the data if the condition is evaluated as True.

Otherwise, the function prints a message that the principal cannot access the resource.

def ingest_data_with_rbac(vector_store, docs, principal, resource):
    
    with CerbosClient("localhost:3593", tls_verify=False) as client:  # 3593 is the gRPC port published above
        
        if client.is_allowed("ingest", principal, resource):
            print(f"Access granted for {principal.id} to ingest data in resource {resource.id}.")
            vector_store.add_documents(docs)
            return True
            
        else:
            print(f"Access denied for {principal.id}.")
            return False

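We will define three principals, stored in principal_user1, principal_user2, and principal_user3, with the roles data_retriever, data_ingestor, and admin, respectively. Note that the id of the third principal is admin.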

# Define Principals with different roles 
principal_user1 = engine_pb2.Principal(
    id="user1",
    roles=["data_retriever"],  
    policy_version= "default",
)


principal_user2 = engine_pb2.Principal(
    id="user2",
    roles=["data_ingestor"], 
    policy_version= "default",
)


principal_user3 = engine_pb2.Principal(
    id= "admin",
    roles=["admin"], 
    policy_version= "default",
)

We will define a resource with the id lung_cancer_vectorstore and the kind vector_store.

resource_rbac = engine_pb2.Resource(
    id="lung_cancer_vectorstore",
    kind="vector_store",
)

Finally, we will try to ingest data into lung_cancer_vectorstore using the three principals and the resource we defined.

for principal in [principal_user1, principal_user2, principal_user3]:
    print("=====================")
    result = ingest_data_with_rbac(lung_cancer_vectorstore,
                                   lung_cancer_docs,
                                   principal, 
                                   resource_rbac)
    
    if result:
        print("Operation successful - data ingested")
    else:
        print("You do not have permission to ingest the data")

Output:

[Figure: Output for user1, user2, and user3]

In the output, you will see that user1 does not have access to the resource since it has the data_retriever role. On the other hand, user2 and user3, with the roles data_ingestor and admin, are able to ingest data into lung_cancer_vectorstore.
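If you want to sanity-check the expected decisions without a running Cerbos server, the allow rule from the policy above can be approximated in plain Python. This is only a rough sketch of how the single rule matches roles and actions; it is not the Cerbos engine, and the POLICY_RULES structure and is_allowed helper are assumptions made for illustration.

```python
# Plain-Python mirror of the single allow rule in the RBAC policy.
POLICY_RULES = [
    {"actions": ["ingest"], "roles": ["data_ingestor", "admin"]},
]


def is_allowed(action: str, roles: list[str]) -> bool:
    """Deny by default; allow if any rule covers the action and one of the roles."""
    for rule in POLICY_RULES:
        if action in rule["actions"] and set(roles) & set(rule["roles"]):
            return True
    return False


# Principal ids and roles as defined in the article.
principals = {
    "user1": ["data_retriever"],
    "user2": ["data_ingestor"],
    "admin": ["admin"],
}
decisions = {pid: is_allowed("ingest", roles) for pid, roles in principals.items()}
```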

Next, you will see how to implement Attribute-Based Access Control (ABAC) in RAG with Cerbos.

ABAC authorization on RAG with Cerbos

The ABAC approach is useful when you want to allow users with certain attributes to access a resource in RAG. For instance, if you want users from a specific department to ingest or retrieve data from a particular vector store, you can use the ABAC approach.

Let's see examples of ingestion and retrieval using the ABAC approach.

Data ingestion with ABAC

For ingestion with ABAC, let's add a rule that allows principals with department_data_ingestor roles to perform department_ingest on vector_store type resources if the department attribute of a principal matches the type attribute of a resource.

The following script defines the rule.

apiVersion: "api.cerbos.dev/v1"
resourcePolicy:
  resource: "vector_store"
  version: "default"
  rules:

    - actions: ["ingest"]
      effect: EFFECT_ALLOW
      roles: ["data_ingestor", "admin"]


    - actions: ["department_ingest"]
      effect: EFFECT_ALLOW
      roles: ["department_data_ingestor"]
      condition:
        match:
          expr: request.principal.attr.department == request.resource.attr.type
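The condition in the department_ingest rule can be read as a role check followed by an attribute comparison. The sketch below mirrors that logic in plain Python, assuming principals and resources are represented as simple dictionaries; the can_department_ingest() helper and the dictionary layout are hypothetical and are not part of the Cerbos SDK.

```python
def can_department_ingest(principal, resource):
    """Mirror of the rule: role must match AND department must equal resource type."""
    has_role = "department_data_ingestor" in principal["roles"]
    department = principal["attr"].get("department")
    # A missing department attribute never matches
    same_department = department is not None and department == resource["attr"].get("type")
    return has_role and same_department


# Hypothetical dictionary stand-ins for a principal and two resources
finance_user = {"roles": ["department_data_ingestor"], "attr": {"department": "finance"}}
finance_store = {"attr": {"type": "finance"}}
health_store = {"attr": {"type": "health"}}

print(can_department_ingest(finance_user, finance_store))
print(can_department_ingest(finance_user, health_store))
```

The role alone is not enough here: a principal with the right role but a different department is still denied, which is what distinguishes this rule from the plain RBAC ingest rule.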

Next, we will define the ingest_data_with_abac() function, which accepts a vector store, documents, principal, and resource and ingests documents into the vector store only if a principal has access to the resource for the department_ingest action.

def ingest_data_with_abac(vector_store, docs, principal, resource):

    with CerbosClient("localhost:3592", tls_verify=False) as client:
        
        if client.is_allowed("department_ingest", principal, resource):
            print(f"Access granted for {principal.id} to ingest data in resource {resource.id}.")
            vector_store.add_documents(docs)
            return True
            
        else:
            print(f"Access denied for {principal.id}.")
            return False

Let's test the ingest_data_with_abac() function by creating two principals, user4 and user5, both with the department_data_ingestor role. user4 belongs to the finance department, whereas user5 belongs to the health department.

Similarly, we will define two vector_store resources, with the ids finance_vectorstore and health, whose type attributes are finance and health, respectively.

principal_user4 = engine_pb2.Principal(
    id="user4",
    roles=["department_data_ingestor"],
    policy_version= "default",
    attr={"department": Value(string_value="finance")}
)

principal_user5 = engine_pb2.Principal(
    id="user5",
    roles=["department_data_ingestor"],
    policy_version= "default",
    attr={"department": Value(string_value="health")}
)

resource_abac_finance = engine_pb2.Resource(
    id="finance_vectorstore",
    kind="vector_store",
    attr={"type": Value(string_value="finance")}
)

resource_abac_health = engine_pb2.Resource(
    id= "health",
    kind="vector_store",
    attr={"type": Value(string_value="health")}
)

We will first try to access the health type resource using user4 and user5 to ingest data in the lung_cancer_vectorstore.

for principal in [principal_user4, principal_user5]:
    print("=====================")
    result = ingest_data_with_abac(lung_cancer_vectorstore,
                                   lung_cancer_docs,
                                   principal, 
                                   resource_abac_health)
    
    if result:
        print("Operation successful - data ingested")
    else:
        print("You do not have permission to ingest the data")

Output:

access denied for user 4.png

The above output shows that user4 from the finance department is denied access to the health type resource. On the other hand, user5 from the health department is able to access the resource and ingest data into the lung_cancer_vectorstore.

Let's see another example; this time user4 and user5 will try to access the finance type resource.

docs = meta_docs + amazon_docs
for principal in [principal_user4, principal_user5]:
    
    result = ingest_data_with_abac(earning_calls_vectorstore,
                                   docs,
                                   principal, 
                                   resource_abac_finance)
    
    if result:
        print("Operation successful - data ingested")
    else:
        print("You do not have permission to ingest the data")

Output:

access granted for user 4.png

The output shows that user4 from the finance department accessed the finance type resource, whereas user5 from the health department was denied access.

Data retrieval with ABAC

Data retrieval with ABAC involves retrieving data from a vector store using principal and resource attributes.

For example, we will add a new rule to our policy that allows principals with the role doc_retriever to perform a retrieve action if the doc_type attributes of the principal and resource match.

apiVersion: "api.cerbos.dev/v1"
resourcePolicy:
  resource: "vector_store"
  version: "default"
  rules:

    - actions: ["ingest"]
      effect: EFFECT_ALLOW
      roles: ["data_ingestor", "admin"]


    - actions: ["department_ingest"]
      effect: EFFECT_ALLOW
      roles: ["department_data_ingestor"]
      condition:
        match:
          expr: request.principal.attr.department == request.resource.attr.type
          
          
    - actions: ["retrieve"]
      effect: EFFECT_ALLOW
      roles: ["doc_retriever"]
      condition:
        match:
          expr: request.principal.attr.doc_type == request.resource.attr.doc_type

To test the above policy, we will first define a helper function, generate_response(), that returns an LLM response based on the user query and the doc type. For instance, if you pass the meta_doc doc type, the RAG will only look for documents with the meta_doc attribute in the vector store.

def generate_response(vector_store, query, doc_type) :

    prompt = ChatPromptTemplate.from_template("""Answer the following question based only on the provided context:
    
    Question: {input}
    
    Context: {context}
    """
    )
    
    document_chain = create_stuff_documents_chain(llm, prompt)
    
    
    retriever = vector_store.as_retriever(search_kwargs={"filter": {"source": doc_type}})
    retrieval_chain = create_retrieval_chain(retriever, document_chain)
    
    response = retrieval_chain.invoke({"input": query})
    return response['answer']
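The filter passed to the retriever restricts the search to documents whose source metadata equals doc_type. As a rough illustration of that behaviour, the sketch below applies the same equality filter to plain dictionaries. The sample documents and the filter_by_source() helper are made up for this example, and the real filtering is done by the vector store.

```python
# Placeholder documents; the text and metadata values are made up for illustration.
docs = [
    {"text": "Meta earnings call excerpt", "metadata": {"source": "meta_doc"}},
    {"text": "Amazon earnings call excerpt", "metadata": {"source": "amazon_doc"}},
    {"text": "Untagged note", "metadata": {}},
]


def filter_by_source(documents, doc_type):
    """Keep only documents whose `source` metadata equals doc_type."""
    return [d for d in documents if d["metadata"].get("source") == doc_type]


meta_only = filter_by_source(docs, "meta_doc")
print([d["text"] for d in meta_only])
```

Documents without a source value never match, so this approach relies on every ingested document being tagged consistently.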

Next, we will define the retrieve_data_with_abac() function, which allows only principals whose doc_type attribute matches the doc_type of the resource to retrieve RAG responses from the vector store.

Notice that here, in addition to the is_allowed() call, we request a query plan for the principal using the plan_resources() API call. The plan describes the conditions under which the principal may perform the action on resources of the given kind, with the principal's own attributes already filled in. We read the string value of the first operand, which holds the principal's doc_type, and use it to filter the documents in the vector store.

def retrieve_data_with_abac(vector_store, query, principal, resource, resource_plan):
    
    with CerbosClient("localhost:3592", tls_verify=False) as client:
        
        # Check whether the principal may retrieve from this specific resource
        if client.is_allowed("retrieve", principal, resource):
            print(f"Access granted for {principal.id} to access the resource {resource.id}.")

            # Request the query plan for this resource kind
            plan = client.plan_resources(action="retrieve", 
                     principal=principal, 
                     resource=resource_plan)

            # The first operand of the plan condition holds the principal's doc_type
            doc_type = plan.filter.condition.expression.operands[0].value.string_value
            
            response = generate_response(vector_store, 
                                         query, 
                                         doc_type)
            return response
            
        else:
            return (f"Access denied for {principal.id} to access the resource {resource.id}.")

Let's test the ABAC policy for retrieval. We create two users: user6 and user7, with meta_doc and amazon_doc values for their doc_type attributes, respectively.

We also define two resources, meta_docs_vectorstore and amazon_docs_vectorstore, whose doc_type attributes contain the meta_doc and amazon_doc values, respectively.

principal_user6 = engine_pb2.Principal(
    id="user6",
    roles=["doc_retriever"],
    policy_version= "default",
    attr={"doc_type": Value(string_value="meta_doc")}
)

principal_user7 = engine_pb2.Principal(
    id="user7",
    roles=["doc_retriever"],
    policy_version= "default",
    attr={"doc_type": Value(string_value="amazon_doc")}
)

resource_abac_meta = engine_pb2.Resource(
    id="meta_docs_vectorstore",
    kind="vector_store",
    attr={"doc_type": Value(string_value="meta_doc")}
)

resource_abac_amazon = engine_pb2.Resource(
    id="amazon_docs_vectorstore",
    kind="vector_store",
    attr={"doc_type": Value(string_value="amazon_doc")}
)

Let's also look at what the resource plan returns for user7 to see which value will be used for the doc_type attribute.

plan_resource = engine_pb2.PlanResourcesInput.Resource(
    kind="vector_store",
)

with CerbosClient("localhost:3592", tls_verify=False) as client:
    plan = client.plan_resources(action="retrieve", 
                             principal=principal_user7, 
                             resource=plan_resource)

    print(plan)

Output:

request_id: "288b4689-215d-4baf-8b47-062d901be23a"
action: "retrieve"
resource_kind: "vector_store"
filter {
  kind: KIND_CONDITIONAL
  condition {
    expression {
      operator: "eq"
      operands {
        value {
          string_value: "amazon_doc"
        }
      }
      operands {
        variable: "request.resource.attr.doc_type"
      }
    }
  }
}
cerbos_call_id: "01JE10QSD53NF54Y18CCTKC92J"

The above output shows that for user7 the doc_type is amazon_doc. This information will be dynamically retrieved for user7 in the retrieve_data_with_abac() function. The same follows for the other users and their doc types.
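Reading the value by position works for the plan printed above, but it depends on the literal being the first operand. The following is a minimal sketch of a position-independent lookup, assuming the plan condition has been converted into a plain dictionary shaped like the printed output. The dictionary layout and the literal_for() helper are hypothetical and are not part of the Cerbos SDK.

```python
# Dictionary stand-in mirroring the condition printed in the plan above
plan_condition = {
    "operator": "eq",
    "operands": [
        {"value": "amazon_doc"},
        {"variable": "request.resource.attr.doc_type"},
    ],
}


def literal_for(expression, variable):
    """Return the literal compared with `variable` in an eq expression, else None."""
    if expression.get("operator") != "eq":
        return None
    operands = expression.get("operands", [])
    variables = [o["variable"] for o in operands if "variable" in o]
    values = [o["value"] for o in operands if "value" in o]
    # Only accept the simple shape: exactly one variable compared with one literal
    if variables == [variable] and len(values) == 1:
        return values[0]
    return None


doc_type = literal_for(plan_condition, "request.resource.attr.doc_type")
print(doc_type)
```

Returning None for anything other than a simple equality lets the caller deny access instead of filtering on a wrong or empty value.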

Let's try to access the meta_docs_vectorstore via user6 and user7.

for principal in [principal_user6, principal_user7]:
    print("=====================")
    query = "What is the revenue of Meta for Q3 2024?"
    
    # The doc_type filter is derived from the query plan inside the function
    result = retrieve_data_with_abac(earning_calls_vectorstore,
                                   query,
                                   principal, 
                                   resource_abac_meta,
                                   plan_resource)
    
    print(result)

Output:

access granted for user 6.png

The output shows that user6 with the meta_doc attribute can access the meta_docs_vectorstore since its doc_type attribute matches.

On the other hand, user7 with the amazon_doc attribute is denied access.

Conversely, user7 will be allowed access to amazon_docs_vectorstore since its doc_type is amazon_doc.

for principal in [principal_user6, principal_user7]:
    print("=====================")
    query = "What is the revenue of Amazon for Q3 2024?"
    
    # The doc_type filter is derived from the query plan inside the function
    result = retrieve_data_with_abac(earning_calls_vectorstore,
                                   query,
                                   principal, 
                                   resource_abac_amazon,
                                   plan_resource)
    
    print(result)

Output:

access denied for user 6.png

With this approach, you can implement filters allowing users to access only certain documents within a single vector store.

Let's see another example of ABAC where we combine two conditions to allow a principal to access a particular resource.

We will define a policy that allows the retrieve action to be performed by the principal with the role team_leader, where the principal's team name is equal to the resource's team name. We will add another condition using the && operator that specifies that the principal's doc_type must match the resource's doc_type.

apiVersion: "api.cerbos.dev/v1"
resourcePolicy:
  resource: "vector_store"
  version: "default"
  rules:

    - actions: ["ingest"]
      effect: EFFECT_ALLOW
      roles: ["data_ingestor", "admin"]


    - actions: ["department_ingest"]
      effect: EFFECT_ALLOW
      roles: ["department_data_ingestor"]
      condition:
        match:
          expr: request.principal.attr.department == request.resource.attr.type
          
          
    - actions: ["retrieve"]
      effect: EFFECT_ALLOW
      roles: ["doc_retriever"]
      condition:
        match:
          expr: request.principal.attr.doc_type == request.resource.attr.doc_type


    - actions: ["retrieve"]
      effect: EFFECT_ALLOW
      roles: ["team_leader"]
      condition:
        match:
          expr: request.principal.attr.team_name == request.resource.attr.team_name && request.principal.attr.doc_type == request.resource.attr.doc_type

Next, we will define a function retrieve_data_with_abac() that allows controlled access to documents in a vector store filtered by document type. We will again use the plan_resource that we defined earlier to dynamically retrieve a principal's doc_type attribute.

Note: The query plan adapts dynamically to policy changes. However, the function below reads the doc_type from a hardcoded position in the plan (operands[0]), which can cause issues if the policy changes in the future.

def retrieve_data_with_abac(vector_store, query, principal, resource, resource_plan):
    
    with CerbosClient("localhost:3592", tls_verify=False) as client:
        
        # Check whether the principal may retrieve from this specific resource
        if client.is_allowed("retrieve", principal, resource):
            print(f"Access granted for {principal.id} to access the resource {resource.id}.")

            # Request the query plan for this resource kind
            plan = client.plan_resources(action="retrieve", 
                     principal=principal, 
                     resource=resource_plan)

            # Hardcoded position: assumes the first operand holds the doc_type value
            doc_type = plan.filter.condition.expression.operands[0].value.string_value
            
            response = generate_response(vector_store, 
                                         query, 
                                         doc_type)
            return response
            
        else:
            return (f"Access denied for {principal.id} to access the resource {resource.id}.")
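One way to reduce the reliance on a hardcoded operand index is to walk the plan condition and collect every equality it contains. The sketch below assumes the condition has been converted into nested dictionaries and that a rule with two conditions joined by && is returned as an and expression containing two eq expressions. That shape, the sample data, and the collect_equalities() helper are assumptions made for illustration and were not captured from a Cerbos server.

```python
def collect_equalities(expression):
    """Walk an and/eq expression tree and return {variable: literal}."""
    operator = expression.get("operator")
    operands = expression.get("operands", [])
    if operator == "eq":
        variables = [o["variable"] for o in operands if "variable" in o]
        values = [o["value"] for o in operands if "value" in o]
        if len(variables) == 1 and len(values) == 1:
            return {variables[0]: values[0]}
        raise ValueError("unsupported eq operands")
    if operator == "and":
        found = {}
        for operand in operands:
            if "expression" not in operand:
                raise ValueError("unsupported operand inside and")
            found.update(collect_equalities(operand["expression"]))
        return found
    # Fail closed on anything this sketch does not understand
    raise ValueError(f"unsupported operator: {operator}")


# Assumed plan shape for a finance team leader allowed to read meta_doc documents
assumed_plan = {
    "operator": "and",
    "operands": [
        {"expression": {"operator": "eq", "operands": [
            {"value": "finance"},
            {"variable": "request.resource.attr.team_name"}]}},
        {"expression": {"operator": "eq", "operands": [
            {"value": "meta_doc"},
            {"variable": "request.resource.attr.doc_type"}]}},
    ],
}

constraints = collect_equalities(assumed_plan)
print(constraints["request.resource.attr.doc_type"])
```

Looking the value up by variable name, and raising an error on unknown operators, means a policy change surfaces as an explicit failure rather than as a silently wrong filter.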

To test the retrieve_data_with_abac() function, we will define two principals: user8 and user9.

user8 is a team leader for the finance team and can access meta_doc documents. user9 leads the health team and can access lung_cancer_doc documents.

We also define three resources: finance_vectorstore_meta with team_name=finance and doc_type=meta_doc, finance_vectorstore_amazon with team_name=finance and doc_type=amazon_doc, and health_vectorstore with team_name=health and doc_type=lung_cancer_doc.

principal_user8 = engine_pb2.Principal(
    id="user8",
    roles=["team_leader"],
    policy_version= "default",
    attr={"team_name": Value(string_value="finance"),
         "doc_type": Value(string_value="meta_doc")}
)

principal_user9 = engine_pb2.Principal(
    id="user9",
    roles=["team_leader"],
    policy_version= "default",
    attr={"team_name": Value(string_value="health"),
         "doc_type": Value(string_value="lung_cancer_doc")}
)

resource_rebac_finance_meta = engine_pb2.Resource(
    id="finance_vectorstore_meta",
    kind="vector_store",
    attr={"team_name": Value(string_value="finance"),
         "doc_type": Value(string_value="meta_doc")}
)

resource_rebac_finance_amazon = engine_pb2.Resource(
    id="finance_vectorstore_amazon",
    kind="vector_store",
    attr={"team_name": Value(string_value="finance"),
         "doc_type": Value(string_value="amazon_doc")}
)

resource_rebac_health = engine_pb2.Resource(
    id="health_vectorstore",
    kind="vector_store",
    attr={"team_name": Value(string_value="health"),
         "doc_type": Value(string_value="lung_cancer_doc")}
)

Next, we will try to access the two finance resources, finance_vectorstore_meta and finance_vectorstore_amazon, with both users.

for principal in [principal_user8, principal_user9]:
    for resource in [resource_rebac_finance_meta, resource_rebac_finance_amazon]:
        print("=====================")
        query = "What is the revenue of Meta for Q3 2024?"
        
        # The doc_type filter is derived from the query plan inside the function
        result = retrieve_data_with_abac(earning_calls_vectorstore,
                                       query,
                                       principal, 
                                       resource,
                                       plan_resource)
        
        print(result)

Output:

access granted for user 8.png

The output shows that only user8 could access the finance_vectorstore_meta resource, since both its team_name and doc_type match the resource attributes.
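As a cross-check on these decisions, the team_leader rule can be mirrored in plain Python and applied to every principal and resource pair, including the health_vectorstore resource that the loop above does not exercise. This is only a sketch: the dictionaries and the team_leader_can_retrieve() helper are hypothetical stand-ins, and it covers only the team_leader rule, not the other rules in the policy.

```python
# Dictionary stand-ins for the two principals and three resources defined above
principals = {
    "user8": {"roles": ["team_leader"],
              "attr": {"team_name": "finance", "doc_type": "meta_doc"}},
    "user9": {"roles": ["team_leader"],
              "attr": {"team_name": "health", "doc_type": "lung_cancer_doc"}},
}
resources = {
    "finance_vectorstore_meta": {"team_name": "finance", "doc_type": "meta_doc"},
    "finance_vectorstore_amazon": {"team_name": "finance", "doc_type": "amazon_doc"},
    "health_vectorstore": {"team_name": "health", "doc_type": "lung_cancer_doc"},
}


def team_leader_can_retrieve(principal, resource_attr):
    """Both team_name and doc_type must match, mirroring the && condition."""
    if "team_leader" not in principal["roles"]:
        return False
    attr = principal["attr"]
    return all(
        key in attr and attr[key] == resource_attr.get(key)
        for key in ("team_name", "doc_type")
    )


allowed = sorted(
    (user, resource_id)
    for user, principal in principals.items()
    for resource_id, resource_attr in resources.items()
    if team_leader_can_retrieve(principal, resource_attr)
)
print(allowed)
```

Under these assumptions, user8 is denied finance_vectorstore_amazon even though the team matches, because the doc_type half of the condition fails.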

So you can see that Cerbos makes it seamless to implement different types of authorization designs on RAG applications.

Final thoughts

Security and efficient access are paramount for RAG applications, as they may store sensitive and private data. Implementing a robust access control design improves data security, preventing malicious users or attackers from corrupting the data or accessing private information.

Various tools exist that allow you to implement access control mechanisms on RAG applications. Cerbos is one such tool that implements a highly flexible, scalable, and dynamic policy-based approach to access control. You can use Cerbos to implement various access control designs, such as RBAC and ABAC, on your RAG applications, as you saw in this article.

I have used Cerbos in my RAG applications, and I can confidently say it is one of the best and easiest-to-implement access control tools for RAG.