This article explains how to implement authorization systems for your Retrieval Augmented Generation apps.
Authorization systems allow controlled access to complete or partial software application features, limiting access to sensitive or specialized features based on user roles and permissions.
Authorization is particularly crucial in Retrieval Augmented Generation (RAG) applications as they involve ingesting data in vector databases. Effective authorization ensures that only authenticated and permitted users can ingest, retrieve, or manipulate data in vector databases.
In this article, you will explore the following concepts:
Before we begin exploring authorization approaches in RAG, let me introduce myself. I have been working with NLP techniques for the last 10 years, focusing on RAG since 2022. I have developed industry-grade RAG-based chatbots, virtual assistants, and language generation applications. In this article, you will learn RAG authorization approaches from someone with first-hand industry experience.
So, let's begin without further ado.
Retrieval Augmented Generation is an approach that enhances the capabilities of LLMs by augmenting their default knowledge using external sources.
A RAG system typically consists of the following components:
The following figure shows a typical RAG system.
A RAG application stores vector embeddings of text chunks generated using embedding models in a vector store. This process is called data ingestion. Subsequently, RAG converts new user inputs into vectors using the same embedding model as the one used for generating vectors for the vector store.
RAG then compares the query vector with the vectors in the vector store and retrieves the vectors with the highest semantic similarity. The query and the text for the retrieved vectors are passed to the LLM to generate the final response. To make this concept more tangible, let's build a demo together.
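Before the full demo, the retrieval step described above can be sketched in plain Python. This is only a minimal illustration, not how a production vector store works: the three-dimensional vectors are made-up stand-ins for real embeddings, and `cosine_similarity` and `retrieve` are hypothetical helper names.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up 3-dimensional "embeddings" (assumption for illustration only);
# real embedding models return vectors with far more dimensions.
vector_store = {
    "Smoking is a major cause of lung cancer.": [0.9, 0.1, 0.0],
    "Quarterly revenue grew year over year.": [0.0, 0.2, 0.9],
    "Radon gas exposure raises cancer risk.": [0.8, 0.3, 0.1],
}

def retrieve(query_vector, store, k=2):
    """Return the k text chunks whose vectors are most similar to the query."""
    ranked = sorted(
        store,
        key=lambda text: cosine_similarity(query_vector, store[text]),
        reverse=True,
    )
    return ranked[:k]

query_vector = [1.0, 0.2, 0.0]  # stand-in for an embedded health question
top_chunks = retrieve(query_vector, vector_store)
print(top_chunks)
```

In a real RAG system, the retrieved chunks would then be placed into the prompt together with the user query.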
In this section, you will learn how to develop a simple RAG system using the Python LangChain framework and the Chroma DB vector database.
You need to install the following Python libraries to run the scripts in this section:
!pip install -U langchain langchain-openai pypdf chromadb langchain_community python-dotenv
Subsequently, run the following script to import the required modules, classes, and functions into your Python application.
import os
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import Chroma
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain
from langchain_core.documents import Document
from langchain_core.messages import HumanMessage, AIMessage
load_dotenv()
We will use the OpenAI GPT-4o LLM in our RAG application. The following script creates a ChatOpenAI object as the LLM for our RAG application. You need an OpenAI API key to use OpenAI LLMs in LangChain.
openai_key = os.environ.get('OPENAI_API_KEY')
llm = ChatOpenAI(
    openai_api_key=openai_key,
    model='gpt-4o',
    temperature=0.7
)
We will create two Chroma DB vector stores. The first will hold data from a lung cancer awareness report, while the second will hold data from the Meta and Amazon Q3 2024 earnings reports.
We will ask our LLM questions related to these data sources. Since GPT-4o's knowledge cutoff date is October 2023, it cannot, by default, answer the questions related to Meta and Amazon's earnings report for 2024. However, with RAG, you will see that GPT-4o will respond to our queries related to these reports.
The following script imports data from these data sources using the PyPDFLoader class and loads and splits them into multiple chunks using the load_and_split() method.
data_url = "https://www.hse.ie/eng/services/list/5/cancer/pubs/reports/national-survey-on-lung-cancer-awareness-report-january-2020.pdf"
loader = PyPDFLoader(data_url)
lung_cancer_docs = loader.load_and_split()
data_url = "https://s21.q4cdn.com/399680738/files/doc_financials/2024/q3/META-Q3-2024-Earnings-Call-Transcript.pdf"
loader = PyPDFLoader(data_url)
meta_docs = loader.load_and_split()
data_url = "https://s2.q4cdn.com/299287126/files/doc_financials/2024/q3/AMZN-Q3-2024-Earnings-Release.pdf"
loader = PyPDFLoader(data_url)
amazon_docs = loader.load_and_split()
You can add information about a document using its metadata. In the following script, we add source and month information for all the documents.
def add_metadata(docs, source, month):
    for doc in docs:
        doc.metadata["source"] = source
        doc.metadata["month"] = month
    return docs
lung_cancer_docs = add_metadata(lung_cancer_docs, "lung_cancer_doc", "June")
meta_docs = add_metadata(meta_docs, "meta_doc", "October")
amazon_docs = add_metadata(amazon_docs, "amazon_doc", "November")
Metadata can help organize documents into different categories, which, as you will see in a later section, allows document filtering based on metadata.
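As a rough illustration of what metadata filtering amounts to, the sketch below filters a list of documents by their metadata in plain Python. `SimpleDoc` and `filter_by_metadata` are hypothetical names used only here, the chunk texts are placeholders, and a real vector store applies such filters internally during the similarity search.

```python
from dataclasses import dataclass, field

@dataclass
class SimpleDoc:
    """Hypothetical stand-in for a document with text and metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def filter_by_metadata(docs, **criteria):
    """Keep only documents whose metadata matches every key/value in criteria."""
    return [
        doc for doc in docs
        if all(doc.metadata.get(key) == value for key, value in criteria.items())
    ]

# Placeholder chunks tagged the same way as in the add_metadata() script
docs = [
    SimpleDoc("placeholder chunk A", {"source": "lung_cancer_doc", "month": "June"}),
    SimpleDoc("placeholder chunk B", {"source": "meta_doc", "month": "October"}),
    SimpleDoc("placeholder chunk C", {"source": "amazon_doc", "month": "November"}),
]

meta_only = filter_by_metadata(docs, source="meta_doc")
print([doc.page_content for doc in meta_only])
```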
Next, we will create two Chroma DB vector store objects. The first store will contain vectors related to the lung cancer awareness report, while the second store will contain vectors for the Meta and Amazon earnings reports.
embeddings = OpenAIEmbeddings(openai_api_key=openai_key)
lung_cancer_vectorstore = Chroma.from_documents(
    documents=lung_cancer_docs,
    embedding=embeddings,
    collection_name="lung_cancer_collection"
)
earning_calls_vectorstore = Chroma.from_documents(
    documents=meta_docs + amazon_docs,
    embedding=embeddings,
    collection_name="earning_calls_collection"
)
To generate a response from an LLM, we will first define a prompt that tells the LLM to only return responses based on the provided context. The context will be supplied by a LangChain stuff documents chain, which stuffs the documents retrieved from a vector store into the prompt.
prompt = ChatPromptTemplate.from_template("""Answer the following question based only on the provided context:
Question: {input}
Context: {context}
"""
)
document_chain = create_stuff_documents_chain(llm, prompt)
Next, we will create a retrieval chain that uses the vector store retriever and the stuff document chain to generate a final model response.
In the script below, we create a retrieval chain that uses the lung_cancer_retriever to generate responses using the lung cancer awareness report.
lung_cancer_retriever = lung_cancer_vectorstore.as_retriever()
lung_cancer_retrieval_chain = create_retrieval_chain(lung_cancer_retriever, document_chain)
Let's ask the retrieval chain we created a few questions.
query = "What is the revenue generated by Meta in Q3 2024?"
response = lung_cancer_retrieval_chain.invoke({"input": query})
print(response["answer"])
Output
The context does not provide information on the revenue generated by Meta in Q3 2024.
Since we are using the lung cancer retriever, the retrieval chain did not answer the question related to Meta's revenue.
Let's ask a different question, this time related to lung cancer.
query = "What are the major causes of Lung Cancer?"
response = lung_cancer_retrieval_chain.invoke({"input": query})
print(response["answer"])
Output
The major causes of lung cancer include smoking, working environment, hereditary or genetic factors, air pollution, toxic chemicals, asbestos, second hand smoke, and alcohol. Other risk factors include environmental factors, poor diet, inhaling dust, lifestyle choices, previous cancer or other illnesses, obesity, unspecified pollution, vaping, lack of exercise, drugs, stress, and radon gas.
This time, you can see that the retrieval chain provided a response using the lung cancer awareness report.
Let's create a retrieval chain using the earnings call retriever and ask a question about Meta's revenue.
earning_calls_retriever = earning_calls_vectorstore.as_retriever()
earning_calls_retrieval_chain = create_retrieval_chain(earning_calls_retriever, document_chain)
query = "What is the revenue generated by Meta in Q3 2024?"
response = earning_calls_retrieval_chain.invoke({"input": query})
print(response["answer"])
Output
The revenue generated by Meta in Q3 2024 was $40.6 billion.
Since the earnings call retriever also contains information about Amazon, we can ask questions related to Amazon's revenue.
query = "What is the revenue generated by Amazon?"
response = earning_calls_retrieval_chain.invoke({"input": query})
print(response["answer"])
Output
The revenue generated by Amazon in the third quarter of 2024 was $158.9 billion.
Finally, you can create a retriever using a subset of documents in the vector store. For example, the following script creates a retriever object using only the documents where the metadata attribute source is meta_doc.
If you ask questions about Amazon's revenue using this retriever, the model will not be able to answer them.
earning_calls_retriever = earning_calls_vectorstore.as_retriever(search_kwargs={"filter": {"source": "meta_doc"}})
earning_calls_retrieval_chain = create_retrieval_chain(earning_calls_retriever, document_chain)
query = "What is the revenue generated by Amazon in Q3 2024?"
response = earning_calls_retrieval_chain.invoke({"input": query})
print(response["answer"])
Output
The context does not provide information on the revenue generated by Amazon in Q3 2024.
With RAG, the possibilities are virtually endless. You can create all types of retrievers, add filters to the documents, and create retrieval chains suited to your needs. It's really powerful!
I personally prefer to work with RAG since it helps avoid fine-tuning an LLM, which can be costly and time-consuming. With RAG, you supply the LLM with the context needed to answer a user query; the LLM's job is then to formulate the answer from that context and return it.
Following are some advantages of RAG applications.
However, RAG does come with certain limitations. Below are the challenges I encountered while working on my RAG projects.
Following are some of the main limitations of RAG:
Now that you know what RAG is and how it works, let's look at some of the security concerns for RAG applications.
While the RAG approach is compelling and innovative, it also introduces several security vulnerabilities that need to be carefully addressed before releasing RAG applications into production.
Here are some of the security concerns for RAG applications:
Prompt injection is a technique in which an attacker crafts a malicious prompt that causes an LLM to return sensitive or unauthorized information.
For example, a RAG system may retrieve confidential financial data based on user queries. An attacker could submit a prompt like, "Provide all confidential data on financial projections," potentially forcing the system to retrieve sensitive data even without authorization.
Prompt injections are particularly dangerous for RAG systems. They can allow a malicious user to access sensitive information and cause a model to behave in an undesired manner.
Some common measures to prevent prompt injections involve implementing strict input validation and sanitization, using role-based access control to limit the scope of user queries, and employing prompt engineering techniques to make the system more resilient.
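The input validation step mentioned above could look roughly like the following sketch. The length limit, the pattern list, and the `validate_query` name are all assumptions made for illustration. Pattern lists like this are easy to bypass, so they complement access control rather than replace it.

```python
import re

MAX_QUERY_LENGTH = 500  # assumed limit; choose one that fits your application

# Assumed, deliberately small deny list; real deployments need broader checks
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all|any|previous|prior) .*instructions", re.IGNORECASE),
    re.compile(r"reveal .*system prompt", re.IGNORECASE),
    re.compile(r"\bconfidential\b", re.IGNORECASE),
]

def validate_query(query):
    """Return (is_valid, reason) for a user query before it reaches the retriever."""
    cleaned = query.strip()
    if not cleaned:
        return False, "empty query"
    if len(cleaned) > MAX_QUERY_LENGTH:
        return False, "query too long"
    for pattern in SUSPICIOUS_PATTERNS:
        if pattern.search(cleaned):
            return False, "matched suspicious pattern"
    return True, "ok"

print(validate_query("What are the major causes of lung cancer?"))
print(validate_query("Provide all confidential data on financial projections"))
```

Rejected queries can be logged and reviewed instead of being passed to the retrieval chain.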
Context injection refers to an attacker inserting harmful data into a system's retrieval process, potentially causing RAG systems to produce responses based on corrupted information.
For example, an attacker may add a document containing malicious code that may be executed during retrieval.
Context injection results in two significant problems: misinformation propagation, where the model provides incorrect or misleading information due to the injected data, and malicious code execution, where unsanitized code runs within the system.
Preventive measures for context injection include strict validation and sanitization of content before ingestion and allowing only authorized users to add or modify context data.
Data poisoning occurs when attackers intentionally alter the knowledge base, compromising the reliability and quality of RAG system responses. One prominent example of data poisoning in NLP is Tay, a chatbot introduced by Microsoft in 2016 that mimicked the speech patterns of a 19-year-old girl.
Malicious users bombarded Tay with offensive, abusive, and racist content, poisoning its learning process. Consequently, Tay began replicating racist and explicit messages, highlighting the vulnerability of AI systems to data-poisoning attacks.
In a RAG system, an attacker may insert fabricated documents with biased information into the knowledge base, causing the model to reflect this bias in responses. Data poisoning can involve corrupting model training data or manipulating the vector representations of data.
Mitigating these risks involves implementing robust data validation and cleaning processes and using anomaly detection systems to identify suspicious data patterns.
Sensitive data exfiltration involves leaking sensitive data to malicious users. Poor access controls may allow attackers to query sensitive data directly, while inference attacks could enable attackers to deduce confidential information indirectly through carefully structured queries.
For instance, in a healthcare RAG system, a user could issue indirect requests to retrieve a patient's private information.
Effective prevention strategies include implementing fine-grained access control, data masking, and differential privacy techniques to obscure sensitive values and maintaining continuous monitoring and audit logs to detect potential exfiltration attempts.
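As one illustration of the data masking idea, the sketch below replaces email addresses and phone-like numbers with placeholders before text is returned or ingested. The regular expressions and the `mask_sensitive` name are assumptions, and simple patterns like these will miss many formats, so treat this as a starting point only.

```python
import re

# Simplified patterns (assumptions); real PII detection needs broader coverage
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def mask_sensitive(text):
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text

masked = mask_sensitive("Contact jane.doe@example.com or +1 555-010-9999 for results.")
print(masked)
```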
Model inversion attacks involve reverse-engineering of sensitive data through model responses.
Attackers might repeatedly query the system to extract confidential information, such as customer data, by probing the model's responses. This risk is especially prevalent when personal data is embedded within the training set, leading to unintended privacy leaks.
Mitigation techniques for model inversion include using federated learning to limit raw training data exposure, applying data anonymization to remove identifying details, and regularly updating the model to reduce the effectiveness of inversion techniques.
The aforementioned security concerns highlight the importance of access control for AI applications, particularly for RAG.
Access control is crucial for RAG applications, especially when dealing with private or sensitive data. An effective access control mechanism ensures that only authorized and authenticated users can access specific RAG application resources.
Let's first discuss what authentication and authorization mean for AI applications (AI bots, AI companions and agents).
Authentication refers to verifying a user's identity to ensure that the person or system attempting to access the application is who they claim to be.
Some of the standard authentication techniques for user identification in AI applications include:
Password authentication, where a user's identity is verified via a user name and password. It is the most basic form of authentication but is risky if not robustly implemented.
Multi-factor authentication, which combines two or more authentication factors (e.g., password and biometric, text message and email, etc.).
OAuth/OpenID Connect. This approach delegates user authentication to third-party identity providers, such as Google, using OAuth 2.0.
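To make "robustly implemented" password authentication a little more concrete, here is a minimal sketch that stores a salted hash instead of the password itself, using only the Python standard library. The iteration count and the function names are assumptions; in production you would typically rely on a maintained authentication library or identity provider.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # assumed work factor; tune it for your hardware

def hash_password(password, salt=None):
    """Derive a salted PBKDF2-HMAC-SHA256 digest; returns (salt, digest)."""
    if salt is None:
        salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password, salt, expected_digest):
    """Recompute the digest and compare it in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)

salt, stored_digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored_digest))
print(verify_password("wrong password", salt, stored_digest))
```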
Authorization is different from authentication in that authorization determines the actions that an authenticated user can perform. For RAG applications, authorization involves controlling access to features such as data ingestion, retrieval, and vector store manipulation.
Standard authorization approaches for AI and RAG applications rest on three key practices:
Defining roles and permissions: Clearly specify user roles, e.g., admin, data-ingestor, data-viewer, data-retriever, and the resources each role can access.
Implementing fine-grained access control: Implement policies that restrict access at a granular level. For example, a finance-retriever role can access only finance-related vectors from the vector store, while a health-retriever role can access only health-related documents from a vector database.
Using third-party authorization tools: Use third-party authorization tools such as Cerbos that offer out-of-the-box authorization functionality.
The upcoming section discusses some common authorization designs for AI applications, which are also applicable to RAG applications.
Depending on the complexity and requirements of your RAG system, you can adopt one or more of the following authorization designs.
Access control lists (ACLs) are one of the simplest application authorization designs. With an ACL, you define a list of users who can access one or more resources, independent of the users' roles. Anyone on the list can access the specified resources. An example of an ACL in RAG is a list of users who can ingest data into a health data store.
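A minimal sketch of the ACL idea, assuming hypothetical user names and store names: each resource/action pair maps directly to the set of users allowed to perform it.

```python
# Hypothetical ACL: (resource, action) -> set of user IDs that are allowed
acl = {
    ("health_store", "ingest"): {"alice", "bob"},
    ("finance_store", "ingest"): {"carol"},
}

def acl_allows(user_id, resource, action):
    """Allow only users explicitly listed for this resource and action."""
    return user_id in acl.get((resource, action), set())

print(acl_allows("alice", "health_store", "ingest"))
print(acl_allows("alice", "finance_store", "ingest"))
```

The list has to be maintained per user, which is why ACLs become hard to manage as the number of users and resources grows.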
Role-based access control (RBAC) assigns permissions to roles rather than to individual users. Users with the assigned roles can access resources. Users and roles have a many-to-many relationship: a role can be assigned to multiple users, and a user can have one or multiple roles.
For example, all users with the data-ingestor role can ingest data into a vector database.
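The many-to-many relationship between users and roles can be sketched as two lookup tables. The user names and the exact permission sets below are assumptions made for illustration.

```python
# Hypothetical role -> permitted actions mapping
role_permissions = {
    "admin": {"ingest", "retrieve", "delete"},
    "data-ingestor": {"ingest"},
    "data-retriever": {"retrieve"},
}

# Hypothetical user -> roles mapping (a user may hold several roles)
user_roles = {
    "alice": {"data-ingestor", "data-retriever"},
    "bob": {"data-retriever"},
}

def rbac_allows(user_id, action):
    """Allow the action if any of the user's roles grants it."""
    return any(
        action in role_permissions.get(role, set())
        for role in user_roles.get(user_id, set())
    )

print(rbac_allows("alice", "ingest"))
print(rbac_allows("bob", "ingest"))
```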
Attribute-based access control (ABAC) allows access to resources based on user and resource attributes. For example, you can use ABAC to allow users with department=finance to access resources where vector-store = finance.
This fine-grained access control ensures that only the right people can access the right resources under the right conditions, making ABAC particularly suitable for complex systems like RAG.
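The department example can be sketched as a comparison of attribute dictionaries. The `abac_allows` name and the dictionary layout are assumptions; a policy engine would evaluate such a condition from a declarative policy rather than from application code.

```python
def abac_allows(user_attrs, resource_attrs):
    """Allow access only when the user's department matches the resource's vector store."""
    department = user_attrs.get("department")
    # A missing attribute never matches, so the check fails closed
    return department is not None and department == resource_attrs.get("vector-store")

finance_user = {"department": "finance"}
finance_store = {"vector-store": "finance"}
health_store = {"vector-store": "health"}

print(abac_allows(finance_user, finance_store))
print(abac_allows(finance_user, health_store))
```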
ReBAC allows access to resources based on the relationship between entities (users, resources, roles).
For example, a team lead can access all documents created by their team members but not those from other teams. In such a case, ReBAC will check who created the document, and if the user who created the document is part of a team leader's team, access will be granted to the team leader.
The ReBAC approach benefits collaborative RAG applications with dynamic data ownership and relationships.
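The team lead example can be sketched with two relationship tables: who leads whom, and who created which document. All names below are hypothetical, and allowing creators to read their own documents is an extra assumption that the example above does not state.

```python
# Hypothetical relationships: team lead -> members of their team
team_members = {
    "lead_anna": {"dev_ben", "dev_chloe"},
    "lead_omar": {"dev_dina"},
}

# Hypothetical ownership: document -> user who created it
document_owner = {
    "design_notes": "dev_ben",
    "budget_draft": "dev_dina",
}

def rebac_allows(user_id, doc_id):
    """Allow access to the creator (assumption) and to the creator's team lead."""
    owner = document_owner.get(doc_id)
    if owner is None:
        return False
    return owner == user_id or owner in team_members.get(user_id, set())

print(rebac_allows("lead_anna", "design_notes"))
print(rebac_allows("lead_anna", "budget_draft"))
```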
Tools like Cerbos can help you seamlessly implement the access control approaches in your RAG applications. The following section will discuss why authorization is critical for RAG applications.
As discussed earlier, RAG systems deal with sensitive and private data ingestion, retrieval, and manipulation. Without a robust authorization system, RAG applications become vulnerable to unauthorized access, data breaches, and unintended misuse. The following factors highlight the importance of authorization in RAG applications.
Unauthorized access to private and sensitive data in RAG applications may lead to data leakage, which a malicious user may exploit.
For example, unauthorized access to a company's private earnings data may help competitors develop strategies that can result in financial loss for the company. Implementing ACL or RBAC can help avoid unauthorized data access.
Many industries, such as finance and health care, are governed by strict regulations, such as GDPR and HIPAA. Without secure authorization, RAG applications risk non-compliance with these regulations.
Data integrity is essential to ensuring correct responses in RAG applications. Unauthorized access to vector databases in RAG applications allows malicious users to inject factually wrong or biased information into the database. This results in incorrect and often biased responses from RAG applications.
RAG applications with robust authorization foster user trust. Users are more likely to trust applications that demonstrate that user data is handled securely and robustly. For example, a collaborative RAG system for academic research that restricts data ingestion and retrieval based on user roles, e.g., students and faculty, fosters trust among researchers who know that data is ingested by faculty members rather than students.
The authorization mechanism in RAG applications allows for better tracking and logging of user actions. This is critical for identifying potential data breaches and maintaining accountability.
With authorization, you will have a record of all the actions performed by various users, which will help you identify malicious users in case of data breaches and unauthorized access.
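A record of authorization decisions could be kept with a sketch like the one below, which writes one structured log entry per decision. The logger name, the field names, and the `log_authorization_decision` helper are assumptions, not part of any specific tool.

```python
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO, format="%(message)s")
audit_logger = logging.getLogger("rag.audit")  # hypothetical logger name

def log_authorization_decision(user_id, action, resource_id, allowed):
    """Write one structured audit record per authorization decision and return it."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user_id,
        "action": action,
        "resource": resource_id,
        "decision": "allow" if allowed else "deny",
    }
    audit_logger.info(json.dumps(record))
    return record

record = log_authorization_decision("user1", "ingest", "lung_cancer_vectorstore", False)
```

Structured records like this can later be searched by user, action, or resource when investigating an incident.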
Fortunately, all of the aforementioned issues can be handled by using Cerbos, a robust authorization layer that you can use to implement access control in your RAG applications.
Cerbos is an open-source, language-agnostic authorization layer that provides a powerful solution for implementing authorization in modern, distributed applications. It offers improved security, scalability, and ease of management for access control policies.
Cerbos implements authorization policies using a declarative language, decoupling the authorization logic from your application. It is highly scalable and efficiently handles high-volume authorization requests.
In this section, you will see how to use Cerbos to implement various authorization designs such as RBAC and ABAC on the RAG applications we developed in the first section.
To use Cerbos authorization, you need to install and run the Cerbos server. You can run Cerbos server via Docker as explained in the official documentation.
You will also need to install the Cerbos Python SDK to call the Cerbos server from a Python application.
pip install cerbos
Next, import the following libraries into your Python application.
from cerbos.sdk.grpc.client import CerbosClient
from cerbos.engine.v1 import engine_pb2
from google.protobuf.struct_pb2 import Value
Note: You must also import the Python libraries from the examples in the first section of this article.
The following script creates a Cerbos client.
The script also creates the ChatOpenAI object that we will use to call OpenAI LLMs in our RAG application.
Finally, we create the OpenAIEmbeddings object that we will use to create vector embeddings to store in vector stores.
load_dotenv()
# Cerbos Client Initialization
cerbos_client = CerbosClient("localhost:3593", tls_verify=False)
# OpenAI Client Initialization
openai_key = os.environ.get('OPENAI_API_KEY')
llm = ChatOpenAI(
    openai_api_key=openai_key,
    model="gpt-4o",
    temperature=0.7
)
# Initialize OpenAI embeddings
embeddings = OpenAIEmbeddings(openai_api_key=openai_key)
We will create two vector stores: one for the lung cancer survey document and the other for the Meta and Amazon earnings documents.
data_url = "https://www.hse.ie/eng/services/list/5/cancer/pubs/reports/national-survey-on-lung-cancer-awareness-report-january-2020.pdf"
loader = PyPDFLoader(data_url)
lung_cancer_docs = loader.load_and_split()
data_url = "https://s21.q4cdn.com/399680738/files/doc_financials/2024/q3/META-Q3-2024-Earnings-Call-Transcript.pdf"
loader = PyPDFLoader(data_url)
meta_docs = loader.load_and_split()
data_url = "https://s2.q4cdn.com/299287126/files/doc_financials/2024/q3/AMZN-Q3-2024-Earnings-Release.pdf"
loader = PyPDFLoader(data_url)
amazon_docs = loader.load_and_split()
def add_metadata(docs, source, month):
    for doc in docs:
        doc.metadata["source"] = source
        doc.metadata["month"] = month
    return docs
lung_cancer_docs = add_metadata(lung_cancer_docs, "lung_cancer_doc", "June")
meta_docs = add_metadata(meta_docs, "meta_doc", "October")
amazon_docs = add_metadata(amazon_docs, "amazon_doc", "November")
# Create empty vector stores
lung_cancer_vectorstore = Chroma(
    collection_name="lung_cancer_collection",
    embedding_function=embeddings
)
earning_calls_vectorstore = Chroma(
    collection_name="earning_calls_collection",
    embedding_function=embeddings
)
Use the RBAC approach when you want to allow access to a RAG resource, such as a vector store, based on user role. For example, you may want only users with the data_ingestor or admin role to ingest data into a vector store.
Let's see how to do this with Cerbos.
The first step in implementing Cerbos authorization is to create policies that define the resource, the roles of the principals who can access it, and the actions that can be performed on it. You can also specify additional conditions that further define the scope of the principals who can perform an action on a resource.
You need to define policies in a .yaml file and specify the directory containing that file when starting the Cerbos server. For example, if your policies are in the cerbos-quickstart/policies/resource.document.yaml file, you will start your Cerbos server with the following command:
docker run --name cerbos -d -v $(pwd)/cerbos-quickstart/policies:/policies -p 3592:3592 -p 3593:3593 ghcr.io/cerbos/cerbos:0.39.0
The following script defines a policy for RBAC in RAG. The policy rules specify that the principals (which can be users) with the roles data_ingestor and admin can perform an ingest action on vector_store type resources.
apiVersion: "api.cerbos.dev/v1"
resourcePolicy:
  resource: "vector_store"
  version: "default"
  rules:
    - actions: ["ingest"]
      effect: EFFECT_ALLOW
      roles: ["data_ingestor", "admin"]
Next, we will define the ingest_data_with_rbac() function that accepts the vector store in which the data will be ingested, the documents to ingest, and the principal and resource objects.
The function checks if the principal can access the resource and ingests the data if the check evaluates to True.
Otherwise, the function prints a message that the principal cannot access the resource.
def ingest_data_with_rbac(vector_store, docs, principal, resource):
    # The gRPC client talks to the Cerbos gRPC port (3593); 3592 is the HTTP port
    with CerbosClient("localhost:3593", tls_verify=False) as client:
        # Ingest only if Cerbos allows the "ingest" action for this principal
        if client.is_allowed("ingest", principal, resource):
            print(f"Access granted for {principal.id} to ingest data in resource {resource.id}.")
            vector_store.add_documents(docs)
            return True
        else:
            print(f"Access denied for {principal.id}.")
            return False
We will define three principals, user1, user2, and user3, with the roles data_retriever, data_ingestor, and admin, respectively.
# Define Principals with different roles
principal_user1 = engine_pb2.Principal(
    id="user1",
    roles=["data_retriever"],
    policy_version="default",
)
principal_user2 = engine_pb2.Principal(
    id="user2",
    roles=["data_ingestor"],
    policy_version="default",
)
principal_user3 = engine_pb2.Principal(
    id="user3",
    roles=["admin"],
    policy_version="default",
)
We will define a resource lung_cancer_vectorstore of type vector_store.
resource_rbac = engine_pb2.Resource(
    id="lung_cancer_vectorstore",
    kind="vector_store",
)
Finally, we will try to ingest data into lung_cancer_vectorstore using the three principals and the resource we defined.
for principal in [principal_user1, principal_user2, principal_user3]:
    print("=====================")
    result = ingest_data_with_rbac(lung_cancer_vectorstore,
                                   lung_cancer_docs,
                                   principal,
                                   resource_rbac)
    if result:
        print("Operation successful - data ingested")
    else:
        print("You do not have permission to ingest the data")
Output:
In the output, you will see that user1 is denied access to the resource since it has the data_retriever role. On the other hand, user2 and user3, with the roles data_ingestor and admin, are able to access lung_cancer_vectorstore.
Next, you will see how to implement Attribute-Based Access Control (ABAC) on RAG with Cerbos.
The ABAC approach is useful when you want to allow users with certain attributes to access a resource in RAG. For instance, if you want users from a specific department to ingest or retrieve data from a particular vector store, you can use the ABAC approach.
Let's see examples of ingestion and retrieval using the ABAC approach.
For ingestion with ABAC, let's add a rule that allows principals with the department_data_ingestor role to perform the department_ingest action on vector_store type resources if the department attribute of the principal matches the type attribute of the resource.
The following script defines the rule.
apiVersion: "api.cerbos.dev/v1"
resourcePolicy:
  resource: "vector_store"
  version: "default"
  rules:
    - actions: ["ingest"]
      effect: EFFECT_ALLOW
      roles: ["data_ingestor", "admin"]
    - actions: ["department_ingest"]
      effect: EFFECT_ALLOW
      roles: ["department_data_ingestor"]
      condition:
        match:
          expr: request.principal.attr.department == request.resource.attr.type
Next, we will define the ingest_data_with_abac() function, which accepts a vector store, documents, principal, and resource and ingests documents into the vector store only if the principal is allowed to perform the department_ingest action on the resource.
def ingest_data_with_abac(vector_store, docs, principal, resource):
    # The gRPC client talks to the Cerbos gRPC port (3593); 3592 is the HTTP port
    with CerbosClient("localhost:3593", tls_verify=False) as client:
        # Cerbos evaluates the department/type condition from the policy
        if client.is_allowed("department_ingest", principal, resource):
            print(f"Access granted for {principal.id} to ingest data in resource {resource.id}.")
            vector_store.add_documents(docs)
            return True
        else:
            print(f"Access denied for {principal.id}.")
            return False
Let's test the ingest_data_with_abac() function by creating two principals, user4 and user5, with the department_data_ingestor role. user4 belongs to the finance department, whereas user5 belongs to the health department.
Similarly, we will define two vector_store resources, finance_vectorstore and health, with finance and health type attributes, respectively.
principal_user4 = engine_pb2.Principal(
    id="user4",
    roles=["department_data_ingestor"],
    policy_version="default",
    attr={"department": Value(string_value="finance")}
)
principal_user5 = engine_pb2.Principal(
    id="user5",
    roles=["department_data_ingestor"],
    policy_version="default",
    attr={"department": Value(string_value="health")}
)
resource_abac_finance = engine_pb2.Resource(
    id="finance_vectorstore",
    kind="vector_store",
    attr={"type": Value(string_value="finance")}
)
resource_abac_health = engine_pb2.Resource(
    id="health",
    kind="vector_store",
    attr={"type": Value(string_value="health")}
)
First, user4 and user5 will both try to ingest data into lung_cancer_vectorstore through the health type resource.
for principal in [principal_user4, principal_user5]:
    print("=====================")
    result = ingest_data_with_abac(lung_cancer_vectorstore,
                                   lung_cancer_docs,
                                   principal,
                                   resource_abac_health)
    if result:
        print("Operation successful - data ingested")
    else:
        print("You do not have permission to ingest the data")
Output:
The above output shows that user4 from the finance department is denied access to the health type resource. On the other hand, user5 from the health department is able to access the resource and ingest data into lung_cancer_vectorstore.
Let's see another example; this time user4 and user5 will try to access the finance type resource.
docs = meta_docs + amazon_docs
for principal in [principal_user4, principal_user5]:
    result = ingest_data_with_abac(earning_calls_vectorstore,
                                   docs,
                                   principal,
                                   resource_abac_finance)
    if result:
        print("Operation successful - data ingested")
    else:
        print("You do not have permission to ingest the data")
Output:
The output shows that user4 from the finance department accessed the finance type resource, whereas user5 from the health department was denied access.
Data retrieval with ABAC involves retrieving data from a vector store using principal and resource attributes.
For example, we will add a new rule to our policy that allows principals with the role doc_retriever to perform a retrieve action if the doc_type attributes of the principal and resource match.
apiVersion: "api.cerbos.dev/v1"
resourcePolicy:
  resource: "vector_store"
  version: "default"
  rules:
    - actions: ["ingest"]
      effect: EFFECT_ALLOW
      roles: ["data_ingestor", "admin"]
    - actions: ["department_ingest"]
      effect: EFFECT_ALLOW
      roles: ["department_data_ingestor"]
      condition:
        match:
          expr: request.principal.attr.department == request.resource.attr.type
    - actions: ["retrieve"]
      effect: EFFECT_ALLOW
      roles: ["doc_retriever"]
      condition:
        match:
          expr: request.principal.attr.doc_type == request.resource.attr.doc_type
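Before wiring the policy into the application, the decisions you expect from these three rules can be sanity-checked with a rough local stand-in. The sketch below is not the Cerbos engine and does not call the Cerbos server; `RULES` and `simulate_is_allowed` are hypothetical names, and the code only mirrors the three rules shown above, denying anything that no rule allows.

```python
# Simplified local mirror of the three policy rules (assumption: not Cerbos itself)
RULES = [
    {"action": "ingest", "roles": {"data_ingestor", "admin"}, "condition": None},
    {
        "action": "department_ingest",
        "roles": {"department_data_ingestor"},
        # principal.attr.department == resource.attr.type
        "condition": lambda p, r: p.get("department") is not None
        and p.get("department") == r.get("type"),
    },
    {
        "action": "retrieve",
        "roles": {"doc_retriever"},
        # principal.attr.doc_type == resource.attr.doc_type
        "condition": lambda p, r: p.get("doc_type") is not None
        and p.get("doc_type") == r.get("doc_type"),
    },
]

def simulate_is_allowed(action, roles, principal_attr, resource_attr):
    """Return True if any rule matches the action, a role, and its condition."""
    for rule in RULES:
        if rule["action"] != action or not (rule["roles"] & set(roles)):
            continue
        condition = rule["condition"]
        if condition is None or condition(principal_attr, resource_attr):
            return True
    return False  # no rule matched: deny by default

# Principals and resources shaped like the ones used in this article
print(simulate_is_allowed("ingest", ["data_retriever"], {}, {}))
print(simulate_is_allowed("department_ingest", ["department_data_ingestor"],
                          {"department": "health"}, {"type": "health"}))
print(simulate_is_allowed("retrieve", ["doc_retriever"],
                          {"doc_type": "meta_doc"}, {"doc_type": "amazon_doc"}))
```

If the real Cerbos decisions differ from what this stand-in predicts, the policy file or the attributes sent with the request are worth rechecking.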
To test the above policy, we will first define a helper function, generate_response(), that returns an LLM response based on the user query and the doc type. For instance, if you pass the meta_doc doc type, the RAG will only look for documents with the meta_doc attribute in the vector store.
def generate_response(vector_store, query, doc_type):
    prompt = ChatPromptTemplate.from_template("""Answer the following question based only on the provided context:
Question: {input}
Context: {context}
"""
    )
    document_chain = create_stuff_documents_chain(llm, prompt)
    # Restrict retrieval to chunks whose "source" metadata equals the permitted doc type
    retriever = vector_store.as_retriever(search_kwargs={"filter": {"source": doc_type}})
    retrieval_chain = create_retrieval_chain(retriever, document_chain)
    response = retrieval_chain.invoke({"input": query})
    return response['answer']
Next, we will define the retrieve_data_with_abac() function, which allows only principals whose doc_type attribute matches the doc_type of the resource to retrieve RAG responses from the vector store.
Notice that here, in addition to the is_allowed() call, we retrieve the principal's query plan using the plan_resources() API call. This method dynamically returns the conditions under which the principal can access a particular kind of resource. We read the string value of the first operand, which holds the principal's doc_type, and use this information to filter the documents in the vector store.
def retrieve_data_with_abac(vector_store, query, principal, resource, resource_plan):
    with CerbosClient("localhost:3592", tls_verify=False) as client:
        # First check whether the principal may retrieve from this specific resource
        if client.is_allowed("retrieve", principal, resource):
            print(f"Access granted for {principal.id} to access the resource {resource.id}.")
            # Then ask for the query plan to learn which doc_type the principal may see
            plan = client.plan_resources(action="retrieve",
                                         principal=principal,
                                         resource=resource_plan)
            doc_type = plan.filter.condition.expression.operands[0].value.string_value
            response = generate_response(vector_store,
                                         query,
                                         doc_type)
            return response
        else:
            return f"Access denied for {principal.id} to access the resource {resource.id}."
Let's test the ABAC policy for retrieval. We create two users: user6 and user7, with meta_doc and amazon_doc values for their doc_type attributes, respectively.
We also define two resources: meta_docs_vectorstore and amazon_docs_vectorstore, with their doc_type attributes containing meta_doc and amazon_doc values, respectively.
principal_user6 = engine_pb2.Principal(
    id="user6",
    roles=["doc_retriever"],
    policy_version="default",
    attr={"doc_type": Value(string_value="meta_doc")}
)
principal_user7 = engine_pb2.Principal(
    id="user7",
    roles=["doc_retriever"],
    policy_version="default",
    attr={"doc_type": Value(string_value="amazon_doc")}
)
resource_abac_meta = engine_pb2.Resource(
    id="meta_docs_vectorstore",
    kind="vector_store",
    attr={"doc_type": Value(string_value="meta_doc")}
)
resource_abac_amazon = engine_pb2.Resource(
    id="amazon_docs_vectorstore",
    kind="vector_store",
    attr={"doc_type": Value(string_value="amazon_doc")}
)
Let's also look at what the resource plan returns for user7 to see which value will be used as the doc_type filter.
# The plan input only names the resource kind; no specific resource instance is needed
plan_resource = engine_pb2.PlanResourcesInput.Resource(
    kind="vector_store",
)
with CerbosClient("localhost:3592", tls_verify=False) as client:
    plan = client.plan_resources(action="retrieve",
                                 principal=principal_user7,
                                 resource=plan_resource)
    print(plan)
Output:
request_id: "288b4689-215d-4baf-8b47-062d901be23a"
action: "retrieve"
resource_kind: "vector_store"
filter {
  kind: KIND_CONDITIONAL
  condition {
    expression {
      operator: "eq"
      operands {
        value {
          string_value: "amazon_doc"
        }
      }
      operands {
        variable: "request.resource.attr.doc_type"
      }
    }
  }
}
cerbos_call_id: "01JE10QSD53NF54Y18CCTKC92J"
The above output shows that for user7 the doc_type is amazon_doc. This information will be dynamically retrieved for user7 in the retrieve_data_with_abac() function. The same applies to the other users and their doc types.
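Reading the value by position works for the plan printed above, but it depends on the operand order. As a minimal sketch, the lookup can instead search for the literal that is compared with a given variable. This example assumes the plan's expression has been converted to plain dictionaries shaped like the printed output; the helper name find_attr_value and the nested "and" plan are hypothetical illustrations, not part of the Cerbos SDK.

```python
from typing import Optional


def find_attr_value(expression: dict, variable: str) -> Optional[str]:
    """Return the literal compared with `variable` in an `eq` node, or None.

    `expression` is assumed to be a dict with "operator" and "operands" keys,
    where each operand holds one of "value", "variable", or "expression".
    """
    operands = expression.get("operands", [])
    if expression.get("operator") == "eq":
        variables = [op.get("variable") for op in operands]
        if variable in variables:
            for op in operands:
                if "value" in op:
                    return op["value"]
    # Recurse into nested expressions, e.g. and(eq(...), eq(...)).
    for op in operands:
        nested = op.get("expression")
        if nested is not None:
            found = find_attr_value(nested, variable)
            if found is not None:
                return found
    return None


# Hand-written dict mirroring the expression printed above for user7.
plan_user7 = {
    "operator": "eq",
    "operands": [
        {"value": "amazon_doc"},
        {"variable": "request.resource.attr.doc_type"},
    ],
}

# Hypothetical nested plan, as a policy with two combined conditions might produce.
plan_nested = {
    "operator": "and",
    "operands": [
        {"expression": {"operator": "eq", "operands": [
            {"value": "finance"},
            {"variable": "request.resource.attr.team_name"}]}},
        {"expression": {"operator": "eq", "operands": [
            {"value": "meta_doc"},
            {"variable": "request.resource.attr.doc_type"}]}},
    ],
}

print(find_attr_value(plan_user7, "request.resource.attr.doc_type"))
print(find_attr_value(plan_nested, "request.resource.attr.doc_type"))
```

Searching by variable name keeps the lookup independent of where the value sits in the plan, at the cost of a small recursive helper.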
Let's try to access the meta_docs_vectorstore via user6 and user7.
for principal in [principal_user6, principal_user7]:
    print("=====================")
    query = "What is the revenue of Meta for Q3 2024?"
    # The doc_type filter is derived from the query plan inside the function
    result = retrieve_data_with_abac(earning_calls_vectorstore,
                                     query,
                                     principal,
                                     resource_abac_meta,
                                     plan_resource)
    print(result)
Output:
The output shows that user6 with the meta_doc attribute can access the meta_docs_vectorstore since its doc_type attribute matches. On the other hand, user7 with the amazon_doc attribute is denied access.
Conversely, user7 will be allowed access to amazon_docs_vectorstore since its doc_type is amazon_doc.
for principal in [principal_user6, principal_user7]:
    print("=====================")
    query = "What is the revenue of Amazon for Q3 2024?"
    # The doc_type filter is derived from the query plan inside the function
    result = retrieve_data_with_abac(earning_calls_vectorstore,
                                     query,
                                     principal,
                                     resource_abac_amazon,
                                     plan_resource)
    print(result)
Output:
With this approach, you can implement filters allowing users to access only certain documents within a single vector store.
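To make the idea concrete, here is a small plain-Python sketch of what a metadata filter does conceptually: chunks from several sources live in one store, and only those whose source matches the permitted doc type are eligible for retrieval. The Doc class, the filter_by_source helper, and the placeholder chunk texts are hypothetical stand-ins; this is not how Chroma implements filtering internally.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a stored chunk and its metadata."""
    text: str
    metadata: dict = field(default_factory=dict)


def filter_by_source(docs, allowed_source):
    """Keep only chunks whose `source` metadata equals the permitted doc type."""
    return [doc for doc in docs if doc.metadata.get("source") == allowed_source]


# One store holding chunks from two document types (placeholder texts).
store = [
    Doc("chunk from a Meta earnings call", {"source": "meta_doc"}),
    Doc("chunk from an Amazon earnings call", {"source": "amazon_doc"}),
    Doc("another Meta chunk", {"source": "meta_doc"}),
]

# A principal permitted to read meta_doc only ever sees the Meta chunks.
visible = filter_by_source(store, "meta_doc")
print(len(visible))
print({doc.metadata["source"] for doc in visible})
```

Because the filter is applied before any chunk reaches the LLM, documents outside the permitted type never enter the prompt context.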
Let's see another example of ABAC where we combine two conditions to allow a principal to access a particular resource.
We will define a policy that allows the retrieve action to be performed by the principal with the role team_leader, where the principal's team name is equal to the resource's team name. We will add another condition using the && operator that specifies that the principal's doc_type must match the resource's doc_type.
apiVersion: "api.cerbos.dev/v1"
resourcePolicy:
  resource: "vector_store"
  version: "default"
  rules:
    - actions: ["ingest"]
      effect: EFFECT_ALLOW
      roles: ["data_ingestor", "admin"]
    - actions: ["department_ingest"]
      effect: EFFECT_ALLOW
      roles: ["department_data_ingestor"]
      condition:
        match:
          expr: request.principal.attr.department == request.resource.attr.type
    - actions: ["retrieve"]
      effect: EFFECT_ALLOW
      roles: ["doc_retriever"]
      condition:
        match:
          expr: request.principal.attr.doc_type == request.resource.attr.doc_type
    - actions: ["retrieve"]
      effect: EFFECT_ALLOW
      roles: ["team_leader"]
      condition:
        match:
          expr: request.principal.attr.team_name == request.resource.attr.team_name && request.principal.attr.doc_type == request.resource.attr.doc_type
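The decision logic of this policy can be approximated in plain Python to reason about which requests should pass. The sketch below is only a local simulation under simplifying assumptions (a request is allowed if any rule matches its action, one of its roles, and its condition; everything else is denied; a missing attribute never matches). The names RULES, attrs_match, and simulate_is_allowed, as well as the sample attribute dicts, are hypothetical; real decisions should always come from the Cerbos PDP.

```python
def attrs_match(principal_attr, resource_attr, principal_key, resource_key):
    """True only when both attributes are present and equal."""
    left = principal_attr.get(principal_key)
    right = resource_attr.get(resource_key)
    return left is not None and left == right


# Each entry mirrors one rule of the policy above.
RULES = [
    {"actions": {"ingest"}, "roles": {"data_ingestor", "admin"},
     "condition": None},
    {"actions": {"department_ingest"}, "roles": {"department_data_ingestor"},
     "condition": lambda p, r: attrs_match(p, r, "department", "type")},
    {"actions": {"retrieve"}, "roles": {"doc_retriever"},
     "condition": lambda p, r: attrs_match(p, r, "doc_type", "doc_type")},
    {"actions": {"retrieve"}, "roles": {"team_leader"},
     "condition": lambda p, r: (attrs_match(p, r, "team_name", "team_name")
                                and attrs_match(p, r, "doc_type", "doc_type"))},
]


def simulate_is_allowed(action, roles, principal_attr, resource_attr):
    """Return True if any rule allows the request; deny otherwise."""
    for rule in RULES:
        if action not in rule["actions"] or not (rule["roles"] & set(roles)):
            continue
        condition = rule["condition"]
        if condition is None or condition(principal_attr, resource_attr):
            return True
    return False


# Hypothetical team leader and two finance resources with different doc types.
leader_attr = {"team_name": "finance", "doc_type": "meta_doc"}
finance_meta = {"team_name": "finance", "doc_type": "meta_doc"}
finance_amazon = {"team_name": "finance", "doc_type": "amazon_doc"}

print(simulate_is_allowed("retrieve", ["team_leader"], leader_attr, finance_meta))
print(simulate_is_allowed("retrieve", ["team_leader"], leader_attr, finance_amazon))
```

The second call is denied even though the team names match, because the && operator requires both comparisons to hold.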
Next, we will define a function retrieve_data_with_abac() that allows controlled access to documents in a vector store filtered by document type. We will again use the plan_resource that we defined earlier to dynamically retrieve a principal's doc_type attribute.
Note: The query plan is designed to adapt dynamically to policy changes. However, reading the value from a hardcoded position in the plan, as the function below does with operands[0], can cause issues if the policy changes in the future.
def retrieve_data_with_abac(vector_store, query, principal, resource, resource_plan):
    with CerbosClient("localhost:3592", tls_verify=False) as client:
        # First check whether the principal may retrieve from this specific resource
        if client.is_allowed("retrieve", principal, resource):
            print(f"Access granted for {principal.id} to access the resource {resource.id}.")
            # Then ask for the query plan to learn which doc_type the principal may see
            plan = client.plan_resources(action="retrieve",
                                         principal=principal,
                                         resource=resource_plan)
            doc_type = plan.filter.condition.expression.operands[0].value.string_value
            response = generate_response(vector_store,
                                         query,
                                         doc_type)
            return response
        else:
            return f"Access denied for {principal.id} to access the resource {resource.id}."
To test the retrieve_data_with_abac() function, we will define two principal users: user8 and user9.
user8 is a team leader for the finance team and can access meta_doc documents. user9 leads the health team and can access lung_cancer_doc documents.
We also define three resources: finance_vectorstore_meta with team_name=finance and doc_type=meta_doc, finance_vectorstore_amazon with team_name=finance and doc_type=amazon_doc, and health_vectorstore with team_name=health and doc_type=lung_cancer_doc.
principal_user8 = engine_pb2.Principal(
    id="user8",
    roles=["team_leader"],
    policy_version="default",
    attr={"team_name": Value(string_value="finance"),
          "doc_type": Value(string_value="meta_doc")}
)
principal_user9 = engine_pb2.Principal(
    id="user9",
    roles=["team_leader"],
    policy_version="default",
    attr={"team_name": Value(string_value="health"),
          "doc_type": Value(string_value="lung_cancer_doc")}
)
resource_rebac_finance_meta = engine_pb2.Resource(
    id="finance_vectorstore_meta",
    kind="vector_store",
    attr={"team_name": Value(string_value="finance"),
          "doc_type": Value(string_value="meta_doc")}
)
resource_rebac_finance_amazon = engine_pb2.Resource(
    id="finance_vectorstore_amazon",
    kind="vector_store",
    attr={"team_name": Value(string_value="finance"),
          "doc_type": Value(string_value="amazon_doc")}
)
resource_rebac_health = engine_pb2.Resource(
    id="health_vectorstore",
    kind="vector_store",
    attr={"team_name": Value(string_value="health"),
          "doc_type": Value(string_value="lung_cancer_doc")}
)
Next, we will try to access the finance_vectorstore_meta and finance_vectorstore_amazon resources as both users.
for principal in [principal_user8, principal_user9]:
    for resource in [resource_rebac_finance_meta, resource_rebac_finance_amazon]:
        print("=====================")
        query = "What is the revenue of Meta for Q3 2024?"
        # The doc_type filter is derived from the query plan inside the function
        result = retrieve_data_with_abac(earning_calls_vectorstore,
                                         query,
                                         principal,
                                         resource,
                                         plan_resource)
        print(result)
Output:
The output shows that only user8 could access the finance_vectorstore_meta resource since its team_name and doc_type match the resource attributes.
So you can see that Cerbos makes it seamless to implement different types of authorization designs on RAG applications.
Security and efficient access are paramount for RAG applications, as they may store sensitive and private data. Implementing a robust access control design improves data security, preventing malicious users or attackers from corrupting the data or accessing private information.
Various tools exist that allow you to implement access control mechanisms on RAG applications. Cerbos is one such tool that implements a highly flexible, scalable, and dynamic policy-based approach to access control. You can use Cerbos to implement various access control designs, such as RBAC and ABAC, on your RAG applications, as you saw in this article.
I have used Cerbos in my RAG applications, and I can confidently say it is one of the best and easiest-to-implement access control tools for RAG.