Empowering Data Security in GenAI: Step-by-Step Guide to PII Safeguarding in Bedrock using Protegrity

By Muneeb Haasan

Feb 24, 2025

Summary

This blog outlines how to safeguard Personally Identifiable Information (PII) in Generative AI (GenAI) pipelines using Amazon Bedrock and Protegrity GenAI Security. It covers data tokenization, policy-driven access controls, and PII discovery via Protegrity’s API Playground, ensuring secure and compliant AI workflows. The post also includes a step-by-step code guide to implementing these protections effectively.

Introduction

Generative AI (GenAI) applications, especially those built on Retrieval-Augmented Generation (RAG) pipelines, are transforming how businesses interact with their data. These pipelines combine language models with extensive enterprise knowledge bases to answer real-time queries over large internal datasets, which makes robust data privacy and security essential. Amazon Bedrock’s native security guardrails address this need, and Protegrity, known for protecting sensitive data in regulated industries, augments Bedrock’s security framework with Protegrity GenAI Security.

Amazon Bedrock

Amazon Bedrock is a fully managed service that gives developers access to foundation models from Amazon and third-party providers through a single API, accelerating AI initiatives while minimizing infrastructure management complexity. Bedrock also offers integrated tooling, such as knowledge bases for RAG and guardrails, to streamline the development of generative AI applications.

Bedrock hosts models from several providers, including the Anthropic Claude model used later in this post, so developers can choose the model that best fits their use case. With its flexibility and scalability, organizations can focus on innovation and deriving insights without worrying about the underlying infrastructure, ultimately advancing their AI capabilities and driving business outcomes.

Protegrity for Amazon Bedrock

Protegrity provides data tokenization for Amazon Bedrock using a cloud-native, serverless architecture. This solution seamlessly scales to meet Bedrock’s on-demand, intensive machine learning workloads, ensuring data security while delivering the performance needed to protect sensitive data at scale.

Amazon Bedrock automatically provisions and scales the underlying model-serving infrastructure, so performance holds up even for demanding workloads. With on-demand pricing, you pay only for what you use and can start invoking models immediately, without standing up any infrastructure. Protegrity’s tokenization solution supports both Bedrock’s serverless and provisioned environments, offering robust data protection and seamless integration.

Step-by-Step Guide

In this notebook, we’ll demonstrate how personally identifiable information (PII) can safely traverse several integration points of a typical GenAI pipeline. Because Retrieval-Augmented Generation (RAG) is a common implementation pattern, we’ll follow a likely RAG scenario. To do so, we’ll use the Protegrity GenAI Security product through Protegrity’s API Playground. To register and use it, follow the instructions at https://www.protegrity.com/api-playground. The product provides two key capabilities: PII discovery on unstructured data and reversible tokenization. The following diagram depicts two design patterns for protecting a GenAI pipeline.

The pattern on the right applies when a company needs to protect data immediately at the vector database level. Because Protegrity’s tokenization technology ensures referential integrity, semantic search can retrieve relevant context for a user query while operating on tokenized data. Protected data flows through the system, and before the output is returned to the user, a policy verification step determines whether the user may see data in the clear or should instead receive masked or tokenized data.


from IPython.display import Image
Image(filename="Protegrity GenAI Security and AWS Bedrock.PNG")

The pattern on the left is less complex: it operates on top of unprotected data, but ensures that data is accessed only through policy and fine-grained access controls.

Protegrity’s GenAI Security can be used with these and many other design patterns. Below, we’ll take a deep dive into the second pattern, the one on the left.
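The policy verification step described above can be pictured with a small sketch. Everything in it is hypothetical: the role names, the `POLICY` table, and the `render_value` helper are illustrative assumptions and are not part of Protegrity’s product or API, where the policy is defined and enforced by the service itself.

```python
# Hypothetical sketch of a policy verification step.
# Roles, rules, and helper names are made up for illustration.
from dataclasses import dataclass

# Assumed policy: which view of a protected value each role may see.
POLICY = {
    "fraud_analyst": "clear",
    "support_agent": "masked",
}
DEFAULT_VIEW = "tokenized"  # Roles that are not listed never see clear data.


@dataclass(frozen=True)
class ProtectedValue:
    clear: str  # Original value, available only after an authorized unprotect.
    token: str  # Tokenized value that flows through the pipeline.


def mask(value: str, visible: int = 2) -> str:
    """Replace all but the last `visible` characters with '*'."""
    if len(value) <= visible:
        return "*" * len(value)
    hidden = len(value) - visible
    return "*" * hidden + value[hidden:]


def render_value(value: ProtectedValue, role: str) -> str:
    """Return the clear, masked, or tokenized view allowed for `role`."""
    view = POLICY.get(role, DEFAULT_VIEW)
    if view == "clear":
        return value.clear
    if view == "masked":
        return mask(value.clear)
    return value.token
```

Defaulting unknown roles to the tokenized view keeps the sketch fail-closed: a role must be listed explicitly before it can see anything other than tokens.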

Note: The highlighted code is the same as in the screenshots and is provided as text for easy copying.

Imports


import json
import requests
import boto3

Setup

Bedrock Specific


boto3_session = boto3.session.Session()
region = boto3_session.region_name
# Fill in with your specific resources.
# Beware of model changes that can impact bedrock-agent-runtime "retrieve()" and
# bedrock-runtime "invoke_model()" compatibility.

model_id = "anthropic.claude-v2:1"
knowledge_base_id = "1LH4NYCSPO"

bedrockClient = boto3.client("bedrock-agent-runtime", region)
bedrockRuntimeClient = boto3.client("bedrock-runtime", region)

Protegrity Specific

For this demonstration, we’ll take advantage of Protegrity’s API Playground, where you can sign-up for access to data protection mechanisms and the capabilities of Protegrity’s GenAI Security: https://www.protegrity.com/api-playground. After following the instructions on this website, you’ll be able to automatically find and protect common PII elements.

Workflow


question = "Which 5 customers fraud claims are the most urgent? Give me details about them."

knowledgeBaseResponse = bedrockClient.retrieve(
    knowledgeBaseId=knowledge_base_id, retrievalQuery={"text": question}
)

Bedrock can retrieve and generate against a knowledge base in a single call, but we’ll keep those calls separate to point out that PII can be caught and protected at several layers, depending on the trust model.


context = ""
for source in knowledgeBaseResponse["retrievalResults"]:
    context += f"\n {source['content']['text']}"

print(context)

Kathryn Lewis, passport number 692064153, address 401 Sue Mountains Ericbury, SC 43316, confirmed at 2024-11-20 submitted a fraud claim regarding the following transactions: {‘step’: 566, ‘type’: ‘CASH_OUT’, ‘amount’: 125330.8, ‘nameOrig’: ‘C126286172’, ‘oldbalanceOrg’: 29855.0, ‘newbalanceOrig’: 0.0, ‘nameDest’: ‘C1824943700’, ‘oldbalanceDest’: 572667.73, ‘newbalanceDest’: 697998.53, ‘isFraud’: 0, ‘isFlaggedFraud’: 0}. As of 2024-03-10 our analysts have confirmed the fraud status to be False.

Andrea Crawford, passport number 907921576, address 9855 Jodi Motorway Burnsmouth, DC 46046, confirmed at 2024-09-11 submitted a fraud claim regarding the following transactions: {‘step’: 322, ‘type’: ‘CASH_OUT’, ‘amount’: 472300.69, ‘nameOrig’: ‘C1246967028’, ‘oldbalanceOrg’: 109467.0, ‘newbalanceOrig’: 0.0, ‘nameDest’: ‘C884656820’, ‘oldbalanceDest’: 2207990.29, ‘newbalanceDest’: 2680290.98, ‘isFraud’: 0, ‘isFlaggedFraud’: 0}. As of 2024-05-30 our analysts have confirmed the fraud status to be True.

James Rice, passport number 740201502, address 5282 Kevin Shore Suite 131 New Laurie, IL 51286, confirmed at 2024-06-12 submitted a fraud claim regarding the following transactions: {‘step’: 330, ‘type’: ‘CASH_OUT’, ‘amount’: 289112.28, ‘nameOrig’: ‘C1711788174’, ‘oldbalanceOrg’: 0.0, ‘newbalanceOrig’: 0.0, ‘nameDest’: ‘C1398752802’, ‘oldbalanceDest’: 1690770.17, ‘newbalanceDest’: 1979882.44, ‘isFraud’: 0, ‘isFlaggedFraud’: 0}. As of 2024-02-13 our analysts have confirmed the fraud status to be False.

Anthony Mueller, passport number U14754162, address 22845 Brianna Circles Patrickview, IN 65960, confirmed at 2024-11-11 submitted a fraud claim regarding the following transactions: {‘step’: 17, ‘type’: ‘CASH_IN’, ‘amount’: 238654.53, ‘nameOrig’: ‘C104256125’, ‘oldbalanceOrg’: 8243295.8, ‘newbalanceOrig’: 8481950.33, ‘nameDest’: ‘C1813750268’, ‘oldbalanceDest’: 303232.14, ‘newbalanceDest’: 64577.61, ‘isFraud’: 0, ‘isFlaggedFraud’: 0}. As of 2024-11-13 our analysts have confirmed the fraud status to be True.

Michael Riley, passport number 661543207, address 4977 Anthony Knolls New Brandon, MD 44876, confirmed at 2024-03-24 submitted a fraud claim regarding the following transactions: {‘step’: 398, ‘type’: ‘CASH_IN’, ‘amount’: 307596.83, ‘nameOrig’: ‘C628114319’, ‘oldbalanceOrg’: 3529802.15, ‘newbalanceOrig’: 3837398.99, ‘nameDest’: ‘C1619891381’, ‘oldbalanceDest’: 1211127.66, ‘newbalanceDest’: 903530.83, ‘isFraud’: 0, ‘isFlaggedFraud’: 0}. As of 2024-10-13 our analysts have confirmed the fraud status to be True.

As seen above, we have retrieved relevant context from the knowledge base.
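Because retrieval and generation are separate calls, the retrieved passages can be passed through a protection step before they are placed into the prompt. The sketch below assumes the `retrievalResults` shape used above. `build_context` and its `protect` hook are hypothetical names, and the hook is only a placeholder for wherever a call to a protection service could go.

```python
from typing import Callable, Optional


def build_context(
    retrieval_response: dict,
    protect: Optional[Callable[[str], str]] = None,
) -> str:
    """Join retrieved passages, optionally protecting each one first.

    `protect` is a hypothetical hook: any function that takes clear text
    and returns protected text (for example, a tokenization call).
    """
    passages = []
    for source in retrieval_response.get("retrievalResults", []):
        text = source["content"]["text"]
        if protect is not None:
            text = protect(text)
        passages.append(text)
    return "\n".join(passages)
```

Calling `build_context(knowledgeBaseResponse)` without a hook reproduces the unprotected flow followed in this notebook. Passing a hook would move protection to the retrieval layer instead.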


question_plus_context = json.dumps(
    {
        # The Claude text-completion format expects the prompt to open with
        # "\n\nHuman:" and close with "\n\nAssistant:".
        "prompt": f"\n\nHuman: This is the content you should base your query answer on. {context} \n Given the content above, give me an answer to this query: {question} \n\nAssistant:",
        "max_tokens_to_sample": 1000,
    }
)

llm_response = json.loads(
    bedrockRuntimeClient.invoke_model(
        body=question_plus_context,
        modelId=model_id,
        accept="application/json",
        contentType="application/json",
    )
    .get("body")
    .read()
)["completion"]

print(llm_response)

Based on the information provided, the 5 customers with the most urgent fraud claims are:

Andrea Crawford – Claim of $472,300.69 confirmed as fraud on 2024-05-30
Anthony Mueller – Claim of $238,654.53 confirmed as fraud on 2024-11-13
Michael Riley – Claim of $307,596.83 confirmed as fraud on 2024-10-13
Kathryn Lewis – Claim of $125,330.80 confirmed as not fraud on 2024-03-10
James Rice – Claim of $289,112.28 confirmed as not fraud on 2024-02-13
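The request body used for the model call above can be assembled by a small helper, which makes the turn markers easier to get right. This is a minimal sketch that assumes the legacy Claude text-completion format, in which the prompt opens with a `\n\nHuman:` turn and closes with `\n\nAssistant:`. `build_claude_v2_body` is a hypothetical name; the `prompt` and `max_tokens_to_sample` fields are the ones used in the request above.

```python
import json


def build_claude_v2_body(context: str, question: str, max_tokens: int = 1000) -> str:
    """Build a JSON request body in the Human/Assistant completion format."""
    prompt = (
        "\n\nHuman: This is the content you should base your query answer on.\n"
        f"{context}\n"
        f"Given the content above, give me an answer to this query: {question}"
        "\n\nAssistant:"
    )
    return json.dumps({"prompt": prompt, "max_tokens_to_sample": max_tokens})
```

The returned string can be passed as `body` to `invoke_model()` in place of `question_plus_context`.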

Protegrity’s API Playground enables you to protect and unprotect data through a policy-driven approach: authorized users see data in the clear, while unauthorized users see tokens. Below is the output of PII detection and tokenization applied to the LLM response. The guided mode on Protegrity’s API Playground covers the specifics of interacting with policy.


# Fill in with your details
api_playground_api_key = "<YOUR_API_KEY>"  # Do not hardcode or commit real keys.
jwt = "eyJraWQiOiJVbDdnOFZKTW..."

url = "https://api.playground.www.protegrity.com/v1/ai"

headers = {
    "x-api-key": api_playground_api_key,
    "Content-Type": "application/json",
    "Authorization": jwt,
}

data = {
    "operation": "protect",
    "options": {"type": "tokenize", "tags": True},
    "data": [llm_response],
}

response = requests.post(url, headers=headers, json=data)

print(response.json())


{'results': '\n\nBased on the information provided, the 5 customers with the most urgent fraud claims are:\n\n1. [PERSON]uUGBzE ZkOvDoML - swTus[/PERSON] of $472,300.69 confirmed as fraud on 2024-05-30\n2. [PERSON]aTXHBME hVEVHDo - ZgGYo[/PERSON] of $238,654.53 confirmed as fraud on 2024-11-13\n3. [PERSON]tShBNnO SDVFA - poNyH[/PERSON] of $307,596.83 confirmed as fraud on 2024-10-13 \n4. [PERSON]TvMTdfa fRJUe - Mqbcj[/PERSON] of $125,330.80 confirmed as not fraud on 2024-03-10\n5. [PERSON]KGNbH zeSQ - EcjvR[/PERSON] of $289,112.28 confirmed as not fraud on 2024-02-13\n\nI based the urgency on the date the claim was confirmed as fraud, with the most recent confirmations being most urgent. Claims confirmed as not fraud are still included in the top 5 based on claim amount, since investigation into them was recent even though they were not considered fraudulent transactions.'}
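Downstream code may need to locate the protected spans in the tagged output. The sketch below is inferred only from the `[PERSON]...[/PERSON]` format visible in the response above. The regular expression and the helper names are assumptions, and other tag names or nested tags may need different handling.

```python
import re
from typing import List, Tuple

# Matches "[TAG]token[/TAG]", where the closing tag repeats the opening name.
# Assumption: tag names are uppercase letters or underscores, as in "[PERSON]".
TAG_PATTERN = re.compile(
    r"\[(?P<tag>[A-Z_]+)\](?P<token>.*?)\[/(?P=tag)\]", re.DOTALL
)


def extract_tagged_tokens(text: str) -> List[Tuple[str, str]]:
    """Return (tag, token) pairs in the order they appear in `text`."""
    return [(m.group("tag"), m.group("token")) for m in TAG_PATTERN.finditer(text)]


def strip_tags(text: str) -> str:
    """Remove the tag markers but keep the tokenized values in place."""
    return TAG_PATTERN.sub(lambda m: m.group("token"), text)
```

For example, `extract_tagged_tokens(response.json()["results"])` would list the tokenized spans, and `strip_tags` gives the same text without the markers for display to users who are not authorized to see clear data.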

Conclusion

In this notebook, we’ve seen how to integrate Protegrity’s GenAI Security into an Amazon Bedrock GenAI pipeline. With a fine-grained data security policy in place, users see data only as the company’s policy allows, which lets teams experiment with and deploy GenAI applications faster.