Data Policies
Declarative Redaction of Sensitive Data
kPow supports configurable redaction of Data Inspection results with Data Policies.
Data policies are defined in a YAML file and configured with an environment variable:
DATA_POLICY_CONFIGURATION_FILE=/path/to/masking/config.yml
Data policies are a declarative way of defining how redactions are applied to query results.
kPow supports redaction of both the key and the value of a record. Redaction can be applied to scalar types (e.g. strings) or to fields within structured data types (e.g. maps, collections).
Structured data redaction currently supports AVRO, JSON, Transit, and EDN data formats.
String serdes are removed from Data Inspect when Data Policies are configured as they could be used to circumvent redaction.

Exclusions

Define an exclusions key in your Data Policies YAML file to exclude specific topics from redaction and allow them to be inspected with String serdes.
exclusions:
  topics: ["tx_meta", "tx_metrics"]

Data Policies

The YAML configuration defines a list of policies. Each policy contains:
  • name: the unique name of the data policy
  • resources: the resources governed by the policy
  • category: the category for this policy
  • redaction: the redaction function to be applied
  • type: the type of data (either scalar or non-scalar)
  • fields: the fields to redact for non-scalar data

Example YAML

Example: A Credit Card policy that shows only the last four digits of specific fields in all topics.
policies:
  - name: Credit Card
    category: PII
    resources:
      - [ 'cluster', '*', 'topic', '*', 'value']
    redaction: ShowLast4
    type: non-scalar
    fields: [ credit_card, creditcard, pan ]
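The policy fields listed above can be checked before a configuration is deployed. The following is a minimal sketch, not kPow's own validation: the function names `validate_policy` and `validate_policies` are hypothetical, and the rules it enforces (every key except `fields` is required, `fields` is required for non-scalar policies, names are unique) are assumptions drawn from the field descriptions. The policy is written as a Python dict that mirrors the YAML example.

```python
# Hypothetical pre-flight check; kPow performs its own validation at startup.
REQUIRED_KEYS = {"name", "category", "resources", "redaction", "type"}
VALID_TYPES = {"scalar", "non-scalar"}


def validate_policy(policy: dict) -> list[str]:
    """Return a list of problems found in one policy; empty means none found."""
    problems = []
    missing = REQUIRED_KEYS - policy.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if policy.get("type") not in VALID_TYPES:
        problems.append(f"type must be one of {sorted(VALID_TYPES)}")
    # Assumption: non-scalar policies must say which fields to redact.
    if policy.get("type") == "non-scalar" and not policy.get("fields"):
        problems.append("non-scalar policies need a non-empty 'fields' list")
    return problems


def validate_policies(policies: list[dict]) -> list[str]:
    """Validate every policy and check that policy names are unique."""
    problems = []
    seen = set()
    for policy in policies:
        name = policy.get("name")
        if name in seen:
            problems.append(f"duplicate policy name: {name!r}")
        seen.add(name)
        problems.extend(f"{name}: {p}" for p in validate_policy(policy))
    return problems


# The Credit Card policy from the YAML example, as a Python dict.
credit_card_policy = {
    "name": "Credit Card",
    "category": "PII",
    "resources": [["cluster", "*", "topic", "*", "value"]],
    "redaction": "ShowLast4",
    "type": "non-scalar",
    "fields": ["credit_card", "creditcard", "pan"],
}

print(validate_policies([credit_card_policy]))  # prints []
```

The Data Policy Sandbox remains the authoritative way to confirm how kPow interprets a configuration.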

Resource

Resources are defined through a taxonomy that describes the hierarchy of objects in kPow:
[DOMAIN_TYPE, DOMAIN_ID, OBJECT_TYPE?, OBJECT_ID?, OBJECT_RESOURCE?]
Where:
  • DOMAIN_TYPE: always cluster for data policies
  • DOMAIN_ID: the ID of the cluster, or * for all clusters
  • OBJECT_TYPE: (optional) always topic for data policies
  • OBJECT_ID: (optional) the name of the topic, or * for all topics
  • OBJECT_RESOURCE: (optional) either key, headers, or value
Only DOMAIN_TYPE and DOMAIN_ID are required. The topic, and the key, headers, or value resource within it, may be omitted.

Example Resources

| Resource | Effect |
| --- | --- |
| ["cluster", "*"] | All clusters and topics |
| ["cluster", "N9xnGujkR32eYxHICeaHuQ"] | All topics for a specific cluster |
| ["cluster", "*", "topic", "MyTopic"] | Specific topic on all clusters (key and value) |
| ["cluster", "*", "topic", "MyTopic", "key"] | Specific topic on all clusters (key only) |
| ["cluster", "*", "topic", "*", "value"] | All topics on all clusters (value only) |
| ["cluster", "*", "topic", "MyTopic", "headers"] | Specific topic on all clusters (headers only) |
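One way to reason about the example resources is as prefix patterns with wildcards. The sketch below is an illustration under that assumption, not kPow's matching code. The function `resource_matches` is hypothetical, and it treats a pattern with no OBJECT_RESOURCE as covering every part of the record, which the examples above do not state explicitly for headers.

```python
def resource_matches(pattern, cluster_id, topic, part):
    """Check whether a resource pattern covers one part of a record.

    pattern    -- e.g. ["cluster", "*", "topic", "MyTopic", "key"]
    cluster_id -- the ID of the cluster the record was read from
    topic      -- the topic name
    part       -- "key", "value" or "headers"
    """
    target = ["cluster", cluster_id, "topic", topic, part]
    if len(pattern) > len(target):
        return False
    # Assumption: a shorter pattern is a prefix, so omitted trailing
    # segments match anything. "*" matches any single segment.
    return all(p == "*" or p == t for p, t in zip(pattern, target))


# All clusters and topics.
print(resource_matches(["cluster", "*"], "N9xnGujkR32eYxHICeaHuQ", "MyTopic", "value"))  # True

# Key only: the value of the same topic is not covered.
key_only = ["cluster", "*", "topic", "MyTopic", "key"]
print(resource_matches(key_only, "N9xnGujkR32eYxHICeaHuQ", "MyTopic", "key"))    # True
print(resource_matches(key_only, "N9xnGujkR32eYxHICeaHuQ", "MyTopic", "value"))  # False
```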

Redaction Functions

Supported redaction functions include:
| Redaction | Description | Example Data | Example Result |
| --- | --- | --- | --- |
| Full | Fully redact the matched value | John Smith | ************ |
| SHAHash | Apply a SHA512 hash to the value | John Smith | ed014a19bb67a.. |
| ShowEmailHost | Show the email host | | *********@corp.org |
| ShowEmailPart | Show first character and host | | j********@corp.org |
| ShowFirst | Show the first character | John Smith | J********* |
| ShowFirst2 | Show the first two characters | John Smith | Jo******** |
| ShowFirst4 | Show the first four characters | John Smith | John****** |
| ShowFirst6 | Show the first six characters | John Smith | John S**** |
| ShowLast | Show the last character | John Smith | *********h |
| ShowLast2 | Show the last two characters | John Smith | ********th |
| ShowLast4 | Show the last four characters | John Smith | ******mith |
| ShowLast6 | Show the last six characters | John Smith | **** Smith |
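The masking behaviour shown in the examples can be approximated in a few lines of Python. This is a sketch for building intuition, not kPow's implementation: all function names are hypothetical, the email address used below is made up, and the handling of short values and of strings without an `@` is an assumption. `sha_hash` assumes UTF-8 input and hexadecimal output, so its result is not guaranteed to match what kPow displays. Full is left out because the examples do not establish the length of the mask it emits.

```python
import hashlib


def show_first(value: str, n: int = 1) -> str:
    """Keep the first n characters and mask the rest (ShowFirst, ShowFirst2, ...)."""
    return value[:n] + "*" * max(len(value) - n, 0)


def show_last(value: str, n: int = 1) -> str:
    """Keep the last n characters and mask the rest (ShowLast, ShowLast2, ...)."""
    masked = max(len(value) - n, 0)
    return "*" * masked + value[masked:]


def show_email_host(value: str) -> str:
    """Mask the local part of an email address and keep the host."""
    if "@" not in value:
        # Assumption: mask everything when the value is not an email address.
        return "*" * len(value)
    local, _, host = value.rpartition("@")
    return "*" * len(local) + "@" + host


def show_email_part(value: str) -> str:
    """Keep the first character of the local part and the host."""
    if "@" not in value:
        return "*" * len(value)
    local, _, host = value.rpartition("@")
    return show_first(local, 1) + "@" + host


def sha_hash(value: str) -> str:
    """SHA-512 digest of the value (assumes UTF-8 encoding and hex output)."""
    return hashlib.sha512(value.encode("utf-8")).hexdigest()


print(show_first("John Smith", 4))            # John******
print(show_last("John Smith", 6))             # **** Smith
print(show_email_host("jane.doe1@corp.org"))  # *********@corp.org
print(show_email_part("jane.doe1@corp.org"))  # j********@corp.org
```

Note that in this sketch a value shorter than the number of characters to show is returned unmasked; whether kPow behaves the same way is best confirmed in the Data Policy Sandbox.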

Nested Redaction

kPow supports redaction of nested data structures.
Example: Applying the example Credit Card policy to a JSON message.
{
  "user_details": {
    "email_address": "[email protected]",
    "payment_options": [
      { "credit_card": "376953644924215" }
    ]
  }
}
The data is masked accordingly when displayed in Data Inspect search results:
{
  "user_details": {
    "email_address": "[email protected]",
    "payment_options": [
      { "credit_card": "***********4215" }
    ]
  }
}
kPow is conservative when applying data policies. When the selected redaction function cannot be applied to a matched field, kPow falls back to the Full redaction function. For example:
{
  "user_details": {
    "email_address": "[email protected]",
    "payment_options": [
      {
        "credit_card": {
          "pan": "376953644924215",
          "expiry": "10/10/2010"
        }
      }
    ]
  }
}
Applying the same Credit Card policy to this data incurs a Full redaction at the credit_card field as kPow does not know how to apply the configured "ShowLast4" redactor to a structured value (in this case a map with "pan" and "expiry" fields).
The result is effectively truncated:
{
  "user_details": {
    "email_address": "[email protected]",
    "payment_options": [
      { "credit_card": "***************" }
    ]
  }
}
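The nested behaviour described in this section, including the conservative fallback, can be modelled as a recursive walk over maps and collections. The sketch below is an approximation under stated assumptions rather than kPow's implementation: the `redact` function and the `FULL` placeholder are hypothetical, the length of the Full mask is chosen only to match the example output, and any matched value that is not a string (including numbers) is fully redacted.

```python
# Placeholder for a Full redaction; the mask length kPow emits is an assumption.
FULL = "***************"


def show_last4(value: str) -> str:
    """Keep the last four characters and mask the rest."""
    masked = max(len(value) - 4, 0)
    return "*" * masked + value[masked:]


def redact(data, fields, redactor):
    """Return a redacted copy of data, walking maps and collections recursively."""
    if isinstance(data, dict):
        out = {}
        for key, value in data.items():
            if key in fields:
                # Matched field: redact strings, fall back to Full for
                # structured (or otherwise non-string) values.
                out[key] = redactor(value) if isinstance(value, str) else FULL
            else:
                out[key] = redact(value, fields, redactor)
        return out
    if isinstance(data, list):
        return [redact(item, fields, redactor) for item in data]
    return data  # scalars outside a matched field pass through unchanged


fields = {"credit_card", "creditcard", "pan"}

flat = {"user_details": {"payment_options": [{"credit_card": "376953644924215"}]}}
print(redact(flat, fields, show_last4))
# {'user_details': {'payment_options': [{'credit_card': '***********4215'}]}}

nested = {"user_details": {"payment_options": [
    {"credit_card": {"pan": "376953644924215", "expiry": "10/10/2010"}}
]}}
print(redact(nested, fields, show_last4))
# {'user_details': {'payment_options': [{'credit_card': '***************'}]}}
```

In the second call the walk stops at credit_card, so the nested pan field is never reached even though it is also listed in the policy. This matches the truncated result shown above.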

Data Policy Sandbox

kPow comes with a built-in Data Policy Sandbox for experimenting with your currently configured policies or creating and testing new configurations.
To access the Data Policy Sandbox, navigate to Admin -> Data Policies.
Figure: kPow provides a Data Policies sandbox