0

Does the Google Data Loss Prevention API support .pdf or .docx files? I am trying to redact a *.pdf file in Java to hide sensitive data.

Many thanks! Emi

Felipe Hoffa
Emi

2 Answers

0

Currently, the Google Data Loss Prevention API only accepts a string of text as input, so the text of a .pdf or .docx file would have to be extracted before it is sent.

Sample Input:

    {
      "items": [
        {
          "value": "My phone number is (123) 456-7890",
          "type": "text/plain"
        }
      ],
      "replaceConfigs": [
        {
          "replaceWith": "[REDACTED PHONE NUMBER]",
          "infoType": {
            "name": "PHONE_NUMBER"
          }
        }
      ]
    }

URL: POST https://dlp.googleapis.com/v2beta1/content:redact

Sample Output:

    {
      "items": [
        {
          "type": "text/plain",
          "value": "My phone number is [REDACTED PHONE NUMBER]"
        }
      ]
    }
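
Since the question asks about Java, here is a minimal sketch of how the request above could be assembled and posted using only the JDK's `java.net.http` client (Java 11+). The class name, the helper names, and the bearer-token header are my own assumptions for illustration; the endpoint is the one quoted above and may have changed in later API versions, so treat this as a starting point rather than a reference implementation.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RedactTextSketch {
    // Endpoint as quoted in this answer; newer API versions may use a different path.
    static final String REDACT_URL = "https://dlp.googleapis.com/v2beta1/content:redact";

    // Minimal JSON string escaping so quotes or control characters in the text do not break the payload.
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }

    // Builds the same request shape as the sample input: one text item and one replace config.
    static String buildRedactBody(String text, String infoType, String replacement) {
        return "{\"items\":[{\"value\":\"" + jsonEscape(text) + "\",\"type\":\"text/plain\"}],"
             + "\"replaceConfigs\":[{\"replaceWith\":\"" + jsonEscape(replacement) + "\","
             + "\"infoType\":{\"name\":\"" + jsonEscape(infoType) + "\"}}]}";
    }

    // Hypothetical helper: accessToken is assumed to be an OAuth 2.0 bearer token obtained elsewhere.
    static String redact(String body, String accessToken) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(REDACT_URL))
            .header("Content-Type", "application/json")
            .header("Authorization", "Bearer " + accessToken)
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) {
        // Only prints the request body; call redact(...) with real credentials to send it.
        System.out.println(buildRedactBody(
            "My phone number is (123) 456-7890", "PHONE_NUMBER", "[REDACTED PHONE NUMBER]"));
    }
}
```

For a PDF, the `text` argument would be whatever text you have already extracted from the file with a separate tool.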

0

The methods for streamed-in content support images, text, and binary data. You can stream your PDF as a ByteContentItem (https://cloud.google.com/dlp/docs/reference/rpc/google.privacy.dlp.v2#contentitem), or you can convert the PDF to images and scan those.

If you scan content stored in Google Cloud Storage (GCS), some PII in PDFs is detectable, but you should test this against your own use cases.
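
To make the ByteContentItem route a bit more concrete, here is a rough Java sketch that base64-encodes a file and wraps it in a JSON inspect request. The JSON field names (`item`, `byteItem`, `type`, `data`, `inspectConfig`, `infoTypes`) and the `PDF` bytes type reflect my reading of the v2 reference linked above, and the class and method names are made up for illustration, so please check them against the current documentation before relying on this. The sketch only builds the body; I am assuming it would be posted to the v2 `content:inspect` method of your project.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

public class InspectPdfBytesSketch {
    // Builds an inspect request whose content item carries raw bytes instead of a text value.
    // bytesType and infoType are passed in unescaped, so they must be plain enum-style names.
    static String buildInspectBody(byte[] fileBytes, String bytesType, String infoType) {
        // Binary data travels as a base64 string in the JSON payload.
        String data = Base64.getEncoder().encodeToString(fileBytes);
        return "{\"item\":{\"byteItem\":{\"type\":\"" + bytesType + "\",\"data\":\"" + data + "\"}},"
             + "\"inspectConfig\":{\"infoTypes\":[{\"name\":\"" + infoType + "\"}]}}";
    }

    public static void main(String[] args) throws Exception {
        // Reads the file given on the command line, or falls back to placeholder bytes.
        byte[] bytes = args.length > 0
            ? Files.readAllBytes(Path.of(args[0]))
            : "placeholder bytes".getBytes(StandardCharsets.UTF_8);
        System.out.println(buildInspectBody(bytes, "PDF", "PHONE_NUMBER"));
    }
}
```

Note that this covers detection only; whether findings in a PDF can also be redacted in place is something you would need to verify separately.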

Jordanna Chord