I'm working with Node.js and GCP Data Loss Prevention to redact sensitive data from PDFs before I display them. GCP has great documentation on this.
Essentially, you pull in the Node.js library and run this:
const fileBytes = Buffer.from(fs.readFileSync(filepath)).toString('base64');

// Construct image redaction request
const request = {
  parent: `projects/${projectId}/locations/global`,
  byteItem: {
    type: fileTypeConstant,
    data: fileBytes,
  },
  inspectConfig: {
    minLikelihood: minLikelihood,
    infoTypes: infoTypes,
  },
  imageRedactionConfigs: imageRedactionConfigs,
};

// Run image redaction request
const [response] = await dlp.redactImage(request);
const image = response.redactedImage;
Normally, I'd get the file as a buffer and pass it to the DLP function as shown above. But I'm no longer getting our files as buffers. Since many files are very large, we now get them from FilesStorage as streams, like so:
return FilesStorage.getFileStream(metaFileInfo1, metaFileInfo2, metaFileInfo3, fileId)
  .then(stream => {
    return {fileInfo, stream};
  })
My question: is it possible to perform DLP image redaction on a stream instead of a buffer? If so, how?
I've found some other questions that say you can stream with ByteContentItem, and GCP's own documentation mentions "streams". But when I pass the stream returned from .getFileStream into the byteItem['data'] property above, it doesn't work.
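
For reference, the fallback would look roughly like the sketch below. As far as I can tell, byteItem.data expects the complete file contents (as in the base64 string above) rather than a stream object, so the stream has to be collected into a single Buffer first. This assumes the object returned by .getFileStream is a standard Node.js Readable. streamToBuffer and buildByteItem are hypothetical helper names, not part of the DLP library, and this does not avoid holding the whole file in memory.

```javascript
const { Readable } = require('stream');

// Hypothetical helper: collect every chunk of a Readable into one Buffer.
// Assumes `stream` is a standard Node.js Readable (async iterable).
async function streamToBuffer(stream) {
  const chunks = [];
  for await (const chunk of stream) {
    // Chunks may arrive as strings if the stream has an encoding set.
    chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk));
  }
  return Buffer.concat(chunks);
}

// Hypothetical helper: build the byteItem used in the request above,
// base64-encoding the data the same way the original snippet does.
async function buildByteItem(stream, fileTypeConstant) {
  const data = await streamToBuffer(stream);
  return {
    type: fileTypeConstant,
    data: data.toString('base64'),
  };
}
```

With this, the result of buildByteItem(stream, fileTypeConstant) would replace the byteItem object in the request, and the rest of the redactImage call stays the same.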