As part of our effort to avoid collecting unnecessary Personally Identifiable Information (PII), I would like to prevent email addresses from being logged, and later extend this to other information.
As we want to treat logs as streams, the obvious solution seems to be sed. However, I can't find any information on whether this is a good idea.
Presumably I can just pipe the service output through something like:
#!/bin/sed -rf
# Obfuscate email addresses (e.g. username@email.com => ###@email.com).
# The g flag replaces every address on a line, not just the first one.
s/[[:alnum:]_+%.-]+@/###@/g
That way I would not have to worry about email addresses finding their way into the logs.
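To make the idea concrete, here is a minimal sketch of the filter applied to a single log line. The sample line and the `redact` function name are made up for illustration; in practice the service's stdout would be piped through the filter instead. The substitution here uses the `g` flag so that every address on a line is replaced, and `-E` is the portable spelling of GNU sed's `-r`.

```shell
#!/bin/sh
# Hypothetical sample log line (newline-delimited JSON); a real service
# would be piped through redact instead of this fixed string.
line='{"level":"info","msg":"login ok for jane.doe@example.com, cc bob+test@example.org"}'

# Replace the local part of anything that looks like an email address.
# -E enables extended regular expressions; g replaces every match on the line.
redact() {
  sed -E 's/[[:alnum:]_+%.-]+@/###@/g'
}

out=$(printf '%s\n' "$line" | redact)
printf '%s\n' "$out"
```

The domain part is left intact, so the line remains valid JSON and still shows which mail provider was involved.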
- Would this have some obvious negative consequence that I am missing? (Logs are newline-delimited JSON, with an average line length of 800 characters.)
- Is there some standard way I could configure this at the Kubernetes node/cluster level?
(I understand that this email regex is not comprehensive, and the logs would still need to be treated securely. I'm looking more for a belt-and-braces approach.)