
I would like to ask how to sanitize sensitive information when logging in Java.

For example, we have a class Person with a very sensitive SSN and a class Account with a very sensitive credit card number (and, over time, hundreds of other business classes holding sensitive information).

public class Person {

    private String firstName;
    private String lastName;
    private String socialSecurityNumber;

    public String getFirstName() {
        return firstName;
    }

    public String getLastName() {
        return lastName;
    }

    public String getSocialSecurityNumber() {
        return socialSecurityNumber;
    }

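    public String getSanitizeSocialSecurityNumberForLog() {
        // Example masking logic (an assumption): hide every character except the last four.
        if (socialSecurityNumber == null) {
            return null;
        }
        return socialSecurityNumber.replaceAll(".(?=.{4})", "*");
    }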

}
public class Account {

    private String firstName;
    private String lastName;
    private String creditCardNumber;

    public String getFirstName() {
        return firstName;
    }

    public String getLastName() {
        return lastName;
    }

    public String getCreditCardNumber() {
        return creditCardNumber;
    }

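    public String getSanitizeCreditCardNumberForLog() {
        // Expose only the last four digits.
        if (creditCardNumber == null || creditCardNumber.length() <= 4) {
            return "****";
        }
        return creditCardNumber.substring(creditCardNumber.length() - 4);
    }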

}

And in our business logic, we will have something like:

        LOGGER.info("The person's first name is " + person.getFirstName() + " and his SSN is " + person.getSocialSecurityNumber());

or worse:

LOGGER.info("The person is " + person.toString());

This caused many incidents. We ended up doing something horrible and unscalable: for every sensitive "thing" we have, we create some kind of getSanitizedThing() method and then use:

LOGGER.info("The person's first name is " + person.getFirstName() + " and his SSN is " + person.getSanitizeSocialSecurityNumberForLog());

This solution does not work well at all: we now have boilerplate sanitization code in our POJOs, in the service layers, and so on.

To be clear, my question is not about log level tuning.

Question: What would be the cleanest and most effective pattern to address this issue?

Thank you!

PatPanda
  • I don't think there's a good "pattern" here. Part of the Agile Manifesto says, basically, to hire smart people instead of relying on procedures or systems. The person who added that logging info was not on the ball. Just remove it. Replace it with something else like its identity hashcode if you must log something (but you probably don't). ALL the information in that class you presented is sensitive, you can't log any of it. – markspace Oct 10 '20 at 23:45
  • Thank you for your comment @markspace. Beyond a pattern, maybe there is a real engineering solution (a design pattern, reflection, a framework, a log4j setup, something else) that can help resolve this technical issue? Something that can be taught and shared to improve engineering, rather than writing in a contract "if you log something sensitive, you are dumb and we made a mistake hiring you. You are fired." Hopefully, it can also help the community here. – PatPanda Oct 11 '20 at 01:25
  • I agree with you @PatPatPat that it may not be the best option. In my opinion, domain entities shouldn't care about being logged at all, so these methods look a bit odd there. Depending on your logger implementation, you may have nicer options. For example, if you use Logback, [this post](https://stackoverflow.com/questions/25277930/mask-sensitive-data-in-logs-with-logback) shows an approach in which the Pattern instance could match the credit card / social security number. – Adam Oct 11 '20 at 12:14
  • Take a look at the RewriteAppender, which should help you pre-process the log message before it is actually delivered to a final appender: https://logging.apache.org/log4j/2.x/manual/appenders.html#RewriteAppender – Little Santi Aug 07 '21 at 23:09
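
The two comments above both point toward masking the formatted message centrally instead of adding a getSanitizedThing() to every class. A minimal sketch of that idea in plain Java follows. The class name LogMasker, the method mask, and both regular expressions are hypothetical and not taken from the thread; they assume SSNs are written as NNN-NN-NNNN and card numbers as 13 to 19 digits with optional single spaces or dashes. Wiring this into a Logback layout or a Log4j 2 rewrite policy is not shown here.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: masks SSN-like and card-like digit runs in an already formatted log message. */
public final class LogMasker {

    // Assumption: SSNs appear as NNN-NN-NNNN. Group 1 keeps the last four digits.
    private static final Pattern SSN =
            Pattern.compile("\\b\\d{3}-\\d{2}-(\\d{4})\\b");

    // Assumption: card numbers are 13 to 19 digits, optionally separated by single
    // spaces or dashes. Group 1 keeps the last four digits. Note that this also
    // matches any other long digit run (e.g. a numeric ID), so expect false positives.
    private static final Pattern CARD =
            Pattern.compile("\\b(?:\\d[ -]?){9,15}(\\d{4})\\b");

    private LogMasker() {
        // Utility class, not meant to be instantiated.
    }

    /** Returns the message with SSN-like and card-like values masked except their last four digits. */
    public static String mask(String message) {
        if (message == null) {
            return null;
        }
        String masked = SSN.matcher(message).replaceAll("***-**-$1");
        return CARD.matcher(masked).replaceAll("****$1");
    }

    public static void main(String[] args) {
        // prints "The person's SSN is ***-**-6789"
        System.out.println(mask("The person's SSN is 123-45-6789"));
    }
}
```

Pattern matching of this kind only catches values whose shape is known in advance, so it is probably best treated as a safety net behind the logging calls rather than as a replacement for not logging sensitive fields in the first place.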
