I am currently using the Logback framework with patterns in my logs, but I need to mask PII, specifically email addresses. I need a regular expression that masks email addresses in the logs; my requirement is to mask only 30% of the email ID. Since I am using Logback, I would like to do this with a Logback mask pattern:

<appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
    <encoder class="ch.qos.logback.core.encoder.LayoutWrappingEncoder">
        <maskPattern>(?<=.{3}).(?=[^@]*?@)</maskPattern> <!-- mask email -->
    </encoder>
</appender>
For example: testing@test.com
Expected result: tes***@test.com
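To see what the pattern above does on its own, outside Logback, here is a minimal Java sketch. The class and method names are hypothetical, and it assumes the masking code replaces each match with a single asterisk. Because the pattern matches one character at a time, the output has one asterisk per hidden character (four for `testing`, not three), and the `.{3}` lookbehind counts characters from the start of the line, not from the start of the address.

```java
import java.util.regex.Pattern;

class LookaroundMaskDemo {
    // Same pattern as in the <maskPattern> element, written as a Java string literal.
    static final Pattern MASK = Pattern.compile("(?<=.{3}).(?=[^@]*?@)");

    static String mask(String text) {
        // Each match is exactly one character, so each match becomes one asterisk.
        return MASK.matcher(text).replaceAll("*");
    }

    public static void main(String[] args) {
        System.out.println(mask("testing@test.com")); // tes****@test.com
        // The lookbehind is not anchored to the address, so text before it is masked too.
        System.out.println(mask("email: ab@x.com"));  // ema******@x.com
    }
}
```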

I tried different regular expressions found via Google, and none of them works with the Logback framework. For example, when I use the regular expressions below in logback.xml, Logback is not able to parse or compile the file. I used this as a reference: https://howtodoinjava.com/logback/masking-sensitive-data/

<maskPattern><![CDATA[/(?<=.{3}).(?=[^@]*?@)]]></maskPattern>
<maskPattern><(?<=.{3}).(?=[^@]*?@)></maskPattern>
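One likely cause of the parse failure is the XML itself rather than the regex: a raw `<` inside element text (as in `(?<=`) is not well-formed XML. The sketch below uses only the JDK's XML parser to check this; it is not Logback's own configuration loader, and the class and method names are hypothetical. It compares the raw form with an `&lt;`-escaped form and a CDATA form. Note that the first attempt above also has a leading `/` inside the CDATA section, which would become a literal slash in the regex.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class MaskPatternXmlCheck {
    /** Returns the text of the root element, or null if the XML is not well-formed. */
    static String textOf(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            // A SAXParseException ends up here when the markup is invalid.
            return null;
        }
    }

    public static void main(String[] args) {
        String raw = "<maskPattern>(?<=.{3}).(?=[^@]*?@)</maskPattern>";
        String escaped = "<maskPattern>(?&lt;=.{3}).(?=[^@]*?@)</maskPattern>";
        String cdata = "<maskPattern><![CDATA[(?<=.{3}).(?=[^@]*?@)]]></maskPattern>";

        System.out.println("raw:     " + textOf(raw));     // raw '<' is rejected by the parser
        System.out.println("escaped: " + textOf(escaped)); // parser hands back the regex with '<'
        System.out.println("cdata:   " + textOf(cdata));   // same regex, no escaping needed
    }
}
```

If the same holds for the Logback configuration parser, writing the lookbehind as `(?&lt;=.{3})` or wrapping the whole pattern in CDATA (without the extra `/`) should at least get past the XML stage.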

Can anybody please provide the correct regular expression or pattern? Below are the regular expressions I tried; none of them works.

1) (?<=.{3}).(?=.*@)
2) (?<=.{2}).(?=[^@]*?@)
3) ([^@]{4})[^@]*(.+)
4) \\b(\\w)[^@]+@\\S+(\\.[^\\s.]+)

Please check the screenshot below.

ref: https://stackoverflow.com/questions/33100298/masking-of-email-address-in-java

erama035
  • The RFC5322 regex up to the `@` is `(?:"[^\\"]*(?:\\.[^\\"]*)*"|(?:[a-zA-Z0-9](?:\.(?!\.)|[\w!#-'*+\-/=?\^\`{-~])*)?[a-zA-Z0-9])(?=@)` If using JS regex engine you can pare this down with a variable lookbehind I believe. Just a guess. – sln Feb 16 '23 at 23:26
  • Probably this is the best you're going to get `[\w!-'*+\--/=?\^\`{-~]{1,3}(?=@)` https://regex101.com/r/uRs6ey/1 It's just a partial email, with no validation whatsoever. But it does include only the valid characters in the class allowed up to the _@_. The matched should be replaced by * for each character. – sln Feb 16 '23 at 23:52
  • Or replace any match with `***` like here https://regex101.com/r/bvaqDU/1 – sln Feb 16 '23 at 23:58
  • @sln OK, I am just trying to explain the requirement. – erama035 Feb 17 '23 at 00:13
  • Found this https://howtodoinjava.com/logback/masking-sensitive-data/ with good examples. Looks like the _Pattern_ needs to be enclosed in a group within the xml. `([\w!-'*+\--/=?\^\`{-~]{1,3}(?=@))` And they talk about _handlers_ in the code to do the actual substitution given the regex in the xml config file you're trying to do. I know enough Java to handle regex but I don't use it and it's not installed for testing. – sln Feb 17 '23 at 15:59
  • If this helps, the RFC5322 regex are _Multi-Line Version_ `(?im)^(?=.{1,64}@)(?:(\"[^\"\\]*(?:\\.[^\"\\]*)*\"@)|((?:[0-9a-z](?:\.(?!\.)|[-!#\$%&'\*\+/=\?\^\`\{\}\|~\w])*)?[0-9a-z]@))(?=.{1,255}$)(?:(\[(?:\d{1,3}\.){3}\d{1,3}\])|((?:(?=.{1,63}\.)[0-9a-z][-\w]*[0-9a-z]*\.)+[a-z0-9][\-a-z0-9]{0,22}[a-z0-9])|((?=.{1,63}$)[0-9a-z][-\w]*))$` https://regex101.com/r/6Gq1be/1 – sln Feb 17 '23 at 16:44
  • And _Whitespace Boundary Version_ `(?i)(?<!\S)(?=.{1,64}@)(?:(\"[^\"\\]*(?:\\.[^\"\\]*)*\"@)|((?:[0-9a-z](?:\.(?!\.)|[-!#\$%&'\*\+/=\?\^\`\{\}\|~\w])*)?[0-9a-z]@))(?=\S{1,255}(?!\S))(?:(\[(?:\d{1,3}\.){3}\d{1,3}\])|((?:(?=.{1,63}\.)[0-9a-z][-\w]*[0-9a-z]*\.)+[a-z0-9][\-a-z0-9]{0,22}[a-z0-9])|((?=\S{1,63}(?!\S))[0-9a-z][-\w]*))(?!\S)` https://regex101.com/r/HS4GXg/1 – sln Feb 17 '23 at 16:44
  • ([\w!-'*+\--/=?\^`-~]{1,3}(?=@)) --> this regular expression works, but it does not satisfy the requirement to mask 30% of the PII. The request payload comes out as { "email":"eradddddddddddddddddddddma***@gmail.com" }, but we need something like { "email":"eraddddddddd************@gmail.com" }. – erama035 Feb 17 '23 at 17:34
  • There is a max of 64 characters in the local part of email addresses. Given that, you can make 20 lookbehind assertions that cover all 30% positions. Like this: https://regex101.com/r/0alNs2/1 expanded and https://regex101.com/r/DSfEmd/1 compressed (add capture group around the regex). But you are using the stock handlers that take the length of the match (group 1) to create a string of asterisks for replace. However, what I did in the regex could be done in the handler by capturing the entire local part `([!-'*+\--9=?A-Z\^-~]{1,64}(?=@))`, getting 30% of the length, and overwriting the string. – sln Feb 18 '23 at 15:59
  • I prefer the 30% interval be stepped one more on the current match. This guarantees a 30% minimum. These regex are https://regex101.com/r/d1uwZZ/1 expanded. https://regex101.com/r/pGoT5U/1 compressed. Don't forget to wrap the regex into a capture group in the xml file. – sln Feb 18 '23 at 16:41
  • I am not able to wrap it in the XML. Is there a way to achieve this using word characters, like (\w+@\w+\.\w+)? – erama035 Feb 22 '23 at 22:43
  • [\w!-'*+\--/=?\^`{-~]{1,3}(?=@) is also not working when I wrap it in the XML. – erama035 Feb 22 '23 at 22:53
  • Why don't you write your own encoder? It would give you the flexibility to control whatever you need. – ursa Mar 16 '23 at 10:12
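
The comments above suggest doing the 30% calculation in code instead of in the regex: capture the whole local part, work out 30% of its length, and overwrite that many characters. A minimal sketch of that idea in plain Java follows. The class name, the simplified address pattern, and the choice to mask the trailing 30% (rounded up) are all assumptions made for illustration; none of this is Logback's own API, and the pattern is not RFC 5322 complete. It needs Java 11 or later for `String.repeat`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PartialEmailMasker {
    // Simplified pattern (assumption): group 1 is the local part, and the lookahead
    // requires something domain-like after the '@'. It is not a full RFC 5322 check.
    private static final Pattern EMAIL =
            Pattern.compile("([\\w.!#$%&'*+/=?^`{|}~-]{1,64})(?=@[\\w-]+(?:\\.[\\w-]+)+)");

    static String mask(String message) {
        Matcher m = EMAIL.matcher(message);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String local = m.group(1);
            // ceil(30% of the length) in integer arithmetic, so at least one character is masked.
            int masked = (local.length() * 3 + 9) / 10;
            String replacement =
                    local.substring(0, local.length() - masked) + "*".repeat(masked);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(mask("user testing@test.com logged in"));
        // user test***@test.com logged in
        System.out.println(mask("{ \"email\":\"abcdefghij@gmail.com\" }"));
        // { "email":"abcdefg***@gmail.com" }
    }
}
```

In a Logback setup, a method like this could be called on the formatted message from a custom layout or encoder, which is roughly what the howtodoinjava article linked above does with its own masking layout. Because the local-part class allows characters such as `=`, text glued directly to the address (for example `x=user@test.com`) is treated as part of the local part, so the pattern may need tightening for real log formats.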
