I am building a project that connects to multiple third-party APIs. For auditing, I record all requests sent to these APIs and all responses received from them. The requests and responses are XML, and they contain sensitive information that I need to mask, such as PII and credit card numbers.
These are sample tags that appear in the XML:
<myTag>someSensitiveInformation</myTag>
<myTag sensitiveInfo = foo, sensitiveTwo = bar>SomeOtherSensitiveInfo</myTag>
<myTag sensitiveInfo = foo, sensitiveTwo = bar/>
I was able to mask them with the following regex:
(<myTag)([\s\S]*?)(\/>)|(<myTag)([\s\S]*?)(>)([\s\S]+?)(<\/myTag>)
In all of the above cases, the masked tag looks like this:
<myTag>*************</myTag>
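For comparison, here is a minimal Java sketch of the same kind of regex masking, using a hypothetical `RegexMasker` class and a tightened pattern rather than the exact one above. In the posted regex, the first alternative `(<myTag)([\s\S]*?)(\/>)` is tried first at every `<myTag`, so it scans forward to the next `/>` anywhere in the document, past `</myTag>` and into unrelated elements, or all the way to the end of the payload when no `/>` follows, before the second alternative gets a chance. The sketch replaces `[\s\S]*?` with negated character classes so each attempt stays inside one tag, assuming `myTag` content is plain text (no nested elements, CDATA, or `<`).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMasker {
    // Hypothetical tightened pattern (not the posted one), compiled once and reused.
    // [^>]*? cannot run past the end of the opening tag and [^<]* cannot run into
    // another element, so each match attempt stays inside a single <myTag ...> tag.
    // Assumption: myTag content is plain text (no nested elements, CDATA or '<').
    private static final Pattern MY_TAG =
            Pattern.compile("<myTag\\b[^>]*?(?:/>|>[^<]*</myTag>)");

    // Fixed replacement, matching the masked output shown above.
    private static final String MASKED = "<myTag>*************</myTag>";

    public static String mask(String xml) {
        // quoteReplacement stops '$' or '\' in the replacement being read as group references.
        return MY_TAG.matcher(xml).replaceAll(Matcher.quoteReplacement(MASKED));
    }

    public static void main(String[] args) {
        // prints <r><myTag>*************</myTag><x/></r>
        System.out.println(mask("<r><myTag a = foo, b = bar>SomeOtherSensitiveInfo</myTag><x/></r>"));
    }
}
```

The `Pattern` is compiled once in a `static final` field instead of per request; `Pattern` is immutable and safe to share between threads, while each `mask` call creates its own `Matcher`. If a `myTag` ever contains nested markup, this pattern will not match it and the content stays unmasked, so the plain-text assumption matters.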
This works under low traffic: a single user operation produces multiple requests and responses, and all of them are masked correctly by the regex above. Under high traffic, however, the regex evaluation causes CPU spikes, and sometimes the entire application freezes. Some of these XML requests and responses are around 100 KB in size.
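If regex remains too costly, the same masking can be done without regex in a single left-to-right pass with `String.indexOf`, so the work grows linearly with the payload size. This is a minimal sketch using a hypothetical `ScanMasker` class; it assumes `myTag` elements are not nested inside each other and that attribute values contain no `>`. On a truncated or unclosed tag it copies the remainder unchanged, which may not be an acceptable policy for sensitive data.

```java
public class ScanMasker {
    private static final String OPEN = "<myTag";
    private static final String CLOSE = "</myTag>";
    private static final String MASKED = "<myTag>*************</myTag>";

    // Single left-to-right pass, so the work grows linearly with the input length.
    public static String mask(String xml) {
        StringBuilder out = new StringBuilder(xml.length());
        int pos = 0; // everything before pos has already been copied or masked
        while (true) {
            int start = xml.indexOf(OPEN, pos);
            if (start < 0) break;
            int afterName = start + OPEN.length();
            // Skip look-alikes such as <myTagOther>: the name must be followed by whitespace, '/' or '>'.
            if (afterName < xml.length() && !isNameEnd(xml.charAt(afterName))) {
                out.append(xml, pos, afterName);
                pos = afterName;
                continue;
            }
            int gt = xml.indexOf('>', afterName); // end of the opening tag
            if (gt < 0) break; // truncated tag: remainder is copied unchanged below
            int end;
            if (xml.charAt(gt - 1) == '/') {
                end = gt + 1; // self-closing <myTag .../>
            } else {
                int close = xml.indexOf(CLOSE, gt + 1);
                if (close < 0) break; // no closing tag: remainder is copied unchanged below
                end = close + CLOSE.length();
            }
            out.append(xml, pos, start).append(MASKED);
            pos = end;
        }
        out.append(xml, pos, xml.length());
        return out.toString();
    }

    private static boolean isNameEnd(char c) {
        return c == '>' || c == '/' || Character.isWhitespace(c);
    }
}
```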
Is there a more efficient way to do this? I am aware that regex is not recommended for identifying XML tags, but it seemed like the easiest approach. Are there any external libraries that do this kind of masking without the performance cost? I would prefer not to use Log4j masking, because it seems to accumulate the logs inside the JVM. Otherwise, what would be the appropriate solution in Java for this kind of scenario?
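Since regex is generally discouraged for XML, one JDK-only alternative is the StAX streaming API (`javax.xml.stream`), which reads the document as a sequence of events without building a DOM tree. Below is a minimal sketch with a hypothetical `StaxMasker` class, assuming the real payloads are well-formed XML: the unquoted, comma-separated attributes in the samples above would be rejected by an XML parser. It re-emits each `myTag` start tag without its attributes, drops everything inside the element, and writes a fixed mask; DTD processing is switched off because the input comes from third parties.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Collections;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

public class StaxMasker {
    // Factories are created once; check your StAX implementation's thread-safety
    // guarantees before sharing them across request threads.
    private static final XMLInputFactory IN = XMLInputFactory.newInstance();
    private static final XMLOutputFactory OUT = XMLOutputFactory.newInstance();
    private static final XMLEventFactory EVENTS = XMLEventFactory.newInstance();

    static {
        // Input comes from third parties: do not process DTDs.
        IN.setProperty(XMLInputFactory.SUPPORT_DTD, false);
    }

    /** Replaces every element named tagName (attributes and content) with a fixed mask. */
    public static String mask(String xml, String tagName) {
        try {
            XMLEventReader reader = IN.createXMLEventReader(new StringReader(xml));
            StringWriter buffer = new StringWriter(xml.length());
            XMLEventWriter writer = OUT.createXMLEventWriter(buffer);
            int depth = 0; // > 0 while inside an element that is being masked
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (depth > 0) {
                    if (event.isStartElement()) {
                        depth++;
                    } else if (event.isEndElement()) {
                        depth--;
                        if (depth == 0) {
                            // Closing the masked element: emit the mask, then its end tag.
                            writer.add(EVENTS.createCharacters("*************"));
                            writer.add(event);
                        }
                    }
                    continue; // everything inside the masked element is dropped
                }
                if (event.isStartElement()
                        && event.asStartElement().getName().getLocalPart().equals(tagName)) {
                    StartElement start = event.asStartElement();
                    // Re-emit the start tag without attributes, keeping namespace declarations.
                    writer.add(EVENTS.createStartElement(
                            start.getName(), Collections.emptyIterator(), start.getNamespaces()));
                    depth = 1;
                    continue;
                }
                writer.add(event);
            }
            writer.flush();
            writer.close();
            reader.close();
            return buffer.toString();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }
}
```

Because the document is re-serialized, the output may differ cosmetically from the input (for example, an XML declaration may be added and empty elements may be written as start/end pairs). Parsing likely costs more per byte than a plain string scan, so it is worth measuring against the regex under realistic load.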
Thanks in advance.