Is there any way to purge or mask data in a Log Analytics workspace using regular expressions or similar, in order to remove sensitive data that has already been sent to the workspace? For example, social security numbers that are part of a URL?
1 Answer
- As per this Microsoft document, Log Analytics is a flexible store which, while prescribing a schema for your data, allows you to override every field with custom values. We can mask the data in the Log Analytics workspace, and the document suggests a few strategies for handling personal data:
  - Where possible, stop the collection of, obfuscate, anonymize, or otherwise adjust the data being collected to exclude it from being considered "private". This is by far the preferred approach, saving you the need to create a very costly and impactful data handling strategy.
  - Where not possible, attempt to normalize the data to reduce the impact on the data platform and performance. For example, instead of logging an explicit User ID, create a lookup table that correlates the username and their details to an internal ID, which can then be logged elsewhere. That way, should one of your users ask you to delete their personal information, deleting only the row in the lookup table that corresponds to that user may be sufficient.
  - Finally, if private data must be collected, build a process around the purge API path and the existing query API path to meet any obligations you may have around exporting and deleting any private data associated with a user.
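The first two strategies can be sketched on the application side, before anything reaches the workspace. This is a minimal illustration, not part of the Microsoft document: the US-style SSN pattern, the `mask_ssn` helper, the `PseudonymLookup` class, and the example URL are all assumptions made for the example and need adapting to your own data format.

```python
import re
import uuid

# Assumption: US-style SSNs such as 123-45-6789. Adjust to your own format.
# The lookarounds stop the pattern from matching inside longer digit runs.
SSN_PATTERN = re.compile(r"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)")


def mask_ssn(url, placeholder="***-**-****"):
    """Replace anything that looks like an SSN before the URL is logged."""
    return SSN_PATTERN.sub(placeholder, url)


class PseudonymLookup:
    """Hypothetical lookup table: user name -> internal ID that gets logged."""

    def __init__(self):
        self._ids = {}

    def id_for(self, username):
        # Reuse the existing ID so log entries for one user stay correlated.
        if username not in self._ids:
            self._ids[username] = uuid.uuid4().hex
        return self._ids[username]

    def forget(self, username):
        # Deleting the row breaks the link between logged IDs and the person.
        self._ids.pop(username, None)


print(mask_ssn("https://api.example.com/customers/123-45-6789/orders?id=7"))
# prints https://api.example.com/customers/***-**-****/orders?id=7
```

In a real system the lookup table would live in a store outside the logs, so that removing a user's row leaves only an ID in the workspace that can no longer be resolved to that user.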
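- Here is a KQL query for locating private data in Log Analytics. This example searches every table for values that look like IPv4 addresses and counts the matches per table; substitute a pattern for the data you need to find: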
search *
| where * matches regex @'\b((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.|$)){4}\b' //RegEx originally provided on https://stackoverflow.com/questions/5284147/validating-ipv4-addresses-with-regexp
| summarize count() by $table

- I am aware that you should not log sensitive data, but in some cases it is difficult to avoid, e.g. when you enable logging in Azure API Management. My conclusion so far is that it is not possible to purge a workspace when you need to filter with a regular expression. – Jonas Nilsson Mar 21 '22 at 12:00
- Purge is very expensive, so many operators are not allowed. What teams normally do is use queries with regex etc. to *find* the exact values to purge, and then purge those exact values with a much more specific and performant query. – John Gardner Mar 24 '22 at 16:09
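The find-then-purge workflow described in the last comment could look roughly like the sketch below. It assumes the rows have already been fetched through the query API with a regex filter, and it only builds the request body rather than sending it. The table and column names, the sample rows, the SSN pattern, and both helper functions are hypothetical; the body shape (`table` plus `filters` with `column`, `operator`, `value`) reflects my understanding of the workspace purge REST API and should be checked against the current API reference before use.

```python
import json
import re

# Assumption: US-style SSNs such as 123-45-6789. Adjust to your own format.
SSN_PATTERN = re.compile(r"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)")


def find_exact_values(rows, column):
    """Return the distinct values of `column` that contain an SSN-like match."""
    hits = {
        row[column]
        for row in rows
        if SSN_PATTERN.search(row.get(column) or "")
    }
    return sorted(hits)


def build_purge_body(table, column, values):
    """Build a purge body that filters on exact values, with no regex involved."""
    return {
        "table": table,
        "filters": [
            {"column": column, "operator": "in", "value": list(values)},
        ],
    }


# Hypothetical rows, as a regex query against the workspace might return them.
rows = [
    {"Url": "https://api.example.com/customers/123-45-6789/orders"},
    {"Url": "https://api.example.com/health"},
    {"Url": "https://api.example.com/customers/123-45-6789/orders"},
]

values = find_exact_values(rows, "Url")
body = build_purge_body("ApiManagementGatewayLogs", "Url", values)
print(json.dumps(body, indent=2))
```

Keeping the regex on the query side and passing only literal values to the purge request is what makes the purge filter specific enough to be accepted, which is the point the comment makes.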