I am currently working on a logging system in sql server 2014, and need to find a way to assign similar strings into groups. Each string contains a prefix, an ID or Guid and then a suffix. I'd like to find a way find if there already exists strings with the same pre- and suffix in my table.
A typical string might look like:
'ApsisQueue.Synchronize @ApiKey = AA24FA75-CB78-451E-A11F-39115ABF8AA1, @CallbackId = 1'
Or
'ApsisQueue.Synchronize @ApiKey = AA24FA75-CB78-451E-A11F-39115ABF8AA1, LogMessage = NULL'
Or
'ApsisQueue.Synchronize @ApiKey = BC2CFC45-CB78-411E-A66F-3CC654AB125, LogMessage = NULL'
Or
'File id19957 does not exist'
Or
'File id1955 does not exist'
I'd like to group the messages depending on what appears before and after the Id/Guid values, so the messages above would be grouped into 3 groups.
I've considered using Levenshtein to find similar strings, but it seems a bit overkill since I know all differences will follow one another (it also causes performance issues due to the amount of strings that needs to be compared). The problem is that I don't know where in string the Guid/Id starts or ends.