I have been trying to work out a regex for this problem for quite some time without success, so I am reaching out for help. I have a regex that captures a particular string of numbers when it appears in an email. The problem is that it also captures that string when it appears inside a URL. The URLs are randomly generated and very often contain a run of digits that matches my regex. I have been trying, with no luck, to write a regex that still captures the number but ignores it when it is inside a URL. Here is the regex I have been using.

    (?:700[0-9][0-9]{7}|81[0-9][0-9][0-9]{5}|9999[0-9]{8})\b

And here is an example of an email that contains such a string.

https://test.test.test.outlook.com/?url=bunchofrandomstuffthatdoesnotmatterF&data=sfsfsdagfd4454366474retre45435700000000%7CRegex%randomthingsoiMC4wLjAwnotareallink2luMzIiLCJBTiIjfsdkljafdslflsdkajfljie

mailto: From: Sent: Monday, May 17, 2021 11:42 AM To: 700000000 . If received" Detected: External recipients,

https://test.test.test.outlook.com/?url=bunchofrandomstuffthatdoesnotmatterF&data=sfsfsdagfd4454366474retre45435700000000%7CRegex%randomthingsoiMC4wLjAwnotareallink2luMzIiLCJBTiIjfsdkljafdslflsdkajfljie

The problem is that it captures the number both in the text that makes up the URLs and in the mailto line. If possible, I need a regex that captures any string of numbers meeting these criteria anywhere in the email except inside a URL.

I have tried the following:

    (?:700[0-9][0-9]{7}|81[0-9][0-9][0-9]{5}|9999[0-9]{8})\b(?:(?!https://test.test.test.outlook.com).)

It does not work either. Any ideas?

John smith
  • Try it like this: `https?://\S*(?:700[0-9][0-9]{7}|81[0-9][0-9][0-9]{5}|9999[0-9]{8})\b(*SKIP)(*F)|(?:700[0-9][0-9]{7}|81[0-9][0-9][0-9]{5}|9999[0-9]{8})\b` https://regex101.com/r/rv29Tt/1 – The fourth bird May 18 '21 at 20:59
  • Thank you very much. It is much better than what I've been trying to do. The only issue is that when I copy it into a new regex101 page, it complains about the `/`, saying an unescaped delimiter must be escaped with a backslash. I can't figure out why, as I have tried my best to make all of the settings the same. – John smith May 18 '21 at 21:45
  • Did it work in the code? On regex101, at the top left, you can change the delimiter to something other than `/`. You can also escape the forward slash, as in https://regex101.com/r/VzCe9b/1 – The fourth bird May 18 '21 at 21:46
  • Yes, the code worked, and I would like to consider this question answered. Thank you for the help. How do you get credit for answering it? – John smith May 18 '21 at 21:54

1 Answer

Boost supports Perl regular expression syntax, which includes the backtracking control verbs (*SKIP)(*FAIL).

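Because the URL is randomly generated, instead of excluding https://test.test.test.outlook.com specifically, you can match http:// or https:// followed by zero or more non-whitespace characters using \S*. If that stretch ends in one of the numbers, (*SKIP)(*F) discards it and resumes searching after it, so numbers inside a URL are never returned.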

After the alternation |, use the same number pattern to match the numbers that remain.

Note that the pattern in the question does not match the current example data: the 700 branch requires 11 digits, and 700000000 has only 9. I have added 2 zeroes to 700000000 to get a match in the example data.

    \bhttps?://\S*(?:700[0-9][0-9]{7}|81[0-9][0-9][0-9]{5}|9999[0-9]{8})\b(*SKIP)(*F)|(?:700[0-9][0-9]{7}|81[0-9][0-9][0-9]{5}|9999[0-9]{8})\b

See a regex demo.
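The remark about the sample data is easy to check. Here is a minimal sketch in Python (my choice for illustration only; the question itself is about Boost), using just the number part of the pattern, which Python's `re` module accepts unchanged. The two sample lines are shortened from the email in the question.

```python
import re

# Number part of the pattern from the question, unchanged.
NUMBER = re.compile(r"(?:700[0-9][0-9]{7}|81[0-9][0-9][0-9]{5}|9999[0-9]{8})\b")

# The mailto line as posted: 700000000 has 9 digits.
line_as_posted = "To: 700000000 . If received"
# The same line with two zeroes added: 11 digits.
line_adjusted = "To: 70000000000 . If received"

print(NUMBER.findall(line_as_posted))  # no match expected
print(NUMBER.findall(line_adjusted))   # the 11-digit number is expected
```

The 700 branch is `700`, one digit, then seven more digits, so it only matches when all 11 digits are present.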

The fourth bird
  • Sorry to bring this up again, but are there any alternatives to (*SKIP)(*FAIL)? The reason I ask is that Microsoft does not validate those verbs, despite saying that they use Boost for regex. – John smith May 20 '21 at 13:16
  • In that case you can match the first part and, instead of using (*SKIP)(*FAIL), use a capture group. – The fourth bird May 20 '21 at 13:31
  • I know this is old; I just wanted to say that the code with (*SKIP)(*FAIL) was not working for me at the time, but a few months later it started to work with Microsoft. Thanks once again for the help. – John smith May 02 '22 at 19:07
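For anyone whose engine rejects (*SKIP)(*FAIL), the capture-group alternative mentioned in the comments could look roughly like this. It is a sketch in Python rather than Boost, and it is my reading of that comment, not code from the thread: the first alternative consumes a whole URL without capturing anything, and the second captures the number in group 1. The helper name `numbers_outside_urls` and the shortened sample text are made up for illustration.

```python
import re

# Number part of the pattern from the question, unchanged.
NUMBER = r"(?:700[0-9][0-9]{7}|81[0-9][0-9][0-9]{5}|9999[0-9]{8})\b"

# Either consume a whole URL (nothing captured), or capture a number in group 1.
PATTERN = re.compile(r"\bhttps?://\S*|(" + NUMBER + r")")

def numbers_outside_urls(text):
    """Return the numbers that matched via group 1, i.e. not inside a URL."""
    return [m.group(1) for m in PATTERN.finditer(text) if m.group(1) is not None]

# Shortened, hypothetical version of the email in the question,
# with the number padded to 11 digits so the pattern can match.
sample = (
    "https://test.test.test.outlook.com/?url=abc&data=retre4543570000000000%7CRegex "
    "mailto: From: Sent: Monday, May 17, 2021 11:42 AM To: 70000000000 . If received "
    "https://test.test.test.outlook.com/?data=xyz70000000000%7C"
)

print(numbers_outside_urls(sample))
```

The trade-off is that the caller has to look at group 1 instead of the whole match, since a URL still produces a match with an empty group. The same pattern should carry over to Boost's Perl syntax, but I have not tested it there.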