0

I'm trying to build a regular expression that can work on Google Sheets and DataStudio. The regex must check that a URL is correctly built from start to end, with no trailing characters except whitespace.

Below a test version that works for my case (here it's application)

^(http[s]?:\/\/)([^:\/\s]+)(\/\?)((?:(?:\w+=\w+)&?)+)$

Problem is that some urls in my database contain a leading or trailing zero-width space (200B code). This invalidates the query at times it shouldn't. I don't have a way to clean the data before I read it, and I can't find a valid code within this regex dialect to include it in the match.

Is there a possible workaround to this?

ABO
  • 170
  • 1
  • 1
  • 10
  • you want to validate or extract? – player0 Nov 18 '21 at 12:00
  • I probably won't need extracting, it's only present in the sheet to test some pieces. – ABO Nov 18 '21 at 15:00
  • Does this answer your question? [How do I remove trailing whitespace using a regular expression?](https://stackoverflow.com/questions/9532340/how-do-i-remove-trailing-whitespace-using-a-regular-expression) Modify the last section `$` with `[ \t]+$` – Kessy Nov 18 '21 at 15:06

1 Answers1

1

You can always just remove it before the REGEXMATCH is applied, e.g.:

=REGEXMATCH(SUBSTITUTE(E7;CHAR(8203);"");$D$2)

or

=REGEXMATCH(REGEXREPLACE(E7;CHAR(8203);"");$D$2)

You could also "sandwich" the REGEX expression you have between instances of [^!-~]* (which would essentially allow anything outside of the printable range of English characters):

^[^!-~]*(http[s]?:\/\/)([^:\/\s]+)(\/\?)((?:(?:\w+=\w+)&?)+)[^!-~]*$

Erik Tyler
  • 9,079
  • 2
  • 7
  • 11