
Google Cloud Platform lets you create labels for logs-based metrics using the RE2 regex engine.

How can I create a regex that matches the path in the URL?

Example matches:

https://example.com/awesome                  --> "awesome"
https://example.com/awesome/path             --> "awesome/path"
https://example.com/awesome/path/            --> "awesome/path"
https://example.com/awesome/path?arg1=123    --> "awesome/path"

Details:

  • The domain and protocol are constant; they can be assumed to be https://example.com here.
  • If there are multiple directories, they should be matched too, including the / in between.
  • Trailing / should NOT be matched.
  • Query strings, e.g. ?arg1=123&arg2=456, should NOT be matched.
  • It can be assumed that directory names will only contain alphanumeric characters a-zA-Z0-9, dashes - and underscores _.

Note that Google RE2 is different from PCRE2; for example, RE2 does not support lookahead or lookbehind assertions.

osolmaz

1 Answer


The RE2 syntax documentation isn't 100% clear about what is supported and what isn't. Assuming "(NOT SUPPORTED) VIM" means it is supported but not in Vim, I'd start with a positive lookbehind for the beginning of the URL that you don't care about:

(?<=https:\/\/example\.com\/)

Then you want alphanumeric characters, dashes and underscores, [\w\-]+, followed by a non-trailing /, so I'd add a lookahead that verifies there are alphanumeric characters after the /: (?=\/\w+)\/

The complete regex

(?<=https:\/\/example\.com\/)([\w\-]+((?=\/\w+)\/|\b))+
depperm
  • I tested this here and it works: https://regex101.com/r/pL7anj/1, but the Google log-based metric creation interface still complains: `Must contain exactly one regex group ()`. Is it even possible to accomplish this with a single group? – osolmaz Jan 31 '23 at 16:27
  • @osolmaz, do you need a regex? Some string manipulation could be an alternative. – depperm Jan 31 '23 at 16:39
  • RE2 regexes with exactly one capture group are the only option offered for log metric labels. See the section containing "Regular Expression:" at https://cloud.google.com/logging/docs/logs-based-metrics/labels. Do you know of any other way to extract the path? – osolmaz Jan 31 '23 at 16:43
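Log-based metric labels only accept a regex. Still, if the path can be extracted somewhere else (for example, in a hypothetical post-processing step over exported logs), the string-manipulation approach suggested in the comments could look like the Go sketch below. `pathFromURL` is a hypothetical helper. It relies on `net/url` keeping the query string out of `u.Path` and on `strings.Trim` removing leading and trailing slashes.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// pathFromURL (hypothetical helper) returns the URL path without leading or
// trailing slashes. url.Parse stores the query string separately in RawQuery,
// so it never appears in u.Path.
func pathFromURL(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	return strings.Trim(u.Path, "/"), nil
}

func main() {
	for _, raw := range []string{
		"https://example.com/awesome",
		"https://example.com/awesome/path",
		"https://example.com/awesome/path/",
		"https://example.com/awesome/path?arg1=123",
	} {
		p, err := pathFromURL(raw)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("%s --> %q\n", raw, p)
	}
}
```

Unlike the regex, this does not check that path segments contain only a-zA-Z0-9, - and _.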