2

I would like to use a regular expression that works otherwise, but it does not work in Google Sheets.

Google Sheets requires re2 expressions and I don't know how to format this properly. (I've read lots of questions here on StackOverflow already but unfortunately I am stuck.)

I would like to convert the following expression:

\w++\-?+\w*+\.\w++\/

The expression is intended to do the following. Extract the domain names from a given URL. For example:

from: https://www.tag24.de/dresden/ 
get: tag24.de/

from: https://de.nachrichten.yahoo.com/wohnen-hamburg-bezirk-f
get: yahoo.com/

from: https://www.hitradio-rtl.de/beitrag/hier-leben
get: hitradio-rtl.de/

Any advice or help would be appreciated.

1 Answers1

2

The problem is that the possessive quantifiers are not supported in RE2:

Possessive repetitions:
x*+   zero or more «x», possessive NOT SUPPORTED
x++   one or more «x», possessive NOT SUPPORTED
x?+   zero or one «x», possessive NOT SUPPORTED
x{n,m}+   «n» or ... or «m» «x», possessive NOT SUPPORTED
x{n,}+    «n» or more «x», possessive NOT SUPPORTED
x{n}+ exactly «n» «x», possessive NOT SUPPORTED

You can use

=REGEXEXTRACT(A1, "\w+(?:-\w+)?\.\w+(?:/|$)")
=REGEXEXTRACT(A1, "\w[\w-]*\.\w+(?:/|$)")

In case you want to exclude the final /, use a capturing group:

=REGEXEXTRACT(A1, "(\w+(?:-\w+)?\.\w+)(?:/|$)")
=REGEXEXTRACT(A1, "(\w[\w-]*\.\w+)(?:/|$)")

Details:

  • \w+ - one or more word chars
  • (?:-\w+)? - an optional occurrence of a - and one or more word chars
  • \. - a dot
  • \w+ - one or more word chars
  • (?:/|$) - either a / or end of string.
  • \w[\w-]* - one word char and then zero or more word or hyphen chars (that is good if you do not carehow many hyphens there can be after the first word char, or if there can be several consecutive hyphens).

When a capturing group is used in the regex, only its contents is returned by REGEXEXTRACT.

See a Google Sheets demo:

enter image description here

Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563