Extract url domain root in Google Sheet

Question

In a table, I have lists of full urls like :

https://www.example.com/page-1/product-x?utm-source=google

Objective : I want to extract the domain name part of the url only :

https://www.example.com/

I was using this following formula :

=REGEXEXTRACT(A1;"^(?:https?:\/\/)?(?:[^@\n]+@)?(?:www\.)?([^:\/\n?]+)")

The regex is working fine when testing it :

https://www.example.com/

However in Google sheet, It displays like :

example.com

Why the resutls are not the same for the identical regex ?
How to correct it in google Sheet ?

score 4 · Accepted Answer · answered Sep 28 '20 at 08:49

You can fix the pattern by removing the capturing group (i.e. here, ([^:\/\n?]+) => [^:\/\n?]+) or by converting the capturing groups to non-capturing ones (i.e. ([^:\/\n?]+) => (?:[^:\/\n?]+)):

=REGEXEXTRACT(A1;"^(?:https?://)?(?:[^@\n]+@)?(?:www\.)?[^:/\n?]+")
=REGEXEXTRACT(A1;"^(?:https?://)?(?:[^@\n]+@)?(?:www\.)?(?:[^:/\n?]+)")

NOTE:

If the regex contains capturing group(s), the REGEXEXTRACT returns captured value(s)
If there are no capturing groups in the regex, the function returns the whole match value only.

Note you do not need to escape / forward slashes in RE2 regexps since they are defined with the help of string literals in Google Sheets.

The pattern may be reduced to ^(?:https?://)?[^:/\n?]+, that matches http:// or https:// optionally, and then matches one or more chars other than /, newline, or ?.

See this RE2 regex demo.

Extract url domain root in Google Sheet

1 Answers1

Linked