How to skip a certain group of characters (https://) in regexmatch google sheets

Question

I have to do some regex and am restricted to a google sheet where I don't have my sweet, sweet, python string methods.

Preferred result:

I am using =REGEXMATCH() in google sheets and want to parse URLs. For example, for the following url:

https://www.foobar.com/foo/bar

I'd like to get everything after https:// and before the first / after the domain, like so (assume all my urls start with https:// and have extensions after the base url.

www.foobar.com

What I have tried:

I have the following formula that gets me close:

=REGEXEXTRACT(A1,"^https?:\/\/[^\/]+")

https://foobar.com

But I still need to get rid of the https://

I also tried a positive look behind assertion but to no avail:

=REGEXEXTRACT(A1,"/ (?<=https?:\/\/)[^\/]/+")

but I get an error.

Any help is appreciated!

Related: https://stackoverflow.com/questions/10306690/what-is-a-regular-expression-which-will-match-a-valid-domain-name-without-a-subd — TheMaster, Oct 05 '22 at 06:59

score 1 · Accepted Answer · answered Oct 05 '22 at 01:45

Try:

=REGEXEXTRACT(A1,"https:\//(.*?)\/")

Result:

Using the parenthesis (), which means extract a group of characters inside. Then using the dot "." which means any characters except new line. Then you state the characters before and after the string you want to extract.

Refer to this link: REGEX Cheat Sheet

score 1 · Answer 2 · answered Oct 05 '22 at 02:11

If you wanted something that wasn't using Regex (which based on your fondness of python is unlikely), you could use a more spreadsheet-like approach.

9 represents the length of https//. If you wanted to check for without the s you could include an if statement.

=mid(A1,9,find("/",mid(A1,9,len(A1)))-1)

How to skip a certain group of characters (https://) in regexmatch google sheets

2 Answers2