0

I have to do some regex and am restricted to a google sheet where I don't have my sweet, sweet, python string methods.

Preferred result:

I am using =REGEXMATCH() in google sheets and want to parse URLs. For example, for the following url:

https://www.foobar.com/foo/bar

I'd like to get everything after https:// and before the first / after the domain, like so (assume all my urls start with https:// and have extensions after the base url.

www.foobar.com


What I have tried:

I have the following formula that gets me close:

=REGEXEXTRACT(A1,"^https?:\/\/[^\/]+")

https://foobar.com

But I still need to get rid of the https://

I also tried a positive look behind assertion but to no avail:

=REGEXEXTRACT(A1,"/ (?<=https?:\/\/)[^\/]/+")

but I get an error.

Any help is appreciated!

Twilight
  • 1,399
  • 2
  • 11
bismo
  • 1,257
  • 1
  • 16
  • 36
  • Related: https://stackoverflow.com/questions/10306690/what-is-a-regular-expression-which-will-match-a-valid-domain-name-without-a-subd – TheMaster Oct 05 '22 at 06:59

2 Answers2

1

Try:

=REGEXEXTRACT(A1,"https:\//(.*?)\/")

Result:

enter image description here

Using the parenthesis (), which means extract a group of characters inside. Then using the dot "." which means any characters except new line. Then you state the characters before and after the string you want to extract.

Refer to this link: REGEX Cheat Sheet

Logan
  • 1,691
  • 1
  • 4
  • 11
1

If you wanted something that wasn't using Regex (which based on your fondness of python is unlikely), you could use a more spreadsheet-like approach.

9 represents the length of https//. If you wanted to check for without the s you could include an if statement.

=mid(A1,9,find("/",mid(A1,9,len(A1)))-1)
pgSystemTester
  • 8,979
  • 2
  • 23
  • 49