I'm trying to write a regex that matches a URL only if there is a dot after the '/'.

Here's what I've got so far: http://regexr.com/3cu85

My regex is the following: /facebook.com\/.*[.]/gm and I'm testing with these URLs:

facebook.com
facebook.com/
facebook.com/test.user 

www.facebook.com
www.facebook.com/
www.facebook.com/test.user

https://www.facebook.com
https://www.facebook.com/
https://www.facebook.com/test.user

The problem is that I need to match the full URL, and as you can see, the match starts from the word "facebook".
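
To make the symptom concrete, here is a minimal sketch of the behaviour described above. It assumes Python's `re` module (the question does not name a language) and drops the `/.../gm` delimiters and flags, which have no direct equivalent in a Python pattern string.

```python
import re

# The pattern from the question, without the /.../gm delimiters and flags.
original = re.compile(r"facebook.com\/.*[.]")

url = "https://www.facebook.com/test.user"
m = original.search(url)

# Nothing in the pattern describes "https://" or "www.", so the match
# can only begin at the literal "facebook".
print(m.start())   # 12
print(m.group(0))  # facebook.com/test.
```

The match also ends at the last dot, because nothing follows `[.]` in the pattern, so "user" is left out as well.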

I tried different options, but none worked for me.

Thanks for any help

Nick

1 Answer

My suggestion is

(https?:\/\/)?(w{3}\.)?facebook\.com\/[^\/]*\..*

See the regex demo. In the demo, \n is added to the negated character class [^\/] so that each match stays on a single line when the URLs are on separate lines; if you test individual strings, the \n is not necessary.

This regex matches:

  • (https?:\/\/)? - optional (one or zero) occurrence of http:// or https://
  • (w{3}\.)? - optional (one or zero) occurrence of www followed by a literal dot
  • facebook\.com - literal sequence facebook.com
  • \/ - a literal /
  • [^\/]* - zero or more characters other than / (BETTER: use [^\/.]* to match any char but a . and / to avoid redundant backtracking)
  • \. - a literal .
  • .* - any 0+ characters but a newline (BETTER: since the URL cannot have a space (usually), you can replace it with \S* matching zero or more non-whitespace characters).

So, a better alternative:

(https?:\/\/)?(w{3}\.)?\bfacebook\.com\/[^\/.]*\.\S*
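
As a quick check, the pattern above can be run against the nine test URLs from the question. This is only a sketch that assumes Python's `re` module; the variable names `pattern`, `urls`, and `matches` are illustrative and not from the original post.

```python
import re

# Pattern copied from the answer; the escaped slashes are harmless in Python.
pattern = re.compile(r"(https?:\/\/)?(w{3}\.)?\bfacebook\.com\/[^\/.]*\.\S*")

urls = [
    "facebook.com",
    "facebook.com/",
    "facebook.com/test.user",
    "www.facebook.com",
    "www.facebook.com/",
    "www.facebook.com/test.user",
    "https://www.facebook.com",
    "https://www.facebook.com/",
    "https://www.facebook.com/test.user",
]

# Keep the matched text for every URL where the pattern is found.
matches = [m.group(0) for m in map(pattern.search, urls) if m]
print(matches)
# ['facebook.com/test.user', 'www.facebook.com/test.user',
#  'https://www.facebook.com/test.user']
```

Only the three URLs with a dot after the slash match, and each match covers the whole URL, including the optional scheme and www. prefix.
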
Wiktor Stribiżew
  • Another quick one: is it possible to stop it at the first slash, so that if the URL is facebook.com/test.user/bla it doesn't match bla? – Nick Mar 03 '16 at 11:16
  • Replace the `\S*` / `.*` at the end with `[^\/]*` – Wiktor Stribiżew Mar 03 '16 at 11:20
  • Sorry, what do I have to replace? Only \S*? If so, it is not working properly – Nick Mar 03 '16 at 11:31
  • Do you mean to avoid matching the whole URL then? Use [`(https?:\/\/)?(w{3}\.)?\bfacebook\.com\/[^\/.\n]*\.(?![^\/]*\/)\S*`](https://regex101.com/r/oI7mB8/2). I am not sure of your current input, so I am suggesting a lookahead based approach. You could use anchors: [`/^(https?:\/\/)?(w{3}\.)?\bfacebook\.com\/[^\/.]*\.[^\/]*$/gm`](https://regex101.com/r/oI7mB8/3) if you test the strings individually. – Wiktor Stribiżew Mar 03 '16 at 11:42
  • Awesome! that's exactly what i needed :) – Nick Mar 03 '16 at 11:43
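
The three variants discussed in the comments behave differently on a URL with a further path segment. The sketch below compares them, again assuming Python's `re` module; the names `prefix`, `truncate`, `lookahead`, `anchored`, `deep`, and `flat` are illustrative only.

```python
import re

# Shared start of all three patterns from the comments.
prefix = r"(https?:\/\/)?(w{3}\.)?\bfacebook\.com\/"

# 1) Trailing \S* replaced with [^\/]*: the match stops before the next slash.
truncate = re.compile(prefix + r"[^\/.]*\.[^\/]*")
# 2) Lookahead variant: rejects the URL if another slash follows the dot.
lookahead = re.compile(prefix + r"[^\/.\n]*\.(?![^\/]*\/)\S*")
# 3) Anchored variant: the whole line must be the URL, with no further slash.
anchored = re.compile("^" + prefix + r"[^\/.]*\.[^\/]*$", re.MULTILINE)

deep = "facebook.com/test.user/bla"
flat = "facebook.com/test.user"

truncated = truncate.search(deep).group(0)
print(truncated)                       # facebook.com/test.user
print(lookahead.search(deep))          # None
print(anchored.search(deep))           # None
print(anchored.search(flat).group(0))  # facebook.com/test.user
```

The first variant trims the match at the slash, while the lookahead and anchored variants refuse to match the longer URL at all.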