
I've read a lot of regular expression articles on this and I'm still utterly confused.

I am looking to match only the first URL below; the rest should not match:

https://subdomain.example.com/test <== only this should match
https://subdomain.example.com/paht/test.css
https://subdomain.example.com/path/path/test.js
https://example.com/test/

I am looking to match only the routes that have no trailing slashes or file extensions.

Here is my regex: https:.*^(?!([^\/]|(\.[a-z]{2,8})))$

You can try here: https://regexr.com/5dic8

2 Answers


Use

^https?:\/\/(?:.*\/)?[^\/.]+$

See proof

Explanation

--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  http                     'http'
--------------------------------------------------------------------------------
  s?                       's' (optional (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  :                        ':'
--------------------------------------------------------------------------------
  \/                       '/'
--------------------------------------------------------------------------------
  \/                       '/'
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
    \/                       '/'
--------------------------------------------------------------------------------
  )?                       end of grouping
--------------------------------------------------------------------------------
  [^\/.]+                  any character except: '\/', '.' (1 or more
                           times (matching the most amount possible))
--------------------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
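
As a quick sanity check, here is a minimal Python sketch that runs the pattern above against the four URLs from the question. The variable names (PATTERN, urls, matches) are just illustrative, and it assumes each URL is tested as a separate string:

```python
import re

# Pattern from the answer above. The '/' escapes are not required in
# Python, but they are harmless, so the pattern is kept unchanged.
PATTERN = re.compile(r'^https?:\/\/(?:.*\/)?[^\/.]+$')

# The four sample URLs from the question, copied as written.
urls = [
    "https://subdomain.example.com/test",
    "https://subdomain.example.com/paht/test.css",
    "https://subdomain.example.com/path/path/test.js",
    "https://example.com/test/",
]

# Keep only the URLs whose last path segment has no '.' and no trailing '/'.
matches = [u for u in urls if PATTERN.search(u)]
print(matches)  # ['https://subdomain.example.com/test']
```

The last segment is the part that decides the match: [^\/.]+$ rejects test.css and test.js because of the dot, and rejects test/ because nothing is left after the final slash.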
Ryszard Czech

If you are sure you're only matching URLs, you can also reverse the URL and use:

^\w+\/
  • ^ Only at the beginning (which in this case is the end)
  • \w+ One or more word characters (letters, digits or underscore)
  • \/ To match the slash

In Python, this would look something like:

import re
re.search(r'^\w+\/', url[::-1])

If the result is not None, the URL ends with a plain segment such as .../someword (no trailing slash and no file extension).

NOTE: this only works if you are sure the input really is a URL.
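
To make the reversed-string idea concrete, here is a small sketch that wraps it in a function and tries it on the URLs from the question. The helper name ends_with_plain_segment is made up for illustration, and the sketch assumes the input is already known to be a URL:

```python
import re

# Hypothetical helper name, not part of the answer above.
def ends_with_plain_segment(url: str) -> bool:
    # Reverse the string so the last path segment comes first, then require
    # one or more word characters immediately followed by a slash.
    return re.search(r'^\w+/', url[::-1]) is not None

# The four sample URLs from the question, copied as written.
urls = [
    "https://subdomain.example.com/test",
    "https://subdomain.example.com/paht/test.css",
    "https://subdomain.example.com/path/path/test.js",
    "https://example.com/test/",
]

results = {u: ends_with_plain_segment(u) for u in urls}
for u, ok in results.items():
    print(ok, u)
```

Only the first URL prints True. One caveat: \w does not include '-', so a final segment such as /my-page would be rejected by this pattern even though it has no extension or trailing slash.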

Jorge Morgado