25

I am having a little bit of regex trouble.

I am trying to extract the path, videoplay, from this URL:

http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello

If I use the regex /.+ the match also includes //video.google.co.uk:80, because it starts at the first slash.

I need some kind of negative match that excludes the //.

ThomasReggi
  • 1
    When I have to use regexes on urls fast and dirty, I usually include // at the beginning, before the capture group. Note you can't do http://, because they might be accessing it using a different protocol, or even ://, because they might specify the port number. – jwrush Aug 19 '12 at 01:06
  • possible duplicate of [Getting parts of a URL (Regex)](http://stackoverflow.com/questions/27745/getting-parts-of-a-url-regex) – Raniz Jun 04 '15 at 02:07

13 Answers

45

If you need this for a JavaScript web app, the best answer I have found on this topic is here. The basic (and original) version of the code looks like this:

var parser = document.createElement('a');
parser.href = "http://example.com:3000/pathname/?search=test#hash";

parser.protocol; // => "http:"
parser.hostname; // => "example.com"
parser.port;     // => "3000"
parser.pathname; // => "/pathname/"
parser.search;   // => "?search=test"
parser.hash;     // => "#hash"
parser.host;     // => "example.com:3000"

Thank you, John Long, you made my day!

Vlad Mysla
16

(http[s]?:\/\/)?([^\/\s]+\/)(.*)

The path is in group 3.
Demo: http://regex101.com/r/vK4rV7/1
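As a rough sketch of how that capture group could be read from JavaScript (the helper name pathFromGroup3 is made up for illustration; the expression itself is unchanged):

```javascript
// Hypothetical helper: applies the expression above and returns group 3.
function pathFromGroup3(url) {
  var re = /(http[s]?:\/\/)?([^\/\s]+\/)(.*)/;
  var m = url.match(re);
  // m[3] is everything after the first "/" that follows the host,
  // so it still includes the query string and the fragment.
  return m ? m[3] : null;
}

var questionUrl = "http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello";
console.log(pathFromGroup3(questionUrl));
// "videoplay?docid=-7246927612831078230&hl=en#hello"
```

Note that a URL with no slash after the host produces no match at all, so the helper returns null in that case.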

M G
  • 2
    It wouldn't work for a URL such as `www.abc.com?param=xyz`, where the query follows the host directly. I slightly modified it like this to make it work (I also use non-capturing groups for the first two groups). `(?:https?:\/\/)?(?:[^?\/\s]+[?\/])(.*)` Demo: https://regex101.com/r/eNUBb9 – nbeuchat May 24 '18 at 16:52
10

This expression captures videoplay and everything after it, i.e. the URL path along with the query string and fragment.

/\/(videoplay.+)/

This expression captures everything after the port, which is again the path along with the query string and fragment.

/:\d+\/(.+)/

However, if you are using Node.js, I recommend the native url module.

var url = require('url')
var youtubeUrl = "http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello"
url.parse(youtubeUrl)

This does all of the regex work for you:

{
  protocol: 'http:',
  slashes: true,
  auth: null,
  host: 'video.google.co.uk:80',
  port: '80',
  hostname: 'video.google.co.uk',
  hash: '#hello',
  search: '?docid=-7246927612831078230&hl=en',
  query: 'docid=-7246927612831078230&hl=en',
  pathname: '/videoplay',
  path: '/videoplay?docid=-7246927612831078230&hl=en',
  href: 'http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello' 
}
ThomasReggi
  • The `url` node module is in legacy mode. The docs recommend using the `URL` class instead. See here: https://nodejs.org/dist/latest-v14.x/docs/api/url.html#url_legacy_url_api – darksinge Jul 28 '21 at 16:29
5

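function getPath(url, defaults){
    // Skip the optional scheme, the "//" and the host[:port],
    // then capture the path: a "/" followed by anything up to "?" or "#".
    var reUrlPath = /(?:\w+:)?\/\/[^\/?#]+(\/[^?#]*)/;
    // Fall back to `defaults` when the URL has no path to capture.
    var urlParts = url.match(reUrlPath) || [url, defaults];
    return urlParts.pop();
}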
alert( getPath('http://stackoverflow.com/q/123/regex-url', 'unknown') );
alert( getPath('https://stackoverflow.com/q/123/regex-url', 'unknown') );
alert( getPath('//stackoverflow.com/q/123/regex-url', 'unknown') );
alert( getPath('http://stackoverflow.com/q/123/regex-url?foo', 'unknown') );
alert( getPath('http://stackoverflow.com/q/123/regex-url#foo', 'unknown') );
alert( getPath('http://stackoverflow.com/q/123/regex-url/', 'unknown') );
alert( getPath('http://stackoverflow.com/q/123/regex-url/?foo', 'unknown') );
alert( getPath('http://stackoverflow.com/q/123/regex-url/#foo', 'unknown') );
alert( getPath('http://stackoverflow.com/', 'unknown') );
gazdagergo
Vlad Mysla
4

You can try this:

^(?:[^/]*(?:/(?:/[^/]*/?)?)?([^?]+)(?:\??.+)?)$

([^?]+) above is the capturing group which returns your path.

Please note that this is not an all-URL regex. It just solves your problem of matching all the text between the first "/" occurring after "//" and the following "?" character.

If you need an all-matching regex, you can check this StackOverflow link, where all the parts of a URI, including the path, are discussed and dissected.
If you consider that overkill and you know that your input URL will always have the path between the first "/" after "//" and the following "?", then the above regex should be sufficient.

Community
Kash
  • Try this url: http://video.google.co.uk:80?docid=-7246927612831078230&hl=en#hello, this regex returns group1 = o – FiftiN Apr 04 '19 at 12:13
3

For anyone arriving here from a search: use the JavaScript URL Web API, which is available in any modern environment (browsers and Node.js):

new URL('your url string').pathname

https://developer.mozilla.org/en-US/docs/Web/API/URL/URL
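A minimal sketch of that approach applied to the URL from the question. The getPathname wrapper is a made-up name; it only exists to show that new URL() throws on input it cannot parse, so a try/catch is worth having when the string comes from a user:

```javascript
// Hypothetical helper: returns the pathname, or null if the string
// is not an absolute URL.
function getPathname(input) {
  try {
    return new URL(input).pathname;
  } catch (e) {
    // new URL() throws a TypeError for strings it cannot parse.
    return null;
  }
}

var parsed = new URL("http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello");
console.log(parsed.pathname); // "/videoplay"
console.log(parsed.search);   // "?docid=-7246927612831078230&hl=en"
console.log(parsed.hash);     // "#hello"
console.log(getPathname("not a url")); // null
```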

2

Even though the answers using language features are good, here is one more way to split a URL into its components using a regular expression:

^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
 ||            |  |          |       |  |        | |
 12 - scheme   |  |          |       |  |        | |
               3  4 - authority, includes hostname/ip and port number.
                             5 - path|  |        | |
                                     6  7 - query| |
                                                 8 9 - fragment
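This looks like the expression given in RFC 3986, Appendix B. A sketch of using it from JavaScript, with the slashes escaped and a single backslash before the ? as a regex literal needs (splitUrl is a made-up name; the group numbers are the ones from the diagram):

```javascript
// Hypothetical helper: splits a URI reference into its components using the
// group numbering shown above (2 scheme, 4 authority, 5 path, 7 query, 9 fragment).
function splitUrl(url) {
  var re = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;
  // Every part of the expression is optional, so exec() always matches a string.
  var m = re.exec(url);
  return {
    scheme: m[2],
    authority: m[4],
    path: m[5],
    query: m[7],
    fragment: m[9]
  };
}

var parts = splitUrl("http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello");
console.log(parts.path); // "/videoplay"
```

Components that are absent from the input come back as undefined, apart from the path, which is an empty string.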
Nolequen
2

I have worked on it extensively and here is the result:

(?i)(?<scheme>http|https|ftp|sftp|sip|sips|file):\/\/(?:(?<username>[^`!@#$^&*()+=,:;'"{}\|\[\]\s\/\\]+)(?::(?<password>[^`!@#$^&*()+=,:;'"{}\|\[\]\s\/\\]+))?@)?(?:(?<ipv4>((?:(?:25[0-5]|2[0-4]\d|1?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|1?\d\d?)))|\[(?<ipv6>(?i)(?:[\da-f]{0,4}:){1,7}(?:(?<ipv4_in_ipv6>(?:(?:25[0-5]|2[0-4]\d|1?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|1?\d\d?))|[\da-f]{0,4}))\]|(?:(?<sub_domain>[^\s~`!@#$%^&*()_+=,.?:;'"{}\|\[\]\/\\]+\.)*(?<domain>[^\s~`!@#$%^&*()_+=,.?:;'"{}\|\[\]\/\\]+)(?<tld>\.[^\s~`!@#$%^&*()\-_+=,.?:;'"{}\|\[\]\/\\0-9]{2,})))+(?<port>:\d+)?(?:\/(?<path>\/?[^\s`@#$^&=.?"{}\\]+\/)*(?<file>[^\s`@#$^&=?"{}\/\\]+)?(?<query>\?[^\s`#$^"{}\\]+)*(?<fragment>#[^\s`$^&=?"{}\/\\]+)?)?

Demo | Git Repository

So in your case you only need the part of the expression that captures the path, prefixed with the segment you are looking for, i.e. videoplay. To be more specific, I am talking about this:

(?:\/videoplay(?<path>\/?[^\s`@#$^&=.?"{}\\]+\/)*(?<file>[^\s`@#$^&=?"{}\/\\]+)?(?<query>\?[^\s`#$^"{}\\]+)*(?<fragment>#[^\s`$^&=?"{}\/\\]+)?)?
Alin
1

You mean a negative lookbehind? (?<!/)
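One way to read this, as a sketch: match a slash that is neither preceded nor followed by another slash, then take everything up to ? or #. The extra lookahead and the helper name pathByLookbehind are my assumptions, not part of the suggestion, and lookbehind needs an ES2018-capable engine:

```javascript
// Hypothetical helper built around the suggested negative lookbehind.
function pathByLookbehind(url) {
  // (?<!\/) : the slash must not be preceded by "/"
  // (?!\/)  : and must not be followed by "/", which rules out both slashes of "//"
  var re = /(?<!\/)\/(?!\/)[^?#]*/;
  var m = url.match(re);
  return m ? m[0] : null;
}

console.log(pathByLookbehind("http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello"));
// "/videoplay"
```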

Niet the Dark Absol
1

var subject =
'<link rel="shortcut icon" href="https://cdn.sstatic.net/Sites/stackoverflow/img/favicon.ico?v=ec617d715196"><link rel="apple-touch-icon" href="https://cdn.sstatic.net/Sites/stackoverflow/img/apple-touch-icon.png?v=c78bd457575a"><link rel="image_src" href="https://cdn.sstatic.net/Sites/stackoverflow/img/apple-touch-icon.png?v=c78bd457575a">';
var re=/\"[a-z]+:\/\/[^ ]+"/m;
document.write(subject.match(re));

You can try this

/\"[a-z]+:\/\/[^ ]+/

Usage

if (/\"[a-z]+:\/\/[^ ]+/m.test(subject)) {
    // Successful match
} else {
    // Match attempt failed
}
Peter
0

It's not a regex solution, but most languages have a URL library that will parse any URL into its constituent parts. This may be a better solution for what you are doing.

Toby Allen
-1

Please try this:

^http[s]?:\/\/(www\.)?(.*)?\/?(.)*
-2

I think this is what you're after: [^/]+$

Demo: http://regex101.com/r/rG8gB9

Firas Dib
  • 4
    This doesn't match the path of a URL, just the very last part of the path. With "http://google.com/foo/bar" it matches "bar" – justderb May 30 '14 at 19:24
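A quick check of the behaviour described in that comment, as a sketch (matchLastSegment is a made-up helper name):

```javascript
// The last-segment pattern from the answer, as a JavaScript regex literal.
var lastSegment = /[^\/]+$/;

// Hypothetical helper: returns whatever the pattern matches, or null.
function matchLastSegment(url) {
  var m = url.match(lastSegment);
  return m ? m[0] : null;
}

console.log(matchLastSegment("http://google.com/foo/bar")); // "bar"
console.log(matchLastSegment("http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello"));
// "videoplay?docid=-7246927612831078230&hl=en#hello"
```

For the URL in the question the result happens to start at videoplay, but only because the path has a single segment, and the query string and fragment come along with it.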