Regex for extracting part of a URL

Question

I have a url: http://example.com/(S(4txk2wasxh3u0slptzi20qyj))/CWC_Link.aspx

but I only want to extract this portion: (S(4txk2anwasxh3u0slptzi20qyj))/

Can anyone suggest a regex for this?

**This might not be a job for regexes, but for existing tools in your language of choice.** Regexes are not a magic wand you wave at every problem that happens to involve strings. You probably want to use existing code that has already been written, tested, and debugged. In PHP, use the [`parse_url`](http://php.net/manual/en/function.parse-url.php) function. Perl: [`URI` module](http://search.cpan.org/dist/URI/). Ruby: [`URI` module](http://www.ruby-doc.org/stdlib-1.9.3/libdoc/uri/rdoc/URI.html). .NET: ['Uri' class](http://msdn.microsoft.com/en-us/library/txt7706a.aspx) — Andy Lester, Jul 12 '13 at 04:27
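
As a rough illustration of the comment above, here is a minimal sketch of the no-regex approach. The question names no language, so Python and its standard `urllib.parse` module are used here purely as an assumption, and the sketch further assumes the token is always the first path segment.

```python
from urllib.parse import urlsplit

url = "http://example.com/(S(4txk2wasxh3u0slptzi20qyj))/CWC_Link.aspx"

# urlsplit breaks the URL into scheme, host, path, query and fragment.
path = urlsplit(url).path

# The path begins with "/", so index 0 is an empty string and the
# first real segment sits at index 1 (assumption: token comes first).
segments = path.split("/")
session_segment = segments[1]

print(session_segment)  # (S(4txk2wasxh3u0slptzi20qyj))
```

If the trailing slash from the question is really wanted, it can be appended afterwards with `session_segment + "/"`.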

score 1 · Accepted Answer · answered Jul 10 '13 at 07:33

The key is to notice that the parentheses mark the boundaries of the token and that its contents contain no / character:

/(\(S\([^/()]+\)\))/
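
A short sketch of how this pattern might be applied, written in Python only for illustration (the question does not name a language). The outer slashes are treated as literal URL slashes rather than delimiters, the token is read from the first capturing group, and the second URL is a made-up example containing a decoy token.

```python
import re

# Pattern from the answer; the outer "/" are literal slashes in the URL.
pattern = re.compile(r"/(\(S\([^/()]+\)\))/")

url = "http://example.com/(S(4txk2wasxh3u0slptzi20qyj))/CWC_Link.aspx"
m = pattern.search(url)
whole = m.group(0)   # whole match, including the bounding slashes
token = m.group(1)   # first capture, without the slashes

print(whole)  # /(S(4txk2wasxh3u0slptzi20qyj))/
print(token)  # (S(4txk2wasxh3u0slptzi20qyj))

# Hypothetical URL with a decoy "(S(foo))" inside a path segment.
tricky = "http://example.com/asdada(S(foo))asdasd/(S(key))/page.aspx"
bounded = pattern.search(tricky).group(1)
unbounded = re.search(r"\(S\([^/()]+\)\)", tricky).group(0)

print(bounded)    # (S(key))
print(unbounded)  # (S(foo))
```

The bounding slashes cost nothing on the normal URL and keep the pattern from latching onto a token-like string embedded in another segment.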

quetzalcoatl


There's a small mistake here: the first "/" in your regex makes it match `/(S(4txk2anwasxh3u0slptzi20qyj))/`. – Simon Jul 10 '13 at 07:59
@Simon: there is no mistake. Those are the bounding characters. Note that there's a capturing group in that expression. After applying the regex you should read the $1 match (first capture, `(S(4txk2anwasxh3u0slptzi20qyj))`), not $0 (whole match, `/(S(4txk2anwasxh3u0slptzi20qyj))/`). Without those bounding characters, if you pass a URL like `http://farmer.gov.in/asdada(S(foo))asdasd/(S(key))/asdasdasd`, you might catch the 'foo' instead of the 'key'. Of course, that case is so improbable that you can probably safely remove the extra bounding '/'s. – quetzalcoatl Jul 10 '13 at 08:15

Even though the answer is already accepted, there's still a mistake, @quetzalcoatl: your capturing group does not include the forward slash at the end of the string, which the OP described as the desired result. Also you won't catch the foo if you restrict your regex to match for a forward slash after the bracket. – Simon Jul 10 '13 at 08:33
`Also you won't catch the foo if you restrict your regex to match for a forward slash ` - this is exactly why I included a '/' on both sides. Compare that to your regex, which is capable of capturing **many more** false positives. As for the tail, I intentionally left the trailing '/' out of the capturing group, because I take it as a typo on the OP's side: he clearly wanted to catch the 'magic string' from the URL. He didn't complain, mind you. – quetzalcoatl Jul 10 '13 at 09:06

Trogvar · Answer 2 · 2013-07-10T07:49:53.437

Here's your regex. The part in parentheses captures the needed fragment:

/^.+\/([^\/]+)\/.+$/

The logic is simple:

^ - anchors the match at the beginning of the string

.+\/ - matches everything up to and including a slash. Quantifiers are greedy by default, so this part takes as much as it can while still letting the rest of the pattern match; in the example below it matches http://farmer.gov.in/

([^\/]+) - captures all characters between the last two slashes

\/.+$ - matches the final slash and everything after it, up to the end of the string

Example with PHP language:

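<?php
// URL whose second-to-last path segment is the fragment we want
$string = "http://farmer.gov.in/(S(4txk2wasxh3u0slptzi20qyj))/CWC_Link.aspx";
// Single quotes keep the backslashes literal; the outer slashes are PCRE delimiters
$regex = '/^.+\/([^\/]+)\/.+$/';
// preg_match fills $matches: [0] is the whole match, [1] is the first group
preg_match($regex, $string, $matches);
var_dump($matches);
?>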

In the output, $matches[1] holds the needed value: (S(4txk2wasxh3u0slptzi20qyj))
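
To see what each part of the pattern consumes, the breakdown above can be checked by giving the prefix and suffix their own groups. This is only a sketch in Python (used for illustration, not the answer's PHP), and the second URL is a hypothetical one added to show that the pattern always captures the second-to-last path segment, whatever it contains.

```python
import re

# Same structure as the answer's regex, with extra groups around the
# prefix and suffix so each part can be inspected separately.
parts = re.compile(r"^(.+/)([^/]+)(/.+)$")

url = "http://farmer.gov.in/(S(4txk2wasxh3u0slptzi20qyj))/CWC_Link.aspx"
prefix, fragment, suffix = parts.match(url).groups()

print(prefix)    # http://farmer.gov.in/
print(fragment)  # (S(4txk2wasxh3u0slptzi20qyj))
print(suffix)    # /CWC_Link.aspx

# Hypothetical URL with one more directory after the token: the greedy
# prefix now swallows the token and the capture becomes "dir".
deeper = "http://farmer.gov.in/(S(key))/dir/page.aspx"
prefix2, fragment2, suffix2 = parts.match(deeper).groups()

print(fragment2)  # dir
```

So this pattern suits URLs where the token is known to sit directly before the file name; it does not check that the segment looks like `(S(...))`.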

Simon · Answer 3 · score 0 · 2013-07-10T08:40:00.477

This regex does the job:

\(.*\)\/

Match an opening parenthesis, then anything up to a closing parenthesis that is followed by a forward slash.
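
A small sketch of how this pattern behaves, in Python for illustration only. On the URL from the question it returns the token with its trailing slash; the second URL is a made-up example showing that the greedy `.*` runs from the first opening parenthesis to the last `)/` in the string.

```python
import re

# Pattern from this answer ("/" needs no escaping in Python).
pattern = re.compile(r"\(.*\)/")

url = "http://example.com/(S(4txk2wasxh3u0slptzi20qyj))/CWC_Link.aspx"
result = pattern.search(url).group(0)
print(result)  # (S(4txk2wasxh3u0slptzi20qyj))/

# Hypothetical URL with a second parenthesised part earlier in the path.
tricky = "http://example.com/asdada(S(foo))asdasd/(S(key))/page.aspx"
greedy_result = pattern.search(tricky).group(0)
print(greedy_result)  # (S(foo))asdasd/(S(key))/
```

For URLs shaped exactly like the one in the question this is enough; the looser the input, the more the lack of bounds matters.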

edited Jul 10 '13 at 08:40

answered Jul 10 '13 at 07:58

