3

I have the following regex \/\/.*\/.*? and I am applying it to strings in this format: mongodb://localhost:27017/admin?replicaSet=rs

Based on the above, the returned match is //localhost:27017/. However, I do not want the leading // or the trailing / in the result; I only want localhost:27017

What needs to be modified to achieve this? I am fairly new to building regexes.

Edit: I am using Java 1.7 to execute this regex statement.

Wiktor Stribiżew
Chris Edwards
  • Can you use lookbehinds? `(?<=\/\/)[^\/]*`. – Wiktor Stribiżew Jul 13 '15 at 16:24
  • What language are you using? You might be better off using existing code that extracts the hostname and port number from URLs rather than messing with a regex to do it. – Andy Lester Jul 13 '15 at 16:25
  • I am using Java 1.7, any recommendations? – Chris Edwards Jul 13 '15 at 16:28
  • I don't know about Java, but there has to be a URL parsing function somewhere, I would think. I know that in PHP, use the [`parse_url`](http://php.net/manual/en/function.parse-url.php) function. Perl: [`URI` module](http://search.cpan.org/dist/URI/). Ruby: [`URI` module](http://www.ruby-doc.org/stdlib-1.9.3/libdoc/uri/rdoc/URI.html). .NET: ['Uri' class](http://msdn.microsoft.com/en-us/library/txt7706a.aspx) – Andy Lester Jul 13 '15 at 17:08
  • Here you go: https://stackoverflow.com/questions/13408498/parsing-a-url-in-java – Andy Lester Jul 13 '15 at 17:10
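
As a rough illustration of the URL-parsing suggestion in the comments above, here is a minimal sketch using `java.net.URI`, which is available in Java 1.7. The class name `MongoUriHostPort` and the helper `hostAndPort` are hypothetical names chosen for this example. It assumes a single-host connection string without credentials; if the string contained `user:password@`, `getAuthority()` would include that part as well.

```java
import java.net.URI;

public class MongoUriHostPort {

    // Hypothetical helper: returns the authority component,
    // i.e. the text between "//" and the next "/", "?" or "#".
    static String hostAndPort(String connectionString) {
        // URI.create throws IllegalArgumentException on a malformed string.
        URI uri = URI.create(connectionString);
        return uri.getAuthority();
    }

    public static void main(String[] args) {
        String s = "mongodb://localhost:27017/admin?replicaSet=rs";
        URI uri = URI.create(s);

        System.out.println(hostAndPort(s)); // localhost:27017
        System.out.println(uri.getHost());  // localhost
        System.out.println(uri.getPort());  // 27017
    }
}
```

`java.net.URI` is used here rather than `java.net.URL` because `URL` rejects schemes it has no protocol handler for, and `mongodb` is not one of the built-in ones.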

2 Answers

1

You can use this replaceAll approach in Java if you do not want to use Matcher:

System.out.println("mongodb://localhost:27017/admin?replicaSet=rs".replaceAll("mongodb://([^/]*).*", "$1")); 

Here, I assume you have 1 occurrence of a mongodb URL. mongodb:// matches the sequence of characters literally, the ([^/]*) matches a sequence of 0 or more characters other than / and stores them in a capturing group 1 (we'll use the backreference $1 to this group to retrieve the text in the replacement pattern). .* matches all symbols up to the end of a one-line string.

See IDEONE demo

Or, with Matcher,

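import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "mongodb://localhost:27017/admin?replicaSet=rs";
// The lookbehind (?<=//) requires "//" right before the match without including it
Pattern ptrn = Pattern.compile("(?<=//)[^/]*");
Matcher matcher = ptrn.matcher(str);
while (matcher.find()) {
   System.out.println(matcher.group()); // localhost:27017
}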

The regex here - (?<=//)[^/]* - matches again a sequence of 0 or more characters other than / (with [^/]*), but makes sure there is // right before this sequence. (?<=//) is a positive lookbehind that does not consume characters, and thus does not return them in the match.
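
If a lookbehind feels awkward, the same result can be had with `Matcher` and a capturing group, combining the two ideas above. This is only a sketch: the class name `HostPortGroup` and the helper `extract` are hypothetical, and it assumes the part you want never contains a `/`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HostPortGroup {

    // "//" literally, then group 1 = one or more characters other than "/".
    private static final Pattern HOST_PORT = Pattern.compile("//([^/]+)");

    // Hypothetical helper: returns the text captured after "//",
    // or null when the input contains no such sequence.
    static String extract(String input) {
        Matcher m = HOST_PORT.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("mongodb://localhost:27017/admin?replicaSet=rs")); // localhost:27017
        System.out.println(extract("no slashes here")); // null
    }
}
```

Compiling the pattern once into a constant avoids recompiling it on every call, which matters if many connection strings are processed.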

Wiktor Stribiżew
  • Thanks, could you explain what it actually does? – Chris Edwards Jul 13 '15 at 16:35
  • I added explanations and some links. Please let me know if there is anything unclear about this. Lots of people don't like `Matcher` since it requires adding another `import`, and more code, but it helps you retrieve matches or capturing groups individually. – Wiktor Stribiżew Jul 13 '15 at 16:41
0

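You can use a combination of a lookbehind and a lookahead to keep the "/" characters out of the match: (?<=\/\/).+.*(?=\/).*?

This matches only localhost:27017 in your example string.

Here (?<=\/\/) requires the match to start right after "//", and (?=\/) requires it to be followed by "/". Neither lookaround consumes the slashes, so they are not part of the result. Note that .+.* is greedy, so if the path contains more than one "/", the match runs up to the last one.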
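
To see how the greedy quantifiers in this pattern behave, the sketch below runs it next to a variant that uses the negated class `[^/]+` between the same lookarounds. The class name `LookaroundComparison`, the helper `firstMatch`, and the second input string (a path with two slashes) are assumptions made for illustration, not part of the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundComparison {

    // Hypothetical helper: returns the first match of regex in input, or null.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The pattern from the answer above, escaped for a Java string literal.
        String greedy = "(?<=\\/\\/).+.*(?=\\/).*?";
        // Variant that cannot cross a "/" inside the match.
        String negated = "(?<=//)[^/]+(?=/)";

        String onePath = "mongodb://localhost:27017/admin?replicaSet=rs";
        String twoPaths = "mongodb://localhost:27017/admin/users"; // assumed input

        System.out.println(firstMatch(greedy, onePath));   // localhost:27017
        System.out.println(firstMatch(negated, onePath));  // localhost:27017
        System.out.println(firstMatch(greedy, twoPaths));  // localhost:27017/admin
        System.out.println(firstMatch(negated, twoPaths)); // localhost:27017
    }
}
```

Both patterns agree on the string from the question; they differ only once a second "/" appears after the host and port.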

William Moore