
I have a full link like this:

http://localhost:8080/suffix/rest/of/link

How can I write a regex in Java that returns only the main part of the URL with the suffix (http://localhost:8080/suffix), without /rest/of/link?

  • possible protocols: http, https
  • possible ports: many different values

I've assumed that I need to remove all the text starting from the fourth occurrence of '/' (inclusive). I would like to do it as below, but I don't know regex well. Can you help me write the regex correctly?

String appUrl = fullRequestUrl.replaceAll("(.*\\/{2})", ""); // this removes 'http://', which is not what I need
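For reference, the approach described above (keep everything up to the suffix, drop the `/` after it and the rest) fits in a single `replaceFirst` call. This is a minimal sketch, assuming a lowercase `http` or `https` scheme and that the suffix is always the first path segment; `BaseUrlRegex` and `baseUrl` are hypothetical names.

```java
public class BaseUrlRegex {
    // Hypothetical helper: keeps scheme://host[:port]/firstSegment and drops
    // everything from the next '/', '?' or '#' onward.
    // If the URL does not match the pattern, it is returned unchanged.
    static String baseUrl(String fullUrl) {
        return fullUrl.replaceFirst("^(https?://[^/]+/[^/?#]*).*$", "$1");
    }

    public static void main(String[] args) {
        System.out.println(baseUrl("http://localhost:8080/suffix/rest/of/link"));
        System.out.println(baseUrl("https://example.com/app/servlet?x=1"));
    }
}
```

`[^/]+` swallows the host and any port in one go, so no separate port handling is needed.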
Roman
  • What's the point of the regex? Just find the index of the fourth `/`. – Dave Newton Oct 08 '13 at 17:50
  • The point is to retrieve the base application URL (protocol + serverName + serverPort + contextPath) from a full URL, which may also contain a servlet path and parameters that I'm not interested in. – Roman Oct 08 '13 at 17:54
  • `URL` will not recognize the contextPath from a plain String. I've already tried it. – Roman Oct 08 '13 at 17:55
  • See if this helps: http://stackoverflow.com/q/27745/2666913 – Venkateshwaran Selvaraj Oct 08 '13 at 17:56
  • What do you mean `URL` won't recognize the context path? Certainly it will. – jtahlborn Oct 08 '13 at 18:08
  • After constructing it with `URL(String url)`, we don't have any information about the contextPath. `URL` gives the protocol, host, port and path, and the path contains the contextPath plus everything else, so I still need to parse it somehow. – Roman Oct 08 '13 at 18:19
  • Yes, obviously. You would need to separate out the part of the path you care about, but `URL` will handle the larger parsing issues and leave you with a simple problem (pulling "suffix" off the path). – jtahlborn Oct 08 '13 at 18:22
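Following jtahlborn's suggestion, the standard library can do the heavy parsing, leaving only the first path segment to pick out by hand. This sketch uses `java.net.URI` instead of `URL` (it exposes the same scheme, authority and path components); it assumes the context path is always the first segment of the path, and `BaseUrlFromUri` and `baseUrl` are hypothetical names.

```java
import java.net.URI;

public class BaseUrlFromUri {
    // Hypothetical helper: rebuilds scheme://authority/firstSegment.
    // URI.create throws IllegalArgumentException for malformed input.
    static String baseUrl(String fullUrl) {
        URI uri = URI.create(fullUrl);
        String path = uri.getRawPath();            // e.g. "/suffix/rest/of/link"
        String firstSegment = "";
        if (path != null && path.length() > 1) {
            int next = path.indexOf('/', 1);       // the '/' that ends the first segment
            firstSegment = (next == -1) ? path : path.substring(0, next);
        }
        // getRawAuthority() keeps host and port together, e.g. "localhost:8080"
        return uri.getScheme() + "://" + uri.getRawAuthority() + firstSegment;
    }

    public static void main(String[] args) {
        System.out.println(baseUrl("http://localhost:8080/suffix/rest/of/link"));
    }
}
```

Query strings and fragments never reach `getRawPath()`, so they are dropped without any extra handling.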

2 Answers


I'm not sure why you want to use a regex for this. Java provides methods for querying `URL` objects that do the parsing for you.

Here is an example, taken from the same site, that shows how it works:

import java.net.*;
import java.io.*;

public class ParseURL {
    public static void main(String[] args) throws Exception {

        URL aURL = new URL("http://example.com:80/docs/books/tutorial"
                           + "/index.html?name=networking#DOWNLOADING");

        System.out.println("protocol = " + aURL.getProtocol());
        System.out.println("authority = " + aURL.getAuthority());
        System.out.println("host = " + aURL.getHost());
        System.out.println("port = " + aURL.getPort());
        System.out.println("path = " + aURL.getPath());
        System.out.println("query = " + aURL.getQuery());
        System.out.println("filename = " + aURL.getFile());
        System.out.println("ref = " + aURL.getRef());
    }
}

Here is the output displayed by the program:

protocol = http
authority = example.com:80
host = example.com
port = 80
path = /docs/books/tutorial/index.html
query = name=networking
filename = /docs/books/tutorial/index.html?name=networking
ref = DOWNLOADING
Rahul Tripathi
  • @Boris the Spider: Were you referring to this? – Rahul Tripathi Oct 08 '13 at 18:47
  • Nice, but the point is that I need to separate out the contextPath, which is the first part after the port in the URL, and the `URL` class is not able to recognize it (though that would be nice). In your example the contextPath is `docs`, so with `URL` I still need to parse `path` and exclude the rest of the text from it. – Roman Oct 08 '13 at 18:58
  • How about finding the index of the `/` and then doing what you want? – Rahul Tripathi Oct 08 '13 at 19:00
  • @Roman: Yes, that may help. Give it a try! :) P.S. If this helped you, please upvote or accept it as the answer! :) – Rahul Tripathi Oct 08 '13 at 19:07
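Finding the index of the `/`, as suggested in the comments (the fourth `/` is the one that ends the suffix), needs no regex at all. This is a minimal sketch, assuming an absolute URL of the form `scheme://host[:port]/suffix/...`; `BaseUrlIndexOf` and `baseUrl` are hypothetical names.

```java
public class BaseUrlIndexOf {
    // Hypothetical helper: returns everything before the fourth '/'.
    // If the URL has fewer than four slashes, it is returned unchanged.
    static String baseUrl(String fullUrl) {
        int index = -1;
        for (int i = 0; i < 4; i++) {
            index = fullUrl.indexOf('/', index + 1);
            if (index == -1) {
                return fullUrl;
            }
        }
        return fullUrl.substring(0, index);
    }

    public static void main(String[] args) {
        System.out.println(baseUrl("http://localhost:8080/suffix/rest/of/link"));
    }
}
```

Note that this version does not strip a query string that directly follows the suffix (e.g. `/suffix?x=1`), since it only looks for slashes.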

This code extracts the main part of the URL:

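import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexpExample {
    public static void main(String[] args) {
        String urlStr = "http://localhost:8080/suffix/rest/of/link";
        // Group 1: scheme://host[:port]/suffix (the part we want).
        // Groups 2-5: scheme with ':', host, optional ":port", suffix; group 6: the rest of the link.
        // Note: the suffix must consist of lowercase letters and must be followed by '/'.
        Pattern pattern = Pattern.compile("^((.*:)//([a-z0-9\\-.]+)(:[0-9]+)?/([a-z]+))/(.*)$");

        Matcher matcher = pattern.matcher(urlStr);
        if (matcher.find()) {
            // the main part of the URL, including the suffix
            String mainPartOfUrlWithSuffix = matcher.group(1);
            System.out.println(mainPartOfUrlWithSuffix); // http://localhost:8080/suffix
        }
    }
}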
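The pattern above only accepts a lowercase suffix and requires a `/` after it, so `http://localhost:8080/suffix` on its own does not match. A sketch of a more permissive variant, restricted to `http`/`https` as the question specifies, with an optional port, an optional remainder and named groups, might look like this. It assumes the context path is always the first path segment; `NamedGroupBaseUrl`, `baseUrl` and the group names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupBaseUrl {
    // http or https, a host, an optional ":port", the first path segment
    // (the "suffix"), and an optional remainder starting with '/', '?' or '#'.
    private static final Pattern BASE_URL = Pattern.compile(
            "^(?<base>https?://[A-Za-z0-9.\\-]+(?::[0-9]+)?/(?<context>[^/?#]+))(?<rest>[/?#].*)?$");

    // Returns scheme://host[:port]/suffix, or null if the URL does not match.
    static String baseUrl(String url) {
        Matcher m = BASE_URL.matcher(url);
        return m.matches() ? m.group("base") : null;
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://localhost:8080/suffix/rest/of/link",
            "https://example.com/suffix"
        };
        for (String url : urls) {
            Matcher m = BASE_URL.matcher(url);
            if (m.matches()) {
                System.out.println(m.group("base") + " (context path: " + m.group("context") + ")");
            }
        }
    }
}
```

Using `matches()` instead of `find()` makes the whole string take part in the match, so the `^` and `$` anchors are redundant but harmless.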
Vitalii Pro