How to remove some part of URL by regex?

Question

I have a full link like this:

http://localhost:8080/suffix/rest/of/link

How do I write a regex in Java that returns only the main part of the URL with the suffix, http://localhost:8080/suffix, and drops /rest/of/link?

Possible protocols: http, https
Possible ports: any

I assume I need to remove everything from the 4th occurrence of '/' onward (including that slash). I would like to do it as shown below, but I do not know regex well. Could you help me write the regex correctly?

String appUrl = fullRequestUrl.replaceAll("(.*\\/{2})", ""); //this removes 'http://' but this is not my case

What's the point of the regex? Just find the index of the fourth `/`. — Dave Newton, Oct 08 '13 at 17:50
The point is to retrieve the base application URL (protocol + serverName + serverPort + contextPath) from a URL that may be a full one, meaning it may also contain the servlet path and params, which I am not interested in. — Roman, Oct 08 '13 at 17:54
`URL` will not recognize contextPath from simple String. I've already tried it. — Roman, Oct 08 '13 at 17:55
See if this helps : http://stackoverflow.com/q/27745/2666913 — Venkateshwaran Selvaraj, Oct 08 '13 at 17:56
what do you mean `URL` won't recognize the context path? certainly it will. — jtahlborn, Oct 08 '13 at 18:08
After initializing it with the constructor URL(String url), we don't have info about contextPath. URL gives protocol, host, port, path. The path property contains contextPath and other stuff, so I still need to parse it somehow. — Roman, Oct 08 '13 at 18:19
yes, obviously. you would need to separate out the part of the path you care about. but URL will handle the larger parsing issues and leave you with a simple problem (pulling "suffix" off of the path). — jtahlborn, Oct 08 '13 at 18:22
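
The index-based idea raised in the comments (cut the string at the fourth `/`) could look roughly like the sketch below. The class and method names (`BaseUrlByIndex`, `cutAtNthSlash`) are made up for illustration, and the code assumes the input always has the shape `protocol://host[:port]/contextPath/...`; it does no URL validation.

```java
final class BaseUrlByIndex {

    // Returns the text before the n-th '/' in s.
    // If s contains fewer than n slashes, s is returned unchanged.
    static String cutAtNthSlash(String s, int n) {
        int idx = -1;
        for (int i = 0; i < n; i++) {
            idx = s.indexOf('/', idx + 1);
            if (idx < 0) {
                return s; // fewer than n slashes: nothing to cut
            }
        }
        return s.substring(0, idx);
    }

    public static void main(String[] args) {
        // Slashes 1 and 2 belong to "//", slash 3 starts the context path,
        // slash 4 starts the part that should be dropped.
        System.out.println(cutAtNthSlash("http://localhost:8080/suffix/rest/of/link", 4));
    }
}
```

A URL that consists only of the base part (for example `http://localhost:8080/suffix`) has just three slashes and is returned as is.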

score 5 · Answer 1 · answered Oct 08 '13 at 18:42

I am not sure why you want to use a regex for this. Java provides the `java.net.URL` class, whose accessor methods let you query the individual parts of a URL.

Here is an example taken from the Java Tutorials to show how it works:

import java.net.*;
import java.io.*;

public class ParseURL {
    public static void main(String[] args) throws Exception {

        URL aURL = new URL("http://example.com:80/docs/books/tutorial"
                           + "/index.html?name=networking#DOWNLOADING");

        System.out.println("protocol = " + aURL.getProtocol());
        System.out.println("authority = " + aURL.getAuthority());
        System.out.println("host = " + aURL.getHost());
        System.out.println("port = " + aURL.getPort());
        System.out.println("path = " + aURL.getPath());
        System.out.println("query = " + aURL.getQuery());
        System.out.println("filename = " + aURL.getFile());
        System.out.println("ref = " + aURL.getRef());
    }
}

Here is the output displayed by the program:

protocol = http
authority = example.com:80
host = example.com
port = 80
path = /docs/books/tutorial/index.html
query = name=networking
filename = /docs/books/tutorial/index.html?name=networking
ref = DOWNLOADING

Nice, but the point is that I need to separate the contextPath, which is the first part after the port in the URL. The `URL` class is not able to recognize it (though it would be nice). In your example the contextPath is `docs`, and with `URL` I still need to parse `path` and exclude the rest of the text from it. — Roman, Oct 08 '13 at 18:58
How about finding the index of the / and then do what you want? — Rahul Tripathi, Oct 08 '13 at 19:00
@Roman:- Yes that may help. Give that a try! :) P.S. And if this helped you then do upvote or accept this as an answer! :) — Rahul Tripathi, Oct 08 '13 at 19:07
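
Combining this answer with the follow-up comments, one possible sketch is to let `URL` handle the protocol, host and port, and then keep only the first segment of `getPath()`. The names `BaseUrlFromUrl` and `baseWithFirstSegment` are hypothetical, and the code assumes the context path is always exactly the first path segment, which `URL` itself cannot know.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

final class BaseUrlFromUrl {

    // Rebuilds protocol://authority plus the first path segment only.
    // Assumption: the context path is the first segment of the path.
    static String baseWithFirstSegment(String fullUrl) {
        URL url;
        try {
            url = URI.create(fullUrl).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid URL: " + fullUrl, e);
        }
        String path = url.getPath();        // e.g. "/suffix/rest/of/link", no query or fragment
        int end = path.indexOf('/', 1);     // slash that ends the first segment, if any
        String firstSegment = (end < 0) ? path : path.substring(0, end);
        // getAuthority() already contains the port when one was given.
        return url.getProtocol() + "://" + url.getAuthority() + firstSegment;
    }

    public static void main(String[] args) {
        System.out.println(baseWithFirstSegment("http://localhost:8080/suffix/rest/of/link"));
    }
}
```

Because `getPath()` excludes the query string and the fragment, those are dropped without extra work.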

score 2 · Accepted Answer · answered Oct 08 '13 at 18:38

This code extracts the main part of the URL:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexpExample {
    public static void main(String[] args) {
        String urlStr  = "http://localhost:8080/suffix/rest/of/link";
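        // Group 1 covers the protocol, host, optional port and the first path segment.
        // The first segment must consist of lowercase letters and be followed by '/'.
        Pattern pattern = Pattern.compile("^((.*:)//([a-z0-9\\-.]+)(|:[0-9]+)/([a-z]+))/(.*)$");

        Matcher matcher = pattern.matcher(urlStr);
        if(matcher.find())
        {
            // group(1) is the main part of the URL, including the suffix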
            String mainPartOfUrlWithSuffix = matcher.group(1);
            System.out.println(mainPartOfUrlWithSuffix);
        }
    }
}
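
For the `replaceAll` style used in the question, a more tolerant variant might look like the sketch below. This is not the pattern from the answer above: it is an alternative that assumes only http/https URLs and treats anything up to the next `/`, `?` or `#` as the suffix. The names `BaseUrlByRegex` and `stripAfterFirstSegment` are made up for illustration.

```java
final class BaseUrlByRegex {

    // Keeps "http(s)://host[:port]/firstSegment" and drops whatever follows.
    // Strings that do not match the pattern are returned unchanged.
    static String stripAfterFirstSegment(String fullRequestUrl) {
        return fullRequestUrl.replaceAll("^(https?://[^/]+/[^/?#]+).*$", "$1");
    }

    public static void main(String[] args) {
        System.out.println(stripAfterFirstSegment("http://localhost:8080/suffix/rest/of/link"));
    }
}
```

Unlike a pattern that requires a trailing `/` after the suffix, this one also accepts a URL that already ends at the suffix.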
