Java: how to get the text between "http://" and the first following "/" occurrence? And after the first "/" occurrence?

Question

I am still a novice with regular expressions ("regex") in Java.

If I have a URL like this: "http://somedomain.someextention/somefolder/.../someotherfolder/somepage"

What is the simplest way to get:

"somedomain.someextention"?
"somefolder/.../someotherfolder/somepage"?
"somepage"?

Thanks!

take a look at this post: http://stackoverflow.com/questions/1667278/parsing-query-strings-in-java — Casimir et Hippolyte, Mar 08 '14 at 10:02

Pshemo · Answer 1 · 2014-03-08T13:38:08.690

You don't have to (and probably shouldn't) use regex here. Instead, use classes designed to handle this kind of data. For example, you can use the URL, URI and File classes like this:

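String address = "http://somedomain.someextention/somefolder/.../someotherfolder/somepage";

// new URL(String) throws the checked MalformedURLException if the address cannot be parsed as a URL
URL url = new URL(address);
// File is used here only to split off the last segment of the path
File file = new File(url.getPath());

System.out.println(url.getHost());   // host name
System.out.println(url.getPath());   // path, including the leading "/"
System.out.println(file.getName());  // last path segment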

Output:

somedomain.someextention
/somefolder/.../someotherfolder/somepage
somepage

Note that the path to your resource starts with /. If you need it without the leading slash, you can use substring(1) when the path starts with /.
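A minimal, self-contained sketch of the steps above, including the removal of the leading slash. The class name UrlParts and the helper parts are made up for this illustration, and the checked MalformedURLException is wrapped in an IllegalArgumentException purely for convenience. As far as I know, recent JDKs (20 and later) mark the URL constructors as deprecated in favor of URI.toURL(), so expect a warning there.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;

public class UrlParts {

    // Hypothetical helper: returns { host, path without the leading "/", last path segment }.
    public static String[] parts(String address) {
        try {
            URL url = new URL(address);
            String path = url.getPath();
            // Strip the leading slash only if it is actually there.
            String relative = path.startsWith("/") ? path.substring(1) : path;
            // File is used only to pick the last segment of the path.
            String name = new File(path).getName();
            return new String[] { url.getHost(), relative, name };
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid URL: " + address, e);
        }
    }

    public static void main(String[] args) {
        String address = "http://somedomain.someextention/somefolder/.../someotherfolder/somepage";
        for (String part : parts(address)) {
            System.out.println(part);
        }
    }
}
```

For the address from the question this should print the host, then somefolder/.../someotherfolder/somepage without the leading slash, then somepage.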

But if you really must use regex, you can try:

^https?://([^/]+)/(.*/([^/]+))$

With this pattern:

group 1 will contain the host name,
group 2 will contain the path to the resource,
group 3 will contain the name of the resource.
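To show how the groups are read in Java, here is a small sketch using Pattern and Matcher from java.util.regex. The class name RegexParts and the helper parts are invented for the example, and returning null on a failed match is just one possible choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexParts {

    // Same pattern as above; compiled once because Pattern objects are reusable.
    private static final Pattern URL_PATTERN =
            Pattern.compile("^https?://([^/]+)/(.*/([^/]+))$");

    // Hypothetical helper: returns { host, path, name }, or null if the pattern does not match.
    public static String[] parts(String address) {
        Matcher m = URL_PATTERN.matcher(address);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String address = "http://somedomain.someextention/somefolder/.../someotherfolder/somepage";
        String[] parts = parts(address);
        System.out.println(parts[0]); // group 1: host name
        System.out.println(parts[1]); // group 2: path to the resource
        System.out.println(parts[2]); // group 3: name of the resource
    }
}
```

Keep in mind that the pattern expects at least one folder before the resource name, so an address such as "http://host/page" does not match and parts returns null for it.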

Stephen C · Answer 2 · 2014-03-09T00:11:27.537

The best way to get those components is to use the URI class; e.g.

    URI uri = new URI(str);
    String domain = uri.getHost();
    String path = uri.getPath();
    int pos = path.lastIndexOf("/");
    ...
    // or use File to parse the path string.

You could do it using regexes on the raw URL string, but there is a risk that you won't correctly cope with all of the variability that is possible in a URL. (Hint: the regex supplied by @Pshemo doesn't :-)) You would also definitely need a decoder to deal with possible percent-encoding.
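As an illustration of that variability, here is a rough sketch that runs the URI approach on an address with a port, a percent-encoded space, a query string and a fragment. The class name UriParts, the helper parts and the example.com address are made up for the example. It uses URI.create instead of new URI(...) only to avoid the checked URISyntaxException.

```java
import java.net.URI;

public class UriParts {

    // Hypothetical helper: returns { host, decoded path without the leading "/", last path segment }.
    public static String[] parts(String str) {
        URI uri = URI.create(str);
        String host = uri.getHost();
        // getPath() returns the decoded path; getRawPath() would keep "%20" as is.
        String path = uri.getPath();
        String relative = path.startsWith("/") ? path.substring(1) : path;
        String page = path.substring(path.lastIndexOf('/') + 1);
        return new String[] { host, relative, page };
    }

    public static void main(String[] args) {
        // Port, percent-encoding, query and fragment are all handled by URI itself.
        for (String part : parts("http://example.com:8080/docs/my%20page?x=1#top")) {
            System.out.println(part);
        }
    }
}
```

Here the host comes back without the port and the page name without the query string or fragment, with "%20" decoded to a space. A regex applied to the raw string would leave all of those in its groups.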

score 0 · Answer 3 · answered Mar 08 '14 at 10:09

This uses neither regex nor URI, just simple substring code, offered as exercise material. It is missing format validation for a few corner cases.

int lastDelim = str.lastIndexOf('/');
if (lastDelim<0) throw new IllegalArgumentException("Invalid url");
int startIdx = str.indexOf("//");
startIdx = startIdx<0 ? 0 : startIdx+2;
int pathDelim = str.indexOf('/', startIdx);
String domain = str.substring(startIdx, pathDelim);
String path = str.substring(pathDelim+1, lastDelim);
String page = str.substring(lastDelim+1);
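Two of the missing corner cases are an address with no path at all ("http://host") and one with a single path segment ("http://host/page"). A possible way to cover them, as a sketch only: the class name SubstringParts and the helper parts are invented here, and returning an empty folder part for the single-segment case is an assumption about what you would want.

```java
public class SubstringParts {

    // Hypothetical helper: returns { domain, folders, page } using only indexOf/substring.
    public static String[] parts(String str) {
        int schemeEnd = str.indexOf("//");
        int startIdx = schemeEnd < 0 ? 0 : schemeEnd + 2;

        // First "/" after the host; without it there is no path to split.
        int pathDelim = str.indexOf('/', startIdx);
        if (pathDelim < 0) {
            throw new IllegalArgumentException("Invalid url, no path: " + str);
        }

        // Cannot be smaller than pathDelim, because pathDelim itself is a "/".
        int lastDelim = str.lastIndexOf('/');

        String domain = str.substring(startIdx, pathDelim);
        // With a single path segment both delimiters coincide, so the folder part is empty.
        String folders = lastDelim > pathDelim ? str.substring(pathDelim + 1, lastDelim) : "";
        String page = str.substring(lastDelim + 1);
        return new String[] { domain, folders, page };
    }

    public static void main(String[] args) {
        String url = "http://somedomain.someextention/somefolder/.../someotherfolder/somepage";
        for (String part : parts(url)) {
            System.out.println(part);
        }
    }
}
```

As in the snippet above, the second element is the folder part only, without the page name.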

score 0 · Answer 4 · answered Mar 08 '14 at 10:09

If you would like to use regex to decode the URL instead of using the URI class, as described in the previous answers, the link below gives a nice tutorial on regex and also explains how to decode a sample URL. You could learn it there and try it out.

http://www.beedub.com/book/2nd/regexp.doc.html

score 0 · Answer 5 · answered Mar 08 '14 at 15:02

It's not regex, and it's not scalable either, but it works:

public class SomeClass
{
    public static void main(String[] args)
    {

        SomeClass sclass = new SomeClass();
        String[] string = 
            sclass.parseURL("http://somedomain.someextention/somefolder/.../someotherfolder/somepage");

        System.out.println(string[0]);
        System.out.println(string[1]);
        System.out.println(string[2]);
    }

    private String[] parseURL(String url)
    {
        String part1 = url.substring("http://".length(), url.indexOf("/", "http://".length()));

        String part2 = url.substring("http://".length() + part1.length() + 1, url.lastIndexOf("/"));

        String part3 = url.substring(url.lastIndexOf("/") + 1);

        return new String[] { part1, part2, part3 };
    }
}

Output:

somedomain.someextention
somefolder/.../someotherfolder
somepage
