0

I'm new to Java. I want to extract all of the URLs from the text below:

WEBSITE1 https://localhost:8080/admin/index.php?page=home
WEBSITE2 https://192.168.0.3:8084/index.php
WEBSITE3 https://192.168.0.5:9090/controller/index.php?page=home
WEBSITE4 https://192.168.0.1:8080/home/index.php?page=forum

The result that I want is:

https://localhost:8080
https://192.168.0.3:8084
https://192.168.0.5
https://192.168.0.1:8080

I also want to store the results in a LinkedList or an array. Can somebody show me how? Thank you.

user1973423

5 Answers

1

This is how you can do this. I did one for you and you do the rest :)

try {
    ArrayList<String> urls = new ArrayList<String>();
    URL aURL = new URL("https://localhost:8080/admin/index.php?page=home");
    // The getters return the bare parts, so the "://" and ":" separators must be added back
    String base = aURL.getProtocol() + "://" + aURL.getHost() + ":" + aURL.getPort();
    System.out.println("base URL = " + base);
    urls.add(base);
} catch (MalformedURLException e) {
    // The string could not be parsed as a URL
    e.printStackTrace();
}
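
To cover "the rest", here is a minimal sketch that applies the same idea to every line of the text and stores the results in a LinkedList, then copies them into an array. The class name UrlOrigins and the method extractOrigins are made up for illustration, and it assumes each line has the form "NAME URL". Note that recent JDKs mark the URL constructors as deprecated; the code still compiles and runs.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

class UrlOrigins {

    // Hypothetical helper: returns "protocol://host[:port]" for each "NAME URL" line.
    static List<String> extractOrigins(String text) {
        List<String> origins = new LinkedList<>();
        for (String line : text.split("\\R")) {          // \R matches any line break
            String[] parts = line.trim().split("\\s+");
            if (parts.length < 2) {
                continue;                                 // blank line or no URL column
            }
            try {
                URL url = new URL(parts[1]);
                String origin = url.getProtocol() + "://" + url.getHost();
                if (url.getPort() != -1) {                // getPort() is -1 when no port is given
                    origin += ":" + url.getPort();
                }
                origins.add(origin);
            } catch (MalformedURLException e) {
                // Second column is not a URL: skip this line
            }
        }
        return origins;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "WEBSITE1 https://localhost:8080/admin/index.php?page=home",
                "WEBSITE2 https://192.168.0.3:8084/index.php",
                "WEBSITE3 https://192.168.0.5:9090/controller/index.php?page=home",
                "WEBSITE4 https://192.168.0.1:8080/home/index.php?page=forum");

        List<String> origins = extractOrigins(text);      // linked list
        String[] asArray = origins.toArray(new String[0]); // same data as an array

        System.out.println(origins);
        System.out.println(Arrays.toString(asArray));
    }
}
```

Both lines printed by main should read [https://localhost:8080, https://192.168.0.3:8084, https://192.168.0.5:9090, https://192.168.0.1:8080].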
grepit
0

Let's say the variable line holds a single line of the text (probably inside a loop):

//get the index of "https" in the string
int indexOfHTTPS= line.indexOf("https://");
//get the index of the first "/" after the "https"
int indexOfFirstSlashAfterHTTPS= line.indexOf("/", indexOfHTTPS + "https://".length());

//take a string between "https" and the first "/"
String url = line.substring(indexOfHTTPS, indexOfFirstSlashAfterHTTPS);

Later on, add this url to an ArrayList<String>:

ArrayList<String> urlList= new ArrayList<String>();
urlList.add(url);
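
Put together as a loop, the idea might look like the sketch below. The class name IndexOfOrigins and the method extractOrigins are hypothetical. It also guards two cases the snippet above does not handle: a line without "https://", and a URL with no "/" after the host (indexOf returns -1 in both cases, which would make substring throw).

```java
import java.util.ArrayList;
import java.util.List;

class IndexOfOrigins {

    // Hypothetical helper: for each line, keep the text from "https://" up to the next "/".
    static List<String> extractOrigins(String[] lines) {
        final String scheme = "https://";
        List<String> urlList = new ArrayList<>();
        for (String line : lines) {
            int start = line.indexOf(scheme);
            if (start < 0) {
                continue;                       // no URL on this line
            }
            int end = line.indexOf('/', start + scheme.length());
            if (end < 0) {
                end = line.length();            // no path: the URL runs to the end of the line
            }
            urlList.add(line.substring(start, end));
        }
        return urlList;
    }

    public static void main(String[] args) {
        String[] lines = {
            "WEBSITE1 https://localhost:8080/admin/index.php?page=home",
            "WEBSITE2 https://192.168.0.3:8084/index.php",
            "WEBSITE3 https://192.168.0.5:9090/controller/index.php?page=home",
            "WEBSITE4 https://192.168.0.1:8080/home/index.php?page=forum"
        };
        System.out.println(extractOrigins(lines));
    }
}
```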
darijan
0

You can do it with the help of the URL class.

 public static void main(String[] args) throws MalformedURLException { 

        String string ="https://192.168.0.5:9090/controller/index.php?page=home";
        URL url= new URL(string);
        String result ="https://"+url.getHost()+":"+url.getPort();
        System.out.println(result);
    }

Output: https://192.168.0.5:9090
Suresh Atta
  • That won't grab the port -- hence the interest of using `URI` instead. What is more, `URI` will never attempt to resolve hostnames – fge Jun 20 '13 at 13:30
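
For reference, a minimal sketch of the URI variant suggested in the comment above. The class name UriOrigin and the method origin are invented for this example. It takes the scheme from the URI instead of hardcoding "https://", and it leaves the port out when the address has none, since getPort() returns -1 in that case.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriOrigin {

    // Hypothetical helper: returns "scheme://host[:port]", or null if the input
    // cannot be parsed or has no scheme/host.
    static String origin(String address) {
        try {
            URI uri = new URI(address);
            if (uri.getScheme() == null || uri.getHost() == null) {
                return null;
            }
            String result = uri.getScheme() + "://" + uri.getHost();
            if (uri.getPort() != -1) {          // -1 means the port is undefined
                result += ":" + uri.getPort();
            }
            return result;
        } catch (URISyntaxException e) {
            return null;                        // e.g. the string contains a space
        }
    }

    public static void main(String[] args) {
        System.out.println(origin("https://192.168.0.5:9090/controller/index.php?page=home"));
        System.out.println(origin("https://192.168.0.5/controller/index.php?page=home"));
    }
}
```

The first call should print https://192.168.0.5:9090 and the second https://192.168.0.5.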
0

You could either try to find the index of the protocol substring ("http[s]") in the Strings, or use a simple Pattern (only for matching the "website[0-9]" head, not to apply to the URLs).

Here's a solution with the Pattern.

String[] lines = {
    "WEBSITE1 https://localhost:8080/admin/index.php?page=home",
    "WEBSITE2 https://192.168.0.3:8084/index.php",
    "WEBSITE3 https://192.168.0.5:9090/controller/index.php?page=home",
    "WEBSITE4 https://192.168.0.1:8080/home/index.php?page=forum"
};
ArrayList<URI> uris = new ArrayList<URI>();
Pattern pattern = Pattern.compile("^website\\d+\\s+?(.+)", Pattern.CASE_INSENSITIVE);
for (String line : lines) {
    Matcher matcher = pattern.matcher(line);
    if (matcher.find()) {
        try {
            // group(1) is everything after the "WEBSITEn" head
            uris.add(new URI(matcher.group(1)));
        }
        catch (URISyntaxException use) {
            use.printStackTrace();
        }
    }
}
System.out.println(uris);

Output:

[https://localhost:8080/admin/index.php?page=home, https://192.168.0.3:8084/index.php, https://192.168.0.5:9090/controller/index.php?page=home, https://192.168.0.1:8080/home/index.php?page=forum]
Mena
0

Use a simple regexp to find each substring that starts with https?:// and capture everything up to the first / that follows.

Matcher m = Pattern.compile("(https?://[^/]+)").matcher(//
        "WEBSITE1 https://localhost:8080/admin/index.php?page=home\r\n" + //
        "WEBSITE2 https://192.168.0.3:8084/index.php\r\n" + //
        "WEBSITE3 https://192.168.0.5:9090/controller/index.php?page=home\r\n" + //
        "WEBSITE4 https://192.168.0.1:8080/home/index.php?page=forum");
List<String> urls = new ArrayList<String>();
while (m.find()) {
    urls.add(m.group(1));
}
System.out.println(urls);

If you want only the WEBSITE names instead, replace the regular expression "(https?://[^/]+)" with "(.*?)\\s+https?". The rest of the code stays the same.
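
As a quick check of that second regular expression, here is a small sketch. The class name SiteNames and the method extractNames are hypothetical; the pattern is the one quoted above. It relies on each entry being on its own line, because "." does not match line breaks by default, so the lazy group cannot run across two lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SiteNames {

    // Lazily captures whatever precedes the whitespace in front of "http" or "https".
    private static final Pattern NAME = Pattern.compile("(.*?)\\s+https?");

    // Hypothetical helper: collects group 1 of every match.
    static List<String> extractNames(String text) {
        List<String> names = new ArrayList<>();
        Matcher m = NAME.matcher(text);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        String text = "WEBSITE1 https://localhost:8080/admin/index.php?page=home\r\n"
                + "WEBSITE2 https://192.168.0.3:8084/index.php\r\n"
                + "WEBSITE3 https://192.168.0.5:9090/controller/index.php?page=home\r\n"
                + "WEBSITE4 https://192.168.0.1:8080/home/index.php?page=forum";
        System.out.println(extractNames(text));
    }
}
```

This should print [WEBSITE1, WEBSITE2, WEBSITE3, WEBSITE4].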

Alex
  • Thank you. What if I want to get the list of website names? For example, the result that I want: website1 website2 website3 website4. Thank you again :-) – user1973423 Jun 21 '13 at 23:51