2
String str = "internet address : http://test.com Click this!";

I want to extract "http://test.com", so I wrote this:

String[] split = str.split(" ");
for ( int i = 0 ; i < split.length ; i++ ) {
    if ( split[i].contains("http://") ) {
        return split[i];
    }
}

But I think this is inefficient. Is there an easier way to get it?

Warak
  • [regex](https://stackoverflow.com/questions/tagged/regex) seems to suit you well in this case, but must clarify the pattern carefully. – Dang Nguyen Jan 17 '19 at 09:04
  • See [this question](https://stackoverflow.com/questions/163360/regular-expression-to-match-urls-in-java) – Sami Hult Jan 17 '19 at 09:08
  • Which part of the example string is constant and which part is variable? – Henry Jan 17 '19 at 09:10
  • 1
    Why do you believe your code is "ineffective"? Does your code not work in certain cases? A solution that uses eg. regex can also get complex quite easily. – TiiJ7 Jan 17 '19 at 09:16

7 Answers

1

Assuming you always have the same format (some text : URL more text) this can work:

public static void main(String[] args) {
    String str = "internet address : http://test.com Click this!";
    // Everything from the start of the URL to the end of the string.
    String first = str.substring(str.indexOf("http://"));
    // Cut at the first space after the URL (assumes one is present).
    String second = first.substring(0, first.indexOf(" "));
    System.out.println(second);
}

But a regex, as suggested in another answer, is the better approach.

Akceptor
  • There's no need to use `substring` twice: use the `indexOf(String, int)` overload to get the end. But remember that you (may) need to handle the case of there being no following space. – Andy Turner Jan 17 '19 at 09:21
  • Wow, this solution is great! I feel stupid; why couldn't I think of it this way? Andy Turner is great too! – Warak Jan 17 '19 at 09:28
  • @Warak if you are unsure about the input string format, regex is the best way IMO – Akceptor Jan 17 '19 at 09:34
  • I think this solution is not perfect when the input string does not end with " ", so I need to check for -1 when computing 'second'. Maybe regex is better. – Warak Jan 17 '19 at 09:52
  • @Warak actually it looks for a space after the URL, regardless of how the input string ends. But sure, regex is the best way; just be careful to use a proper pattern – Akceptor Jan 17 '19 at 10:14
  • I tried comparing regex and substring with indexOf. Measured with nanoTime, regex took 119162 ns and substring took 23238 ns. substring is the best way for performance. – Warak Jan 17 '19 at 10:23
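
The suggestion from the comments (a single `substring` call, with the end found via the `indexOf(String, int)` overload) could look roughly like the sketch below. The class and method names are hypothetical, and returning `null` when no URL is found is an assumption made for illustration, not something the thread specifies.

```java
final class UrlIndexOfSketch {

    // Hypothetical helper: returns the text from "http://" up to the next space,
    // or null when the input contains no "http://".
    static String extractUrl(String str) {
        int start = str.indexOf("http://");
        if (start < 0) {
            return null; // no URL present
        }
        // Search for the space from 'start' onward, so one substring call is enough.
        int end = str.indexOf(" ", start);
        if (end < 0) {
            end = str.length(); // no following space: the URL runs to the end of the string
        }
        return str.substring(start, end);
    }

    public static void main(String[] args) {
        // Both calls print http://test.com
        System.out.println(extractUrl("internet address : http://test.com Click this!"));
        System.out.println(extractUrl("see http://test.com"));
    }
}
```

This keeps the speed of the index-based approach while covering the two cases raised in the comments: no space after the URL, and no URL at all.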
1

Usually, this is either done with a regular expression or with indexOf and substring.

With a regular expression, this can be done as follows:

    // This is using a VERY simplified regular expression
    String str = "internet address : http://test.com Click this!";
    // "https?" matches http or https; square brackets would define a character set instead
    Pattern pattern = Pattern.compile("https?://[\\w.]*");
    Matcher matcher = pattern.matcher(str);
    if (matcher.find()) {
        System.out.println(matcher.group(0));
    }

You can read here why it's simplified: https://mathiasbynens.be/demo/url-regex - tl;dr: URLs can take so many different valid forms that no short pattern covers them all.

With split, there is also a way that uses Java's URL class:

    String[] split = str.split(" ");

    for (String value : split) {
        try {
            new URL(value); // throws MalformedURLException if value is not a URL
            System.out.println(value);
        } catch (MalformedURLException e) {
            // not a valid URL
        }
    }

You can check their validation in the OpenJDK source here.
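
As a self-contained variant of the split-based idea, the sketch below collects every token that `java.net.URL` accepts into a list. The class and method names are hypothetical. Note that `URL` only checks that the token has a protocol it knows how to handle, so this is a loose filter rather than full validation, and the `URL(String)` constructor is flagged as deprecated on recent JDKs.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

final class UrlTokenSketch {

    // Hypothetical helper: keeps every space-separated token that java.net.URL accepts.
    @SuppressWarnings("deprecation") // URL(String) is deprecated on recent JDKs
    static List<String> findUrls(String text) {
        List<String> urls = new ArrayList<>();
        for (String token : text.split(" ")) {
            try {
                new URL(token); // throws if the token has no usable protocol
                urls.add(token);
            } catch (MalformedURLException e) {
                // not a URL; skip this token
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        // prints [http://test.com]
        System.out.println(findUrls("internet address : http://test.com Click this!"));
    }
}
```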

maio290
0

My attempt with a regex:

String regex = "https?:\\/\\/(www\\.)?[-a-zA-Z0-9@:%._\\+~#=]{2,256}\\.[a-z]{2,6}\\b([-a-zA-Z0-9@:%_\\+.~#?&//=]*)";
String str = "internet address : http://test.com Click this!";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(str);
if (matcher.find()) {
    System.out.println(matcher.group(0));
}

result:

http://test.com

source: here

Dang Nguyen
  • 1
    regex. Thank you very much! – Warak Jan 17 '19 at 09:36
  • @Warak I know you have accepted another answer, and I have no complaint about that. But a small note: keep in mind my earlier comment, `but must clarify the pattern carefully`. That's why I put the link in my answer, so you can investigate further and understand the risks before choosing the pattern that suits you best. – Dang Nguyen Jan 17 '19 at 10:21
0

Find the http:// in the string, then look forwards and backwards for the space:

int pos = str.indexOf("http://");
if (pos >= 0) {
  // Look backwards for space; +1 skips the space itself and maps "not found" (-1) to 0.
  int start = Math.max(0, str.lastIndexOf(' ', pos) + 1);

  // Look forwards for space.
  int end = str.indexOf(' ', pos + "http://".length());
  if (end < 0) end = str.length();

  return str.substring(start, end);
}
Andy Turner
  • Isn't the Math class heavy? Does this perform better than split or regex? – Warak Jan 17 '19 at 09:32
  • No, of course it's not. I mean, use `if (start < 0) start = 0;` if you want (and that would be consistent with the looking forwards); but really, that is nano-optimization. – Andy Turner Jan 17 '19 at 09:42
  • Why do you look for the backwards space? Surely, you can just start at `pos`? – TiiJ7 Jan 17 '19 at 09:49
  • @TiiJ7 because OP was searching for strings *containing* `http://`. Had the original code been `split[i].startsWith("http://")`, you wouldn't need to search backwards. – Andy Turner Jan 17 '19 at 09:51
0

It is not clear whether the structure of the input string is constant; however, I would do something like this:

    String str = "internet address : http://test.com Click this!";
    // get the index of the first character of the URL
    int urlStart = str.indexOf("http://");
    System.out.println(urlStart);
    // get the offset of the first space after the URL, relative to urlStart
    // (assumes a space follows the URL; indexOf returns -1 otherwise)
    int urlEnd = str.substring(urlStart).indexOf(" ");
    System.out.println(urlEnd);
    // get the substring containing the URL
    String urlString = str.substring(urlStart, urlStart + urlEnd);
    System.out.println(urlString);
kralizec
0

I just put together a quick solution for this. It should work for your case.

package Main.Kunal;

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class URLOutOfString {

    public static void main(String[] args) {
        String str = "internet address : http://test.com Click this!, internet address : http://tes1t.com Click this!";
        List<String> result= new ArrayList<>();
        int counter = 0;
        final Pattern urlPattern = Pattern.compile(
                "(?:^|[\\W])((ht|f)tp(s?):\\/\\/|www\\.)"
                        + "(([\\w\\-]+\\.){1,}?([\\w\\-.~]+\\/?)*"
                        + "[\\p{Alnum}.,%_=?&#\\-+()\\[\\]\\*$~@!:/{};']*)",
                Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL);

        Matcher matcher = urlPattern.matcher(str);

        while (matcher.find()) {
            result.add(str.substring(matcher.start(1), matcher.end()));
            counter++;
        }

        System.out.println(result);

    }

}

This finds all URLs in your string and adds them to an ArrayList. You can adapt it to your own needs.

Kunal Vohra
0

You could use a regex for it:

String str = "internet address : http://test.com Click this!";
Pattern pattern = Pattern.compile("((http|https)\\S*)");
Matcher matcher = pattern.matcher(str);
if (matcher.find())
{
    System.out.println(matcher.group(1));
}
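
One thing to watch with a `\S`-based pattern is that it also captures punctuation that directly follows the URL, such as a trailing comma or full stop. A minimal sketch of one way to handle that is shown below; the class and method names are hypothetical, and treating trailing `. , ! ? ; :` as sentence punctuation is an assumption that will not hold for every URL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UrlRegexSketch {

    // Compiled once and reused; matches http:// or https:// followed by non-whitespace.
    private static final Pattern URL_PATTERN = Pattern.compile("https?://\\S+");

    // Hypothetical helper: returns every match, with trailing punctuation removed.
    static List<String> findUrls(String text) {
        List<String> urls = new ArrayList<>();
        Matcher matcher = URL_PATTERN.matcher(text);
        while (matcher.find()) {
            // Assumption: trailing . , ! ? ; : belong to the sentence, not the URL.
            urls.add(matcher.group().replaceAll("[.,!?;:]+$", ""));
        }
        return urls;
    }

    public static void main(String[] args) {
        // prints [http://test.com, https://example.org/path?x=1]
        System.out.println(findUrls("Visit http://test.com, or https://example.org/path?x=1."));
    }
}
```

Keeping the `Pattern` in a static field avoids recompiling it on every call, which is usually where most of the regex overhead goes when the method is called repeatedly.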