2

I have a string like this: "This is a URL http://www.google.com/MyDoc.pdf which should be used"

I need to extract only the URL, which starts with http and ends with pdf: http://www.google.com/MyDoc.pdf

String sLeftDelimiter = "http://";
String[] tempURL = sValueFromAddAtt.split(sLeftDelimiter );
String sRequiredURL = sLeftDelimiter + tempURL[1];

This gives me the output "http://www.google.com/MyDoc.pdf which should be used", which still includes the text after the URL.

I need help with this.

Dante May Code
SMA_JAVA
  • Related to this question, please check out "How to detect the presence of URL in a string": http://stackoverflow.com/questions/285619/how-to-detect-the-presence-of-url-in-a-string – Crazenezz Apr 16 '12 at 08:50

6 Answers

12

This kind of problem is what regular expressions were made for:

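import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Lazily match from "http" up to the first ".pdf" that ends on a word boundary.
Pattern findUrl = Pattern.compile("\\bhttp.*?\\.pdf\\b");
Matcher matcher = findUrl.matcher("This is a URL http://www.google.com/MyDoc.pdf which should be used");
while (matcher.find()) {
  System.out.println(matcher.group());
}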

The regular expression explained:

  • \b: before the "http" there is a word boundary (i.e. xhttp does not match)
  • http: the string "http" (be aware that this also matches "https" and "httpsomething")
  • .*?: any character (.) any number of times (*), but using as few characters as possible (?)
  • \.pdf: the literal string ".pdf"
  • \b: after the ".pdf" there is a word boundary (i.e. .pdfoo does not match)

If you would like to match only http and https, use this instead of http in your pattern:

  • https?: - this matches the string http, then an optional "s" (indicated by the ? after the s), and then a colon. The colon has no special meaning in a regex, so it does not need a backslash.
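Putting the pieces above together, a minimal sketch might look like the following. The class and method names (`PdfUrlExtractor`, `extractPdfUrls`) are hypothetical. As an assumption that goes beyond the pattern above, the sketch uses `\S*?` instead of `.*?` so that a match cannot run across whitespace into a later ".pdf" elsewhere in the text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PdfUrlExtractor {
    // Word boundary, "http" or "https", "://", then as few non-whitespace
    // characters as possible up to a literal ".pdf" followed by a word boundary.
    // Assumption: URLs of interest contain no whitespace and end in ".pdf".
    private static final Pattern PDF_URL = Pattern.compile("\\bhttps?://\\S*?\\.pdf\\b");

    // Hypothetical helper: collects every match found in the text.
    public static List<String> extractPdfUrls(String text) {
        List<String> urls = new ArrayList<>();
        Matcher matcher = PDF_URL.matcher(text);
        while (matcher.find()) {
            urls.add(matcher.group());
        }
        return urls;
    }

    public static void main(String[] args) {
        String input = "This is a URL http://www.google.com/MyDoc.pdf which should be used";
        System.out.println(extractPdfUrls(input));
    }
}
```

Returning a list rather than a single string keeps the behaviour of the `while (matcher.find())` loop, so a text with several PDF links yields all of them.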
nd.
  • Thanks a lot, this one really helped. The text before and after the URL can be anything, so this regex for extracting the URL is what I needed. – SMA_JAVA Apr 16 '12 at 09:06
  • If you want to support arbitrary strings that are either URLs or strings that look like URLs but don't have a protocol handler (e.g. www.foo.com), then use Gruber's regular expression http://daringfireball.net/2010/07/improved_regex_for_matching_urls – nd. Apr 16 '12 at 09:13
  • Thanks for crisp answer ....I used as : Pattern findUrl = Pattern.compile("\\bversion-.*?\\.0.0\\b"); Matcher matcher = findUrl.matcher(response.toString()); if (matcher.find()) { System.out.println(matcher.group().substring(10,13)); // to get the substring } – Shashank Bodkhe Jan 10 '19 at 07:46
1

Why don't you use the startsWith("http://") and endsWith(".pdf") methods of the String class?

Both methods return a boolean value. If both return true, your condition is satisfied; otherwise it is not.
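Since the URL sits in the middle of a longer sentence, these two methods cannot be applied to the whole string directly. One way the idea might still be used is to split the text on whitespace first and test each token. This is only a sketch under that assumption; `TokenUrlFinder` and `findPdfUrl` are hypothetical names, and a URL followed directly by punctuation (for example "MyDoc.pdf.") would not be found.

```java
import java.util.Optional;

public class TokenUrlFinder {
    // Returns the first whitespace-separated token that starts with "http://"
    // and ends with ".pdf", or an empty Optional if there is none.
    public static Optional<String> findPdfUrl(String text) {
        for (String token : text.trim().split("\\s+")) {
            if (token.startsWith("http://") && token.endsWith(".pdf")) {
                return Optional.of(token);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String input = "This is a URL http://www.google.com/MyDoc.pdf which should be used";
        System.out.println(findPdfUrl(input).orElse("no URL found"));
    }
}
```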

Chandra Sekhar
  • The question states that he has a string which contains "This is a URL `URL` which should be used". I don't see how `startsWith()` and `endsWith()` are applicable here. – Arvindh Mani Mar 19 '17 at 06:13
1

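Try this:

String text = "This is a URL http://www.google.com/MyDoc.pdf which should be used";

// Take everything from "http:" up to the word "which", then trim the trailing space.
// Note: this relies on the literal word "which" following the URL.
String url = text.substring(text.indexOf("http:"), text.indexOf("which")).trim();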
Nishant
0

You can use the power of regular expressions here. First find the URL in the original string, then remove the other parts.

The following code shows my suggestion:

    String regex = "\\b(http|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";
    String str = "This is a URL http://www.google.com/MyDoc.pdf which should be used";

    String[] splited = str.split(regex);

    for(String current_part : splited)
    {
        str = str.replace(current_part, "");
    }

    System.out.println(str);

This snippet retrieves a URL that uses one of the listed protocols from a string. You can add custom protocols, such as https, to the protocol part of the regular expression above.

I hope my answer helps you ;)

Sam
  • Please note that this pattern does not match internationalized domain names such as http://مثال.إختبار – nd. Apr 17 '12 at 11:42
0
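public static String getStringBetweenStrings(String aString, String aPattern1, String aPattern2) {
    // Locate the first delimiter; indexOf returns -1 when it is absent.
    int start = aString.indexOf(aPattern1);
    if (start < 0) {
        return null;
    }
    int pos1 = start + aPattern1.length();

    // Look for the second delimiter only after the end of the first one.
    int pos2 = aString.indexOf(aPattern2, pos1);
    if (pos2 < 0) {
        return null;
    }

    // Text strictly between the two delimiters (the delimiters themselves are excluded).
    return aString.substring(pos1, pos2);
}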
0

You can use String.replaceAll with a capturing group and back reference for a very concise solution:

String input = "This is a URL http://www.google.com/MyDoc.pdf which should be used";
System.out.println(input.replaceAll(".*(http.*?\\.pdf).*", "$1"));

Here's a breakdown for the regex: https://regexr.com/3qmus
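One caveat with this approach: when the pattern does not match, `replaceAll` returns the input unchanged, so the caller cannot tell a URL from an untouched sentence. A hedged sketch of a variant that uses the same pattern with `Matcher.matches()` and reports "no match" explicitly is shown below; `GroupExtractor` and `extract` are hypothetical names.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupExtractor {
    // Same pattern as in the replaceAll call: group 1 captures the URL.
    private static final Pattern WRAPPED_URL = Pattern.compile(".*(http.*?\\.pdf).*");

    // Returns the captured URL, or an empty Optional when the input has no match.
    public static Optional<String> extract(String input) {
        Matcher matcher = WRAPPED_URL.matcher(input);
        return matcher.matches() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String input = "This is a URL http://www.google.com/MyDoc.pdf which should be used";
        System.out.println(extract(input).orElse("no URL found"));
    }
}
```

Because the leading `.*` is greedy, an input with several matching URLs yields the last one, and since `.` does not match line terminators by default, multi-line input will not match at all.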

Rony