0

I want to extract the URL from a string so I can show it in a WebView.

Example strings:

Example 1: Hello dilip refer this url www.google.com.
Example 2: hi ramesh this is good for android http://android.com

I want only www.google.com and http://android.com. How can I extract them from the string?

laalto
Mahi

4 Answers

2

If you simply want to retrieve the URL from a String, I would suggest looking for an existing question on Stack Overflow.

Like this:

// ^ and $ anchor the pattern, so it only matches when the whole string is a URL
public static final String URL_REGEX = "^((https?|ftp)://|(www|ftp)\\.)?[a-z0-9-]+(\\.[a-z0-9-]+)+([/?].*)?$";

Pattern p = Pattern.compile(URL_REGEX);
Matcher m = p.matcher("example.com"); // replace with the string to test
if (m.find()) {
    System.out.println("String contains URL");
}

From this post: https://stackoverflow.com/a/11007981/1164919. You will find more snippets and suggestions on how to do this in the same thread.

If you would rather do it yourself to understand how it works, you can write your own simple snippet to detect whether a string contains a URL. For example, you can use if (text.contains("something")), which simply returns true or false depending on whether the given text occurs in the String.

There are dozens of examples out there waiting to be read. Search for something like "regex" or, if that is too hard, "String.split".
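As a rough illustration of the contains approach, here is a minimal sketch. The class name `ContainsCheck`, the helper `looksLikeItHasUrl`, and the choice of markers ("http://", "https://", "www.") are assumptions made for this example; it only reports whether a marker is present and does not extract the URL.

```java
class ContainsCheck {
    // Hypothetical helper: returns true if the text contains an obvious URL marker.
    // The list of markers is an assumption; bare domains such as "android.com" are missed.
    static boolean looksLikeItHasUrl(String text) {
        String lower = text.toLowerCase();
        return lower.contains("http://")
                || lower.contains("https://")
                || lower.contains("www.");
    }

    public static void main(String[] args) {
        System.out.println(looksLikeItHasUrl("Hello dilip refer this url www.google.com."));
        System.out.println(looksLikeItHasUrl("no links here"));
    }
}
```

This is enough to decide whether further processing is worthwhile, but pulling the URL out still needs a regex or a split.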

Community
Dion Segijn
1

I suggest splitting the string on spaces and then choosing the substrings that contain a "." embedded between other characters. In normal English text, a "." between two characters tends to occur only in URLs.
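The idea could be sketched roughly as follows. The class name `DotTokenFinder` and the method `findCandidates` are made up for this example, and stripping trailing sentence punctuation before the check is an added assumption so that "www.google.com." at the end of a sentence still yields a clean value.

```java
import java.util.ArrayList;
import java.util.List;

class DotTokenFinder {
    // Hypothetical helper: returns the whitespace-separated tokens that have
    // a '.' with at least one character on each side of it.
    static List<String> findCandidates(String text) {
        List<String> result = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            // Assumption: punctuation at the end of a token belongs to the sentence.
            String cleaned = token.replaceAll("[.,;:!?]+$", "");
            int dot = cleaned.indexOf('.');
            if (dot > 0 && dot < cleaned.length() - 1) {
                result.add(cleaned);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(findCandidates("Hello dilip refer this url www.google.com."));
        System.out.println(findCandidates("hi ramesh this is good for android http://android.com"));
    }
}
```

Note that this heuristic also accepts tokens such as decimal numbers ("3.14") or abbreviations ("e.g"), so it is only suitable for simple input.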

Stochastically
1

Here is one possible solution. The following regex assumes it has found a URL when a period follows a letter and another letter immediately follows that period. Here are some examples of what it will match:

t.t
hello.aspx
www.google.com
http://android.com
http://android.com/test/test.aspx
https://www.stackoverflow.com/questions.html?type=android
www.google.com/android/games.aspx#hello

Here is the regex (use with IgnoreCase option):

(https?://)?[-A-Z0-9]+\.[-A-Z0-9.]+(/[-A-Z0-9+&@#/%=~_|!:,.;?]*)?

Running it against your sample text finds both URLs you wanted, although the first match includes the period that ends the sentence.

Here is some sample Java code that uses this regex:

String testInputString = "Test 1 www.google.co.uk Test 2 www.google.co.in Test 3 www.google.com Test 4 http://android.com Test 5 meta.stackoverflow.com";
Pattern p = Pattern.compile("(https?://)?[-A-Z0-9]+\\.[-A-Z0-9.]+(/[-A-Z0-9+&@#/%=~_|!:,.;?]*)?", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
Matcher m = p.matcher(testInputString);
while (m.find()) {
    System.out.println(m.group(0));
} 
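To get clean values for the WebView, one option is to strip sentence punctuation from the end of each match. The sketch below wraps the same regex in a helper; the class name `UrlExtractor`, the method `extract`, and the set of punctuation characters removed are assumptions for illustration, not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlExtractor {
    // Same expression as above, compiled once and reused.
    private static final Pattern URL = Pattern.compile(
            "(https?://)?[-A-Z0-9]+\\.[-A-Z0-9.]+(/[-A-Z0-9+&@#/%=~_|!:,.;?]*)?",
            Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: collects every match and trims trailing punctuation.
    static List<String> extract(String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL.matcher(text);
        while (m.find()) {
            // Assumption: '.', ',', ';', ':', '!' and '?' at the very end are not part of the URL.
            urls.add(m.group().replaceAll("[.,;:!?]+$", ""));
        }
        return urls;
    }

    public static void main(String[] args) {
        System.out.println(extract("Hello dilip refer this url www.google.com."));
        System.out.println(extract("hi ramesh this is good for android http://android.com"));
    }
}
```

A match without a scheme, such as www.google.com, may still need "http://" prepended before it is handed to a WebView.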
Francis Gagnon
  • You are missing an escape character: (https?://)?[-A-Z0-9]\\.[-A-Z0-9.]+(/[-A-Z0-9+&@#/%=~_|!:,.;?]*)?". However, I tried the regex on the string from my answer and it didn't work for me. – Raghunandan May 18 '13 at 17:29
  • @Raghunandan - The regex in my answer is indeed raw and not pre-escaped. I feel it is better to not escape characters when the regex is not presented in a block of example code. I retested the expression and it works just fine. Did you make the regex case insensitive when you tried it? If you prefer not to use the "case-insensitive" option, you can use the following regex instead. (https?://)?[-a-zA-Z0-9]+\.[-a-zA-Z0-9.]+(/[-a-zA-Z0-9+&@#/%=~_|!:,.;?]*)? – Francis Gagnon May 18 '13 at 17:43
  • The above regex gives me an "illegal escape character" error. I am testing it on Android and Java. If I escape the illegal character, the regex does not match the string in my answer. The change does not work for "Hello dilip refer www.google.co.uk www.google.co.in this url www.google.com. hi ramesh this is good for android http://android.com hello there meta.stackoverflow.com"; – Raghunandan May 18 '13 at 17:46
  • @Raghunandan - I added some sample Java code to my answer. It worked correctly for me using your test input string. I also tested the regex successfully on http://www.regexplanet.com/advanced/java/index.html. – Francis Gagnon May 18 '13 at 18:08
0

Assuming your string looks like the one below, you can use the following regex to extract www.google.com and http://android.com.

String s = "Hello dilip refer this url www.google.com. hi ramesh this is good for  android http://android.com";   
// Dots are escaped so they match a literal '.'; the character class has no stray commas
Pattern pc = Pattern.compile("((http://)|(www\\.))[A-Za-z]+\\.com");
Matcher matcher = pc.matcher(s);
while(matcher.find())
{
   System.out.println("String Extracted   "+matcher.group());
}

Output

String Extracted   www.google.com
String Extracted   http://android.com 

Note: The above does not work for URLs such as http://meta.stackoverflow.com, www.google.co.uk and b3ta.com.

Edit:

       String s = "Hello dilip refer www.google.co.uk www.google.co.in this url www.google.com. hi ramesh this is good for android http://android.com hello there meta.stackoverflow.com";   
       // Dots are escaped; the second alternative matches dotted names ending in "com"
       Pattern pc = Pattern.compile("((http://)|(www\\.))[A-Za-z0-9]+((\\.com)|(\\.co\\.[a-z]{2}))|([A-Za-z0-9]+\\.)+com");
       Matcher matcher = pc.matcher(s);
       while(matcher.find())
       {
          System.out.println("String Extracted   "+matcher.group());
       }

Output:

       String Extracted   www.google.co.uk
       String Extracted   www.google.co.in
       String Extracted   www.google.com
       String Extracted   http://android.com
       String Extracted   meta.stackoverflow.com

Even the above is not perfect, but you can use it as a starting point and adapt the regex to your needs.

Raghunandan
  • That works for this example, but not for `http://meta.stackoverflow.com`, `www.google.co.uk`, and `b3ta.com`. – Ken Y-N May 18 '13 at 14:35
  • @KenY-N I have edited the post, but even that may not be perfect. I guess it covers a few more cases, though. – Raghunandan May 18 '13 at 14:50