Splitting a string that contains url/link in java

Question

Suppose we have a string that contains some text at the beginning and a URL/link at the end, e.g. http://www.google.com. What is the best way to split this string into two variables: DescriptionTxt and LinkTxt?

Thanks in advance.

Please provide some example inputs and their respective outputs. — Todd Sewell, Jan 26 '17 at 20:56
Possible duplicate of [Detect and extract url from a string?](http://stackoverflow.com/questions/5713558/detect-and-extract-url-from-a-string) — Grzegorz Górkiewicz, Jan 26 '17 at 20:57
It depends on the link pattern. Can you rely on the fact that it begins with http:// ? — Jan B., Jan 26 '17 at 20:57
Yes, I'm producing the URLs, so each of them is going to start with http. However, I don't want to lose the http prefix at the beginning of the split text I'm going to create. — Dimitris K., Jan 26 '17 at 21:02

Ben Arnao · Answer 1 · 2017-01-26T21:04:03.430

0

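// Split just before the first "http"; the lookahead keeps "http" in the second part
String[] results = mystring.split("(?=http)", 2);

Then if you wanted two separate Strings,

String DescriptionTxt = results[0].trim(); // drop the space before the URL
String LinkTxt = results[1];               // fails if mystring contains no "http"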

edited Jan 26 '17 at 21:04

answered Jan 26 '17 at 20:59


For example, with the initial String "Thanks Ben Arnao for your help http://www.stackoverflow.com", is the result going to be like this? results[0] = "Thanks Ben Arnao for your help", results[1] = "http://www.stackoverflow.com" – Dimitris K. Jan 26 '17 at 21:03
Ok well this really depends on the format of your input and how you want your program to recognize the URL. If it's always going to be the last x number of characters in the string after a whitespace, you could use lastIndexOf(" ") as your delimiter instead – Ben Arnao Jan 26 '17 at 21:06
Thank you very much @Ben Arnao , I'm going to try this solution! – Dimitris K. Jan 26 '17 at 21:12
@Dimitris K why don't you upvote and accept his answer if it worked. – shashwatZing Jan 26 '17 at 21:19
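
The lastIndexOf(" ") idea from the comments could be sketched roughly as follows. It assumes the URL is always the last whitespace-separated token; the class and method names are made up for illustration and are not from the thread.

```java
class LastSpaceSplit {
    /** Hypothetical helper: returns {description, link}, split at the last space. */
    static String[] splitAtLastSpace(String input) {
        String trimmed = input.trim();
        int idx = trimmed.lastIndexOf(' ');
        if (idx < 0) {
            // No space at all: treat everything as description, leave the link empty
            return new String[] { trimmed, "" };
        }
        return new String[] {
            trimmed.substring(0, idx).trim(), // text before the last space
            trimmed.substring(idx + 1)        // last token, assumed to be the URL
        };
    }

    public static void main(String[] args) {
        String[] parts = splitAtLastSpace(
                "Thanks Ben Arnao for your help http://www.stackoverflow.com");
        System.out.println(parts[0]); // Thanks Ben Arnao for your help
        System.out.println(parts[1]); // http://www.stackoverflow.com
    }
}
```

This does not check that the last token actually looks like a URL, so it only fits input you control.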

score 0 · Answer 2 · answered Jan 26 '17 at 21:34

Detecting patterns is always tricky. There might be URLs that contain the keywords you look for. For instance:

A short description http://my.foo.bar/http-is-a-protocol

If you go with lastIndexOf("http"), your parser will fail. A good solution can be much more complex than assumed in the first place. A more advanced algorithm could look for http://, but https:// is just as valid. And don't forget capital letters, as in HTTP://.

And is there a reason why http:// would not occur in your description, as well?

You won't get a complete solution here for your problem. Try to cover most of the cases with moderate effort and make sure you know what to do when your algorithm fails for something you haven't expected.
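
As one possible "moderate effort" sketch of the points above: a regex that accepts http:// or https:// in any letter case, only treats a URL as the link if it is the last token of the string, and reports failure explicitly instead of guessing. The class name, method name and pattern are illustrative assumptions, not a complete URL parser.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrailingUrlSplit {
    // group(1): description (as short as possible), group(2): URL at the very end.
    // CASE_INSENSITIVE covers HTTP://, DOTALL lets the description span several lines.
    private static final Pattern TRAILING_URL = Pattern.compile(
            "^(.*?)\\s*(https?://\\S+)\\s*$",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Hypothetical helper: returns {description, link}, or empty if no trailing URL. */
    static Optional<String[]> split(String input) {
        Matcher m = TRAILING_URL.matcher(input);
        if (!m.matches()) {
            return Optional.empty(); // caller decides what to do on failure
        }
        return Optional.of(new String[] { m.group(1), m.group(2) });
    }

    public static void main(String[] args) {
        String[] parts =
                split("A short description http://my.foo.bar/http-is-a-protocol").get();
        System.out.println(parts[0]); // A short description
        System.out.println(parts[1]); // http://my.foo.bar/http-is-a-protocol
        System.out.println(split("no link here").isPresent()); // false
    }
}
```

Because the pattern is anchored at the end of the input, an earlier http:// inside the description stays part of the description. It still assumes that the URL contains no whitespace.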
