Get specific text from String

Question

I have a String like this:

String desc = "&lt;a href='http://indiatoday.intoday.in/story/nda-black-money-narendra-modi-fema-supreme-court/1/398323.html'&gt;&lt;img src='http://media2.intoday.in/indiatoday/images/stories/black-money-nov10-2_167_103114093357.jpg'";

I want to extract the href and src values from this string, like this:

String link1 = "http://indiatoday.intoday.in/story/nda-black-money-narendra-modi-fema-supreme-court/1/398323.html";
String link2 = "http://media2.intoday.in/indiatoday/images/stories/black-money-nov10-2_167_103114093357.jpg";

What methods can I use to do that? Please help.

Either parse it with the standard string methods or use a library that supports HTML parsing (e.g. http://jsoup.org/). – mvw, Nov 06 '14 at 12:31
Well, it looked easy, but jsoup gave me this string and I can't get these tag values out of it any further. – Vinnig, Nov 06 '14 at 13:08
Right now, using Pattern, I am not getting a result, and using indexes, it crashed. I am running the code in an AsyncTask's doInBackground method. – Vinnig, Nov 06 '14 at 13:10
Who gave the -1? Do you think I did not search before asking? Is it wrong to ask a question? – Vinnig, Nov 06 '14 at 13:38

score 1 · Answer 1 · answered Nov 06 '14 at 12:32

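Consider using the following regular expression: (href|src)=['"]([^'"]+)['"]

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Group 1 is the attribute name, group 2 is the value between the quotes.
Pattern p = Pattern.compile("(href|src)=['\"]([^'\"]+)['\"]");
Matcher m = p.matcher(desc);

// find() steps through each occurrence; matches() would need the whole string to match.
while (m.find()) {
   String attribute = m.group(1); // "href" or "src"
   String value = m.group(2);     // the URL
}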

Further reading: http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
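
For completeness, here is a minimal, self-contained sketch of the regex approach, assuming the attribute values are always wrapped in matching single or double quotes. The class name AttributeExtractor and the method extract are hypothetical names chosen for illustration; they do not come from the question or from any library. The sketch collects the first value seen for each attribute into a map.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are illustrative only.
class AttributeExtractor {
    // Group 1: attribute name. Group 2: opening quote character.
    // Group 3: value, read lazily up to the same quote character (\2).
    private static final Pattern ATTR =
            Pattern.compile("(href|src)\\s*=\\s*(['\"])(.*?)\\2");

    // Returns the first value found for each of href and src, in order of appearance.
    static Map<String, String> extract(String text) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = ATTR.matcher(text);
        while (m.find()) {
            result.putIfAbsent(m.group(1), m.group(3));
        }
        return result;
    }

    public static void main(String[] args) {
        String desc = "&lt;a href='http://indiatoday.intoday.in/story/nda-black-money-narendra-modi-fema-supreme-court/1/398323.html'&gt;"
                + "&lt;img src='http://media2.intoday.in/indiatoday/images/stories/black-money-nov10-2_167_103114093357.jpg'";
        Map<String, String> links = extract(desc);
        System.out.println(links.get("href"));
        System.out.println(links.get("src"));
    }
}
```

Because the pattern only looks for the attribute name and its quoted value, it behaves the same whether the angle brackets in the string are escaped (&lt;) or literal. For anything beyond pulling a couple of attribute values out of a short string, an HTML parser is likely the safer choice.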

Adrian B.

[RegExp for html parsing :)](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags#answer-1732454) – Selvin Nov 06 '14 at 15:35
They are different scenarios. The OP asked how to find specific parts of a string. The fact that those parts come from HTML does not concern this specific developer :) Also, while I agree that using regex to scan elements in an HTML file is wrong by nature (there are a lot of XML parsers that do that very well), for targeted string matching and extraction... it's what it was built for :) – Adrian B. Nov 06 '14 at 15:46

score 0 · Answer 2 · answered Nov 06 '14 at 12:38

Try getting the links by index:

String link1 = desc.substring(desc.indexOf("href='")+6,desc.indexOf(".html")+5);
String link2 = desc.substring(desc.indexOf("src='")+5,desc.lastIndexOf("'"));

I know this is not a general solution, but it will definitely solve the given requirement.
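
A slightly more defensive variant of the index approach is sketched below. It assumes the values are delimited by single quotes, as in the question. The class name QuotedValue and the method after are hypothetical names used only for illustration. Since indexOf returns -1 when the text is not found, and substring throws a StringIndexOutOfBoundsException on invalid indexes, the sketch checks each index and returns null instead. That may or may not be the cause of the crash mentioned in the comments on the question.

```java
// Hypothetical helper; the class and method names are illustrative only.
class QuotedValue {
    // Returns the text between the given prefix (e.g. "href='") and the next
    // single quote, or null if the prefix or the closing quote is missing.
    static String after(String text, String prefix) {
        int start = text.indexOf(prefix);
        if (start < 0) {
            return null; // prefix not present
        }
        start += prefix.length();
        int end = text.indexOf('\'', start);
        if (end < 0) {
            return null; // no closing quote
        }
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        String desc = "&lt;a href='http://indiatoday.intoday.in/story/nda-black-money-narendra-modi-fema-supreme-court/1/398323.html'&gt;"
                + "&lt;img src='http://media2.intoday.in/indiatoday/images/stories/black-money-nov10-2_167_103114093357.jpg'";
        System.out.println(after(desc, "href='"));
        System.out.println(after(desc, "src='"));
    }
}
```

Searching for the closing quote from the start position, rather than for ".html" or the last quote in the string, means the helper does not depend on the file extension or on the attribute being last.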

score 0 · Answer 3 · answered Nov 06 '14 at 12:38

You can use substring like this:

String desc = "&lt;a href='http://indiatoday.intoday.in/story/nda-black-money-narendra-modi-fema-supreme-court/1/398323.html'&gt;&lt;img src='http://media2.intoday.in/indiatoday/images/stories/black-money-nov10-2_167_103114093357.jpg'";

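    // Start just after the opening quote and stop at the next closing quote.
    int hrefStart = desc.indexOf("href='") + 6;
    String string1 = desc.substring(hrefStart, desc.indexOf("'", hrefStart));
    int srcStart = desc.indexOf("src='") + 5;
    String string2 = desc.substring(srcStart, desc.indexOf("'", srcStart));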

Or you can use Adrian B.'s code.
