I have a String like this:

String desc = "<a href='http://indiatoday.intoday.in/story/nda-black-money-narendra-modi-fema-supreme-court/1/398323.html'><img src='http://media2.intoday.in/indiatoday/images/stories/black-money-nov10-2_167_103114093357.jpg'";

and I want to extract the href and src values from this string, like:

String link1 = "http://indiatoday.intoday.in/story/nda-black-money-narendra-modi-fema-supreme-court/1/398323.html";
String link2 = "http://media2.intoday.in/indiatoday/images/stories/black-money-nov10-2_167_103114093357.jpg";

What methods can I use to do that? Please help.

Vinnig
  • methods: html parsing or simple pattern search – Selvin Nov 06 '14 at 12:29
  • Parsing with the standard string methods, or using a library that supports HTML parsing (e.g. http://jsoup.org/) – mvw Nov 06 '14 at 12:31
  • Well, it looked easy, but jsoup gave me this string, and I can't get these tag values any further. – Vinnig Nov 06 '14 at 13:08
  • Right now, using a pattern, I am not getting any result, and using indexes it crashed in doInBackground. I am running this code in an AsyncTask's doInBackground method. – Vinnig Nov 06 '14 at 13:10
  • Who gave this -1? Do you think I did not search before asking? Is it wrong to ask a question? – Vinnig Nov 06 '14 at 13:38

3 Answers

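Consider using the following regular expression: (href|src)=['"]([^'"]+)['"]

import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern p = Pattern.compile("(href|src)=['\"]([^'\"]+)['\"]");
Matcher m = p.matcher(desc);

// find() looks for the next match; matches() would require the whole string to match
while (m.find()) {
   String attr = m.group(1); // "href" or "src"
   String url = m.group(2);  // the value between the quotes; use what you need
}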

Further reading: http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
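A minimal self-contained sketch of the pattern approach, for trying it outside the Android app. The class name `ExtractLinks` and the helper `extract` are hypothetical names for illustration; the sketch assumes attribute values contain no quote characters and keeps only the first `href` and the first `src` it finds.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractLinks {

    // Matches href='...' or src="..." (either quote style) and captures
    // the attribute name (group 1) and its value (group 2).
    // Assumption: the value itself contains no quote characters.
    private static final Pattern ATTR =
            Pattern.compile("(href|src)=['\"]([^'\"]+)['\"]");

    // Hypothetical helper: returns href/src values keyed by attribute name.
    // If an attribute occurs more than once, only the first value is kept.
    static Map<String, String> extract(String html) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = ATTR.matcher(html);
        while (m.find()) { // scan for each successive match in the string
            result.putIfAbsent(m.group(1), m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String desc = "<a href='http://indiatoday.intoday.in/story/nda-black-money-narendra-modi-fema-supreme-court/1/398323.html'>"
                + "<img src='http://media2.intoday.in/indiatoday/images/stories/black-money-nov10-2_167_103114093357.jpg'";
        Map<String, String> links = extract(desc);
        String link1 = links.get("href");
        String link2 = links.get("src");
        System.out.println(link1);
        System.out.println(link2);
    }
}
```

Running `main` should print the two URLs from the question. For HTML that is not a small, known snippet, an HTML parser such as jsoup (suggested in the question's comments) is the more robust choice.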

Adrian B.
  • [RegExp for html parsing :)](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags#answer-1732454) – Selvin Nov 06 '14 at 15:35
  • They are different scenarios. The OP asked to find specific parts of a string; the fact that those parts come from HTML does not concern this particular developer :) Also, while I agree that using regex to scan elements in an HTML file is wrong by nature (there are plenty of XML parsers that do that very well), targeted string matching and extraction is exactly what regex was built for :) – Adrian B. Nov 06 '14 at 15:46

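Try getting the links using indexes:

// href value: from just after "href='" up to and including ".html"
String link1 = desc.substring(desc.indexOf("href='")+6,desc.indexOf(".html")+5);
// src value: from just after "src='" up to the last single quote in the string
String link2 = desc.substring(desc.indexOf("src='")+5,desc.lastIndexOf("'"));

I know this is not a general solution, but it will definitely solve the given requirement.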
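The index approach can be generalized a little while also guarding against the `StringIndexOutOfBoundsException` that `substring` throws when `indexOf` returns -1, which is one plausible cause of the crash in `doInBackground` mentioned in the question's comments. This is a minimal sketch: `AttributeIndex` and `attributeValue` are hypothetical names, it naively takes the first occurrence of `attr=` (so searching for `src` would also hit something like `data-src=`), and it only handles quoted values.

```java
public class AttributeIndex {

    // Hypothetical helper: returns the value of attr='...' or attr="..." in html,
    // or null if the attribute is missing, unquoted, or its closing quote is absent.
    // Every indexOf() result is checked for -1 before it reaches substring().
    static String attributeValue(String html, String attr) {
        int pos = html.indexOf(attr + "=");
        if (pos < 0) {
            return null;
        }
        int quotePos = pos + attr.length() + 1; // index of the opening quote
        if (quotePos >= html.length()) {
            return null;
        }
        char quote = html.charAt(quotePos);
        if (quote != '\'' && quote != '"') {
            return null; // unquoted values are not handled in this sketch
        }
        int end = html.indexOf(quote, quotePos + 1); // matching closing quote
        if (end < 0) {
            return null;
        }
        return html.substring(quotePos + 1, end);
    }

    public static void main(String[] args) {
        String desc = "<a href='http://indiatoday.intoday.in/story/nda-black-money-narendra-modi-fema-supreme-court/1/398323.html'>"
                + "<img src='http://media2.intoday.in/indiatoday/images/stories/black-money-nov10-2_167_103114093357.jpg'";
        System.out.println(attributeValue(desc, "href"));
        System.out.println(attributeValue(desc, "src"));
        System.out.println(attributeValue(desc, "alt")); // attribute not present
    }
}
```

Returning null rather than throwing lets the caller in `doInBackground` decide what to do when the markup does not look as expected.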

Haresh Chhelana

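You can use substring like this:

String desc = "<a href='http://indiatoday.intoday.in/story/nda-black-money-narendra-modi-fema-supreme-court/1/398323.html'><img src='http://media2.intoday.in/indiatoday/images/stories/black-money-nov10-2_167_103114093357.jpg'";

    // href value: starts right after "href='" and ends at the next single quote
    int hrefStart = desc.indexOf("href='") + "href='".length();
    String string1 = desc.substring(hrefStart, desc.indexOf("'", hrefStart));
    // src value: starts right after "src='" and ends at the next single quote
    int srcStart = desc.indexOf("src='") + "src='".length();
    String string2 = desc.substring(srcStart, desc.indexOf("'", srcStart));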

Or you can use @Adrian B.'s regex code instead.

Meenal