I have an HTML string: "By<!--[10018729]//--> <a href=\"/author/10018729_kurt-ernst\" title=\"show Kurt Ernst's profile page\"><img src=\"http://images.thecarconnection.com/tny/avatar-image-for-ernst_100345731_0.jpg\" alt=\"Kurt Ernst\" width=\"20\" height=\"20\"/> Kurt Ernst</a>, Contributor". I need to extract two values from this string: the name, Kurt Ernst, and the image URL, http://images.thecarconnection.com/tny/avatar-image-for-ernst_100345731_0.jpg. I don't know how to do this. Can anyone suggest a solution?

Thanks in advance.

Sanat Pandey
  • You can use regular expressions, or an HTML parser, or if it is well-formed XHTML you can also use an XML parser for getting that information. – tiguchi Jul 09 '12 at 21:13
  • jsoup http://jsoup.org/cookbook/introduction/parsing-a-document – SRN Jul 09 '12 at 21:18
  • @NobuGames [You Cannot parse HTML with regex](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) – FoamyGuy Jul 09 '12 at 21:27
  • @Tim I've read that tiny rant, and technically speaking it is correct. But this is not about parsing HTML: nobody here is trying to build a DOM or abstract syntax tree with regular expressions, which does not work, I agree. This is about retrieving simple text information from a text file, and for that purpose a regular expression works. I've done it a hundred times. – tiguchi Jul 09 '12 at 21:29
  • @NobuGames You are asking for trouble IMO if you choose to use RegEx for this purpose. Even with the simplified example OP gives. – FoamyGuy Jul 09 '12 at 21:30
  • You would run into trouble with any approach because HTML files coming from some web site not under your own control are always in flux. No HTML or XML parser and no XPath can help you if the HTML structure fundamentally changes. – tiguchi Jul 09 '12 at 21:38
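
For reference, the regular-expression approach debated in the comments above could look roughly like the following. This is only a minimal sketch: the class name `AuthorRegex`, the `extract` helper, and the pattern are made up for illustration, and it assumes the `<img>` tag always lists `src` immediately followed by `alt`, as in the sample string. If the site reorders the attributes, the pattern stops matching, which is the fragility the comments warn about.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not taken from any answer in this thread.
final class AuthorRegex {
    // The sample string from the question.
    static final String SAMPLE =
        "By<!--[10018729]//--> <a href=\"/author/10018729_kurt-ernst\" "
        + "title=\"show Kurt Ernst's profile page\">"
        + "<img src=\"http://images.thecarconnection.com/tny/avatar-image-for-ernst_100345731_0.jpg\" "
        + "alt=\"Kurt Ernst\" width=\"20\" height=\"20\"/> Kurt Ernst</a>, Contributor";

    // Assumption: src comes first and alt follows directly after it.
    static final Pattern IMG =
        Pattern.compile("<img\\s+src=\"([^\"]+)\"\\s+alt=\"([^\"]*)\"");

    // Returns {name, imageUrl}, or null when the pattern does not match.
    static String[] extract(String html) {
        Matcher m = IMG.matcher(html);
        return m.find() ? new String[] { m.group(2), m.group(1) } : null;
    }

    public static void main(String[] args) {
        String[] result = extract(SAMPLE);
        if (result != null) {
            System.out.println(result[0]); // name taken from the alt attribute
            System.out.println(result[1]); // image URL taken from the src attribute
        }
    }
}
```

Taking the name from the `alt` attribute is a convenience here. The anchor text would work too, but needs a second pattern.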

1 Answer

I did this kind of parsing with my own logic, and it has worked for me. Fetch the page content from the URL as a String, then walk through it in a while or for loop, using an if condition to pick out the title and the src. Break out of the loop once you have both values.
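
One possible reading of that loop, as a minimal sketch rather than the exact code used: the class `AuthorScan` and its helpers `attrValue` and `findNameAndImage` are hypothetical names. The sketch assumes the name can be taken from the `alt` attribute of the `<img>` tag, that attribute values are double-quoted, and that no attribute value contains a `>` character.

```java
// Hypothetical helper class illustrating a manual tag-by-tag scan.
final class AuthorScan {
    // The sample string from the question.
    static final String SAMPLE =
        "By<!--[10018729]//--> <a href=\"/author/10018729_kurt-ernst\" "
        + "title=\"show Kurt Ernst's profile page\">"
        + "<img src=\"http://images.thecarconnection.com/tny/avatar-image-for-ernst_100345731_0.jpg\" "
        + "alt=\"Kurt Ernst\" width=\"20\" height=\"20\"/> Kurt Ernst</a>, Contributor";

    // Returns the double-quoted value of the given attribute inside one tag, or null.
    static String attrValue(String tag, String attr) {
        String key = " " + attr + "=\"";
        int start = tag.indexOf(key);
        if (start < 0) return null;
        start += key.length();
        int end = tag.indexOf('"', start);
        return end < 0 ? null : tag.substring(start, end);
    }

    // Walks the string one tag at a time; returns {name, imageUrl} (entries may be null).
    static String[] findNameAndImage(String html) {
        String name = null;
        String src = null;
        int pos = 0;
        while (true) {
            int open = html.indexOf('<', pos);
            if (open < 0) break;                 // no more tags
            int close = html.indexOf('>', open);
            if (close < 0) break;                // unterminated tag
            String tag = html.substring(open, close + 1);
            if (tag.startsWith("<img")) {
                src = attrValue(tag, "src");
                name = attrValue(tag, "alt");    // assumption: alt holds the author name
            }
            if (name != null && src != null) break; // both values found, stop scanning
            pos = close + 1;
        }
        return new String[] { name, src };
    }

    public static void main(String[] args) {
        String[] result = findNameAndImage(SAMPLE);
        System.out.println(result[0]);
        System.out.println(result[1]);
    }
}
```

If the markup changes, for example if the site switches to single-quoted attributes, the scan returns null entries instead of throwing, so the caller should check for null before using the values.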

Jennifer S
  • Can you provide a few lines of code for this? I don't understand what condition should go in the if statement or the while loop. – Sanat Pandey Jul 11 '12 at 16:30