
I am trying to extract two pieces of data from the following webpage: the contents of the article-body element, and the src of the image inside article-image.

Can anybody show me how to extract those two pieces in Java?

http://www.ncataggies.com//ViewArticle.dbml?DB_OEM_ID=24500&ATCLID=205417767

user1154644
  • Asking for "the best" doesn't usually work well here on [SO] -- the exchange isn't really set up for polls. You've also given no criteria by which one could judge one library better than another... – sarnold Apr 23 '12 at 00:10
  • Ok. I guess I would just be looking for a library that someone is familiar with, and some hints on how to extract that data. – user1154644 Apr 23 '12 at 00:12
  • I've looked into jsoup, but I can't quite figure out how to grab the data. Can you give me some assistance? – user1154644 Apr 23 '12 at 00:13
  • I should probably mention that this is for an android application. – user1154644 Apr 23 '12 at 00:14
  • It may be (depending on view) "unethical" or (depending on jurisdiction) "illegal" to copy certain website content. –  Apr 23 '12 at 00:17
  • @user1154644 Well, **What have you tried?** Did it work? If not *why* not? SO isn't community-programming ;-) Also, there are *plenty of questions dealing with JSoup* (or whatever tool you choose). –  Apr 23 '12 at 00:18
  • possible duplicate of [Parse HTML in Android](http://stackoverflow.com/questions/2188049/parse-html-in-android) , also http://stackoverflow.com/questions/7114282/how-can-you-parse-html-in-android –  Apr 23 '12 at 00:20

1 Answer


Java or JavaScript?

If I were doing this, I would fetch the source of the URL, take the text inside the element with class="article-body", and then read the src="" attribute of the image inside the element with class="photocopy". That gives you the full article text and the source of the image.

So just load the page, use basic string operations to find the right class, and then extract the contents.
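A minimal Java sketch of that string-based approach follows. It runs against a made-up HTML snippet rather than the live page, so the sample markup, the class name `ArticleScraper`, and the helper names are all assumptions for illustration. It also assumes each target `<div>` contains no nested `<div>`, which may not hold on the real page; plain string matching is brittle, and an HTML parser such as jsoup (mentioned in the comments) would be more robust.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ArticleScraper {

    // Returns the markup between the opening tag that carries class="<className>"
    // and the next </div>, or null if not found.
    // Naive: assumes the target element is a <div> with no nested <div> inside it.
    public static String innerHtmlOfDiv(String html, String className) {
        int attr = html.indexOf("class=\"" + className + "\"");
        if (attr < 0) return null;
        int open = html.indexOf('>', attr);        // end of the opening tag
        if (open < 0) return null;
        int close = html.indexOf("</div>", open);  // first closing tag after it
        if (close < 0) return null;
        return html.substring(open + 1, close);
    }

    // Returns the src attribute of the first <img> in the fragment, or null.
    public static String firstImageSrc(String fragment) {
        if (fragment == null) return null;
        Matcher m = Pattern.compile("<img[^>]*\\ssrc=\"([^\"]*)\"").matcher(fragment);
        return m.find() ? m.group(1) : null;
    }

    // Replaces tags with spaces and collapses whitespace to get plain text.
    public static String stripTags(String fragment) {
        if (fragment == null) return null;
        return fragment.replaceAll("<[^>]+>", " ").replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        // Hypothetical markup; the real page's structure may differ.
        String html =
            "<div class=\"photocopy\"><img alt=\"\" src=\"/photos/team.jpg\"></div>"
          + "<div class=\"article-body\"><p>Aggies win.</p><p>Final score 3-1.</p></div>";

        System.out.println(stripTags(innerHtmlOfDiv(html, "article-body")));
        System.out.println(firstImageSrc(innerHtmlOfDiv(html, "photocopy")));
    }
}
```

To use this on the real page, you would first download the HTML into a string (on Android, off the main thread) and pass it in place of the sample.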

Does this help? If you need help with the specific code, give it a try first, post what you have, and I can help you from there.

Josh Dean
  • For some reason, I'm getting a NoClassDefFoundError when trying to use JSoup. I have definitely added the jar file, so I'm not sure what's going on there. – user1154644 Apr 23 '12 at 00:35