
Does anyone know of a quick way to get information from a web page in Java? For instance, if I'm looking at a page like this: http://www.ncbi.nlm.nih.gov/pubmed/?term=10952317 and I want to extract the list of words beneath the heading "MeSH Terms", how would I go about doing so?

I have something that can read the page source, but it is full of HTML tags and such...

Any help is much appreciated!

Jonathan Hedley
NSP
  • possible duplicate of [How to "scan" a website (or page) for info, and bring it into my program?](http://stackoverflow.com/questions/2835505/how-to-scan-a-website-or-page-for-info-and-bring-it-into-my-program) – BalusC Jun 16 '11 at 16:09

2 Answers


As has been mentioned here countless times before, have a look at JSoup, an HTML parsing library for Java. Or write your own parser (not recommended).
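JSoup is a third-party download, so as a dependency-free illustration of the same idea (parse the HTML into elements rather than stripping tags out of the raw source), here is a sketch that uses the HTML parser bundled with the JDK, `javax.swing.text.html.parser.ParserDelegator`. This is a swapped-in technique, not JSoup's API, and that parser only understands HTML 3.2, so it may struggle with modern pages. The markup it looks for (a heading whose text is "MeSH Terms" followed by a `<ul>` of `<li>` items) is an assumption about the PubMed page, not something checked against it; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class MeshTermsExtractor {

    /** Returns the text of each <li> in the first <ul> after a heading whose text equals `heading`. */
    static List<String> extract(String html, String heading) {
        List<String> terms = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private final StringBuilder text = new StringBuilder(); // text of current heading or <li>
            private boolean inHeading;    // inside <h1>..<h6>
            private boolean afterHeading; // the last heading matched; waiting for its <ul>
            private boolean inList;       // inside the <ul> that follows the matching heading
            private boolean inItem;       // inside an <li> of that list

            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                if (isHeading(t)) {
                    inHeading = true;
                    text.setLength(0);
                } else if (t == HTML.Tag.UL && afterHeading) {
                    inList = true;
                } else if (t == HTML.Tag.LI && inList) {
                    inItem = true;
                    text.setLength(0);
                }
            }

            @Override
            public void handleEndTag(HTML.Tag t, int pos) {
                if (isHeading(t)) {
                    inHeading = false;
                    afterHeading = text.toString().trim().equals(heading); // did this heading match?
                } else if (t == HTML.Tag.LI && inItem) {
                    terms.add(text.toString().trim());
                    inItem = false;
                } else if (t == HTML.Tag.UL && inList) {
                    inList = false;
                    afterHeading = false; // only the first list after the heading
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (inHeading || inItem) {
                    text.append(data);
                }
            }
        };
        try {
            // true = ignore any charset declared in a <meta> tag instead of aborting the parse
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return terms;
    }

    private static boolean isHeading(HTML.Tag t) {
        return t == HTML.Tag.H1 || t == HTML.Tag.H2 || t == HTML.Tag.H3
            || t == HTML.Tag.H4 || t == HTML.Tag.H5 || t == HTML.Tag.H6;
    }

    public static void main(String[] args) {
        // Hypothetical markup; the real PubMed page may be structured differently.
        String html = "<html><body><h4>MeSH Terms</h4>"
            + "<ul><li>Animals</li><li>Humans</li></ul>"
            + "<h4>Substances</h4><ul><li>Insulin</li></ul></body></html>";
        System.out.println(extract(html, "MeSH Terms"));
    }
}
```

Pass in the page source you already read. If the real page uses different tags (for example `<div>`s instead of a `<ul>`), the tag checks need adjusting; with JSoup the same lookup would typically be a short CSS-selector query through its `select(...)` method.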

planetjones

TagSoup is probably what you're looking for.
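TagSoup works as a SAX parser: its `org.ccil.cowan.tagsoup.Parser` class implements `org.xml.sax.XMLReader` and reports real-world (even badly formed) HTML as a stream of SAX events, so the extraction logic lives in an ordinary SAX `ContentHandler`. The sketch below writes such a handler and, so that it runs without TagSoup installed, feeds it a small well-formed XHTML snippet through the JDK's own SAX parser; with TagSoup on the classpath you would pass a TagSoup `Parser` as the reader instead. The assumed page structure (a heading "MeSH Terms" followed by a `<ul>`) has not been checked against the real page, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class MeshTermsSax {

    /** Collects <li> texts from the first <ul> after a heading whose text equals `heading`. */
    static class HeadingListHandler extends DefaultHandler {
        private final String heading;
        private final List<String> terms = new ArrayList<>();
        private final StringBuilder text = new StringBuilder(); // characters() may arrive in chunks
        private boolean inHeading, afterHeading, inList, inItem;

        HeadingListHandler(String heading) { this.heading = heading; }

        List<String> terms() { return terms; }

        // Namespace-aware readers fill localName; the JDK's default reader leaves it empty.
        private static String name(String localName, String qName) {
            String n = (localName == null || localName.isEmpty()) ? qName : localName;
            return n.toLowerCase(Locale.ROOT);
        }

        private static boolean isHeading(String n) {
            return n.length() == 2 && n.charAt(0) == 'h' && n.charAt(1) >= '1' && n.charAt(1) <= '6';
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            String n = name(localName, qName);
            if (isHeading(n)) {
                inHeading = true;
                text.setLength(0);
            } else if (n.equals("ul") && afterHeading) {
                inList = true;
            } else if (n.equals("li") && inList) {
                inItem = true;
                text.setLength(0);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            String n = name(localName, qName);
            if (isHeading(n)) {
                inHeading = false;
                afterHeading = text.toString().trim().equals(heading); // did this heading match?
            } else if (n.equals("li") && inItem) {
                terms.add(text.toString().trim());
                inItem = false;
            } else if (n.equals("ul") && inList) {
                inList = false;
                afterHeading = false; // only the first list after the heading
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inHeading || inItem) text.append(ch, start, length);
        }
    }

    /** Runs the handler over `markup` with whichever SAX XMLReader is supplied. */
    static List<String> extract(XMLReader reader, String markup, String heading)
            throws IOException, SAXException {
        HeadingListHandler handler = new HeadingListHandler(heading);
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(markup)));
        return handler.terms();
    }

    /** Uses the JDK's SAX parser, which only accepts well-formed markup such as XHTML. */
    static List<String> extractWellFormed(String xhtml, String heading) {
        try {
            // With TagSoup on the classpath, real-world HTML could be parsed with:
            //   XMLReader reader = new org.ccil.cowan.tagsoup.Parser();
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            return extract(reader, xhtml, heading);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical markup; the real PubMed page may be structured differently.
        String xhtml = "<html><body><h4>MeSH Terms</h4>"
            + "<ul><li>Animals</li><li>Humans</li></ul></body></html>";
        System.out.println(extractWellFormed(xhtml, "MeSH Terms"));
    }
}
```

Keeping the handler separate from the reader is the point of the SAX design here: the same `HeadingListHandler` can be driven by the strict JDK parser or by TagSoup without changes.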

Waldheinz