How to get information from a webpage in Java?

Question

Does anyone know of a quick way to get information from a webpage in Java? For instance, if I'm looking at a page like http://www.ncbi.nlm.nih.gov/pubmed/?term=10952317 and I want to extract the list of words beneath the heading "MeSH Terms", how would I go about doing so?

I have something that can read the source, but it is full of HTML tags and such...

Any help is much appreciated!

possible duplicate of [How to "scan" a website (or page) for info, and bring it into my program?](http://stackoverflow.com/questions/2835505/how-to-scan-a-website-or-page-for-info-and-bring-it-into-my-program) — BalusC, Jun 16 '11 at 16:09

score 3 · Accepted Answer · answered Jun 16 '11 at 15:57

As has been mentioned on here countless times before, have a look at jsoup, which is an HTML parsing library for Java. Or write your own parser (not recommended).
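To make that suggestion concrete, here is a minimal sketch of how jsoup could pull the items listed under a heading. It assumes the jsoup jar is on the classpath. The markup in `SAMPLE_HTML` is hypothetical (an `h4` heading followed directly by a `ul`); the real PubMed page may be structured differently, so the selector would need adjusting after inspecting the actual source. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class MeshTermsExample {

    // Hypothetical markup standing in for the real page (an assumption,
    // not the actual PubMed HTML).
    static final String SAMPLE_HTML =
        "<html><body>"
        + "<h4>Publication Types</h4><ul><li>Review</li></ul>"
        + "<h4>MeSH Terms</h4><ul><li>Animals</li><li>Humans</li></ul>"
        + "</body></html>";

    // Returns the text of each list item that follows the "MeSH Terms" heading.
    static List<String> extractTerms(String html) {
        Document doc = Jsoup.parse(html);
        List<String> terms = new ArrayList<>();

        // Locate the heading by its own text (assumes it is an <h4>).
        Element heading = doc.select("h4:containsOwn(MeSH Terms)").first();
        if (heading == null) {
            return terms; // heading not found
        }

        // Assumes the list is the element immediately after the heading.
        Element list = heading.nextElementSibling();
        if (list == null) {
            return terms;
        }

        for (Element item : list.select("li")) {
            terms.add(item.text()); // text() drops the tags, keeps the words
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(extractTerms(SAMPLE_HTML)); // prints [Animals, Humans]
    }
}
```

For a live page, `Jsoup.connect(url).get()` returns a `Document` that can be queried the same way as the parsed string here.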

planetjones

score 0 · Answer 2 · answered Jun 16 '11 at 15:56

TagSoup is probably what you need.

Waldheinz
