
Is there a discipline, framework, or tool set for writing programs that use information from HTML pages as part of their input data, something like a meta search engine? How do you parse the web page?

I would prefer Java or Flex/Flash, or some pointers to further reading.

Thank you!

UPDATE February 7 2013

Thank you for your answers! Web scraping was the term I was looking for!

Found this awesome Java library, http://jsoup.org/, via the post "Web scraping with Java".

Still looking for a Flex equivalent; I'll update as soon as I find one.

Ernest

1 Answer


I think your question is a bit too vague to get good answers, and I don't have Java/Flex experience myself. That said, most languages have library support for making an HTTP request to the resource in question, and most likely some support for parsing the HTML/XML into a data structure that you can pull data from.
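Since you mentioned Java, here is a rough sketch of the "make an HTTP request" step using only the JDK's built-in `java.net.http` client (Java 11 or later). The class name `PageFetcher`, the `User-Agent` string, and the example URL are placeholders I made up for illustration, and a real scraper would want timeouts and error handling on top of this.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class PageFetcher {
    // Builds a plain GET request; the User-Agent value is a made-up placeholder.
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", "example-scraper/0.1")
                .GET()
                .build();
    }

    // Sends the request and returns the response body (the raw HTML) as a String.
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(buildRequest(url), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder URL; substitute the page you actually want to read.
        String html = fetch("https://example.com/");
        System.out.println(html.length() + " characters received");
    }
}
```

The string that `fetch` returns is what you would then hand to an HTML parser or search through directly.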

Depending on what you're trying to get out of the page, simple string searches on the HTTP response may be all you need. This is essentially what @pablochan is recommending when he suggests the wiki page on web scraping.
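To make the "simple string search" idea concrete, here is a minimal Java sketch that pulls the page title and link targets out of an HTML string with `indexOf` and a regular expression. The class and method names are hypothetical, the sample HTML is invented, and the code assumes lowercase `<title>` tags and double-quoted `href` attributes. It will break on messier markup, which is where a real parser earns its keep.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SimpleScrape {
    // Returns the text between <title> and </title>, or null if either tag is missing.
    // Assumes the tags are lowercase and carry no attributes.
    static String extractTitle(String html) {
        int start = html.indexOf("<title>");
        if (start < 0) return null;
        start += "<title>".length();
        int end = html.indexOf("</title>", start);
        if (end < 0) return null;
        return html.substring(start, end).trim();
    }

    // Collects the values of double-quoted href attributes, in document order.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = Pattern.compile("href=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE).matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        // Invented sample page, standing in for a real HTTP response body.
        String html = "<html><head><title> Example </title></head>"
                + "<body><a href=\"/a\">A</a> <a href=\"/b\">B</a></body></html>";
        System.out.println(extractTitle(html));
        System.out.println(extractLinks(html));
    }
}
```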

Be aware that some services/sites are designed to thwart attempts to page-scrape their data, and may list scraping as a violation of their terms of service. Even if you succeed, scraping too frequently may get your IP blocked or trigger other countermeasures.

Most static sites won't have protections like these, but large services may well.
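One way to avoid hitting a site too frequently is to enforce a minimum gap between your own requests. The sketch below is just one possible approach, not something any particular site requires: `PoliteThrottle` and its methods are names I invented, and the right interval is an assumption you would have to pick yourself (or take from the site's published guidelines, if it has any).

```java
class PoliteThrottle {
    private final long minIntervalMillis;
    private boolean hasRequested = false;
    private long lastRequestAt = 0;

    PoliteThrottle(long minIntervalMillis) {
        this.minIntervalMillis = minIntervalMillis;
    }

    // Records that a request was made at the given time (milliseconds).
    void markRequest(long nowMillis) {
        hasRequested = true;
        lastRequestAt = nowMillis;
    }

    // How many milliseconds the caller should still wait before the next request.
    long millisToWait(long nowMillis) {
        if (!hasRequested) return 0;
        long elapsed = nowMillis - lastRequestAt;
        return Math.max(0, minIntervalMillis - elapsed);
    }

    // Blocks until the minimum interval has passed, then records the new request.
    void acquire() {
        long wait = millisToWait(System.currentTimeMillis());
        if (wait > 0) {
            try {
                Thread.sleep(wait);
            } catch (InterruptedException e) {
                // Preserve the interrupt so callers can react to it.
                Thread.currentThread().interrupt();
            }
        }
        markRequest(System.currentTimeMillis());
    }
}
```

You would call `acquire()` right before each request, so that a loop over many pages never fires faster than the interval you chose.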

abathur