Screen-scraping, also known as web-scraping or data-scraping, is a software technique used to collect and parse information from user interfaces. If your question is specifically about scraping from websites or web-APIs, please use the [web-scraping] tag instead.
Screen-scraping, also known as web-scraping or data-scraping, is a software technique used to collect and parse information from websites. The information is scraped via a parser, for example using regular expressions or, in the case of a 3270 emulator, variants of HLLAPI.
Questions that have this tag should be directly related to gathering information from websites through a parsing mechanism, such as regular expressions, or through a browser emulator, such as PhantomJS. (Questions about screen-scraping using regular expressions should also be tagged regex.)
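As an illustration of the regular-expression approach, here is a minimal sketch in Python. The HTML fragment, the `price` class name, and the `extract_prices` helper are all made up for this example; a real page would need its own pattern, and a regex of this kind only works while the markup stays exactly this regular.

```python
import re

# Hypothetical HTML fragment standing in for a fetched page.
html = """
<ul>
  <li class="price">Widget: $19.99</li>
  <li class="price">Gadget: $5.00</li>
</ul>
"""

# Capture the item name and the numeric price from each list item.
PRICE_RE = re.compile(r'<li class="price">\s*([^:<]+):\s*\$(\d+\.\d{2})\s*</li>')


def extract_prices(text):
    """Return (name, price) pairs found in the HTML text."""
    return [(name.strip(), float(price)) for name, price in PRICE_RE.findall(text)]


prices = extract_prices(html)
print(prices)
```

The pattern is tied to the exact attribute order and spacing of the sample markup, which is why regex-based scrapers tend to break when a site changes its layout.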
Because information on web pages is usually organized in structured HTML, basic screen-scraping can be a simple task. In most cases, the goal is not only to parse the data on the web page but also to collect it, either by reproducing it on a different web page or by storing it in a database.
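The parse-then-store workflow might look roughly like the following sketch, which uses only Python's standard library (`html.parser` and `sqlite3`). The sample HTML, the `links` table, and the names `LinkCollector`, `scrape_links`, and `store_links` are assumptions chosen for illustration, not part of any particular scraping tool.

```python
import sqlite3
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently being read, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def scrape_links(html):
    """Parse the HTML string and return the collected links."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def store_links(conn, links):
    """Persist the scraped links in a (hypothetical) 'links' table."""
    conn.execute("CREATE TABLE IF NOT EXISTS links (href TEXT, label TEXT)")
    conn.executemany("INSERT INTO links (href, label) VALUES (?, ?)", links)
    conn.commit()


# Hypothetical page content; a real scraper would fetch this over HTTP.
sample_html = '<p><a href="/a">First</a> and <a href="/b">Second <b>page</b></a></p>'

conn = sqlite3.connect(":memory:")
store_links(conn, scrape_links(sample_html))
rows = conn.execute("SELECT href, label FROM links ORDER BY href").fetchall()
print(rows)
```

An in-memory database keeps the example self-contained; pointing `sqlite3.connect` at a file path would make the collected data persistent.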
One of the most common causes of problems in web-scraping is that the web page as seen in a browser (using DOM inspection tools) may be very different from the HTML retrieved by the web-scraping tool from the same URL. For example, there may be JavaScript code that augments or modifies the contents of the page when it is loaded in a browser.
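One way to check for this mismatch is to look for the expected text in the raw HTML itself, ignoring anything inside `<script>` and `<style>`. The sketch below does that with Python's `html.parser`; the sample page, the `VisibleText` class, and the `probe` helper are hypothetical. If the text is missing from the static markup while the page contains scripts, the value is probably injected at load time, and a tool that executes JavaScript would be needed to see it.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Gather text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.script_count = 0
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
            if tag == "script":
                self.script_count += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def probe(raw_html, expected_text):
    """Report whether expected_text appears in the static, non-script text."""
    parser = VisibleText()
    parser.feed(raw_html)
    parser.close()
    found = expected_text in " ".join(parser.chunks)
    return {"found_in_static_html": found, "script_tags": parser.script_count}


# Hypothetical page: the price is only filled in by JavaScript in a browser.
raw_html = """
<html><body>
  <h1>Prices</h1>
  <div id="price"></div>
  <script>document.getElementById("price").textContent = "$19.99";</script>
</body></html>
"""

print(probe(raw_html, "$19.99"))
```

Here the heading is present in the static HTML, but the price exists only as a string inside the script, so a scraper that merely downloads and parses the page would find an empty `div`.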
Screen-scraping of websites may be against the website's individual Terms of Use, but the enforceability of these terms is unclear. Most major website hosts can detect ongoing screen-scraping and can take action as if it were a denial-of-service attack.
Historically, screen-scraping also described the technique of "scraping" data off of, or entering data onto, a 3270 emulator screen. This technique gained some popularity shortly after the advent of such emulators. The API that 3270 emulators implemented was known as HLLAPI (High Level Language Application Programming Interface); EHLLAPI (Enhanced HLLAPI) and WinHLLAPI came into existence later. Application programs would "drive" the emulator, sending simulated keystrokes and function keys, then waiting for responses.