I'd like to get the main image for an article, much like Facebook does when you post a link (but without the step where you choose an image). The only data we have to work with is the whole page's HTML, held in a variable. The page and URL will be different every time this function runs.
Are there any libraries or classes that are particularly good at extracting the main body of content, much like Instapaper does, that would be of any help here?
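
To make the goal concrete, here is a rough sketch of the kind of function I have in mind, using only Python's standard library. It assumes the page declares an Open Graph `og:image` meta tag (the tag Facebook's link preview reads) and otherwise falls back to the first `<img>` it finds, which is only a crude guess at the "main" image. The names `MainImageParser` and `find_main_image` are made up for illustration.

```python
from html.parser import HTMLParser


class MainImageParser(HTMLParser):
    """Records the og:image meta tag and, as a fallback, the first <img> src."""

    def __init__(self):
        super().__init__()
        self.og_image = None
        self.first_img = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("property") == "og:image":
            # Keep only the first og:image that has a non-empty content value.
            if self.og_image is None and attrs.get("content"):
                self.og_image = attrs["content"]
        elif tag == "img" and self.first_img is None and attrs.get("src"):
            # Assumption: the first image in the markup is a usable fallback.
            self.first_img = attrs["src"]


def find_main_image(html):
    """Return the og:image URL if present, else the first <img> src, else None."""
    parser = MainImageParser()
    parser.feed(html)
    return parser.og_image or parser.first_img
```

The returned value may be a relative URL, so it would still need resolving against the page's URL (for example with `urllib.parse.urljoin`). This does nothing to find the main body of content, which is the part I am hoping a library can handle.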