When you share a link on major websites like Digg and Facebook, they create thumbnails by capturing the main images of the page. How do they grab images from a webpage? Does it involve loading the whole page (e.g. with cURL) and parsing it (e.g. with preg_match)? To me, this method seems slow and unreliable. Do they have a more practical method?
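To make the question concrete, here is a minimal sketch of the approach I have in mind (I used DOMDocument rather than preg_match for the parsing step, and the function name is just a placeholder):

```php
<?php
// Sketch of the "fetch everything, then parse" approach: download the full
// HTML with cURL, then collect every <img> src attribute. Names here are
// placeholders for illustration only.

function fetch_page_images($url)
{
    // Download the full HTML document.
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    $html = curl_exec($ch);
    curl_close($ch);

    if ($html === false) {
        return array();
    }

    // Parse the markup and pull out every <img> src attribute.
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // suppress warnings from malformed real-world HTML

    $images = array();
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        if ($src !== '') {
            $images[] = $src;
        }
    }
    return $images;
}

// Example usage:
// print_r(fetch_page_images('http://example.com/'));
```

Even this feels heavy if it has to run for every shared link, which is why I suspect the big sites do something smarter.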
P.S. I think there should be a practical method for quickly crawling the page that skips some parts (e.g. CSS and JS) and goes straight for the src attributes. Any ideas? A rough sketch of the kind of shortcut I'm imagining follows below.
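For example, something along these lines: abort the download after the first chunk of the page and scan only that fragment for src attributes. The size limit and the regex are purely my own guesses, not anything I know Digg or Facebook actually do:

```php
<?php
// Rough sketch of a "quick crawl": stop the transfer after ~50 KB and run a
// simple regex over that fragment instead of parsing the whole document.
// The limit and the regex are assumptions made up for this example.

function quick_image_scan($url, $limit = 51200)
{
    $buffer = '';

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    // Abort the transfer once roughly $limit bytes have been collected:
    // returning fewer bytes than were passed makes cURL stop the download.
    curl_setopt($ch, CURLOPT_WRITEFUNCTION, function ($ch, $chunk) use (&$buffer, $limit) {
        $buffer .= $chunk;
        return strlen($buffer) < $limit ? strlen($chunk) : 0;
    });
    curl_exec($ch);  // may report a write error once we abort on purpose
    curl_close($ch);

    // Grab src="..." values from whatever HTML we managed to read.
    preg_match_all('/<img[^>]+src=["\']([^"\']+)["\']/i', $buffer, $matches);
    return $matches[1];
}

// Example usage:
// print_r(quick_image_scan('http://example.com/'));
```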