Best way to choose an image from HTML to use as a thumbnail/preview of the page

Question

Assuming there is no og:image or link rel="image_source", does anyone have any real-world experience or advice on better-than-random techniques for choosing an image that best represents a web page?

Update: All answers are good, so I upvoted them all and selected one, although it seems there is no great way of doing this. I will experiment with the largest picture and with a screenshot of what the page would look like on a low-res client. Thanks all!

PS: I'm finding that quite a few pages seem to have og:image or link rel="image_source" anyway. More than I expected.

Do you want the ["Google instant preview"-style](http://www.google.com/landing/instantpreviews/#a) image or another image? — Kiril, Jul 10 '12 at 15:25
Instant preview (i.e. screenshot) type thing is OK as a backup, but I'm creating fairly small thumbnails (about 100 pixels wide) so a photo would work better. — Jamona Mican, Jul 11 '12 at 15:28
I would go for the largest image on the page, as it's probably going to be the most important/relevant one. That might not be true on every page, but it should hold a good deal of the time. You could use JavaScript/jQuery to get the width/height of each image. – Billy Moat, Jul 17 '12 at 09:35
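
A minimal sketch of the largest-image heuristic suggested in the comment above, assuming the image dimensions have already been collected. The function name `pickLargestImage` and the `minSide` and `maxAspect` thresholds are illustrative assumptions, not part of any library; the thresholds would need tuning on real pages.

```javascript
// Hypothetical helper: choose the image with the largest pixel area.
// `images` is an array of { src, width, height } objects collected elsewhere.
function pickLargestImage(images, minSide = 50, maxAspect = 3) {
  let best = null;
  let bestArea = 0;
  for (const img of images) {
    const { width, height } = img;
    // Skip spacers, tracking pixels and tiny icons (assumed threshold).
    if (!(width >= minSide && height >= minSide)) continue;
    // Skip banner-like strips, which are often ads or headers (assumed threshold).
    const aspect = Math.max(width, height) / Math.min(width, height);
    if (aspect > maxAspect) continue;
    const area = width * height;
    if (area > bestArea) {
      best = img;
      bestArea = area;
    }
  }
  return best; // null when no image qualifies
}
```

In a browser, the input could be built with something like `Array.from(document.images, i => ({ src: i.src, width: i.naturalWidth, height: i.naturalHeight }))` once the images have loaded.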

Dom · Accepted Answer · 2013-06-26T20:14:45.843

2

Taking a screenshot of the website in its smallest possible form, i.e. how it would look on a small laptop or even a mobile device (but not the mobile version of the site), would be a non-random approach.

Most good web designers will try to make sure users are able to see what the page is about immediately upon loading and include the most important and relevant information 'above the fold' as they say.

edited Jun 26 '13 at 20:14

answered Jul 18 '12 at 22:45

Dom


score 1 · Answer 2 · edited May 23 '17 at 12:12


Choose the logo of your page as the og:image. That way your brand becomes associated with all your posts, without having to worry about what image best defines each individual page.

For other pages, you cannot control what image they have.

You could investigate how sharer.php works, but other than that there is no silver bullet for choosing an image for a web page that does not define one.


answered Jul 10 '12 at 05:31

phwd


I did not mean for my own pages, but for analyzing other people's pages. Sorry if that wasn't clear. – Jamona Mican Jul 10 '12 at 09:14
Hey pwhd. The facebook algorithm seems to be quite good, but I have not been able to find any resources that detail how it works :( – Jamona Mican Jul 11 '12 at 15:30

score 1 · Answer 3 · answered Jul 18 '12 at 08:41

I don't have any experience with Facebook Open Graph, but one trick I've used before is to grab the favicons of sites I've linked to and use them as link button icons. They're small, they're almost always associated with the company name and/or logo, and they're pretty universal across professional websites. The near-universal filename favicon.ico also makes it really easy to pick out of the HTML (or out of the link attributes if the site uses a different filename).

You might give that a shot if it can be adapted to what you are trying to do. If you find that it doesn't look too good, you can try a more "web 2.0" take and check for the iPhone/iPad apple-touch-icon PNG images (you will probably only find them on big-name sites, though):

http://developer.apple.com/library/ios/#DOCUMENTATION/AppleApplications/Reference/SafariWebContent/ConfiguringWebApplications/ConfiguringWebApplications.html
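
A rough sketch of the favicon / apple-touch-icon lookup described above, assuming the page's HTML is already available as a string. The names `getAttr` and `findIconUrl` are hypothetical, and the regex-based tag matching is a simplification that only handles quoted attribute values; a real HTML parser would be more robust.

```javascript
// Read one quoted attribute value out of a raw tag string.
function getAttr(tag, name) {
  const m = new RegExp('\\s' + name + '\\s*=\\s*(["\'])(.*?)\\1', 'i').exec(tag);
  return m ? m[2] : null;
}

// Hypothetical helper: prefer an apple-touch-icon, then a declared favicon,
// then fall back to the conventional /favicon.ico at the site root.
function findIconUrl(html, pageUrl) {
  const linkTags = html.match(/<link\b[^>]*>/gi) || [];
  let favicon = null;
  for (const tag of linkTags) {
    const rel = (getAttr(tag, 'rel') || '').toLowerCase().split(/\s+/);
    const href = getAttr(tag, 'href');
    if (!href) continue;
    // Covers apple-touch-icon and apple-touch-icon-precomposed.
    if (rel.some(r => r.startsWith('apple-touch-icon'))) {
      return new URL(href, pageUrl).href;
    }
    // Covers rel="icon" and rel="shortcut icon"; keep the first one seen.
    if (rel.includes('icon') && !favicon) favicon = href;
  }
  return new URL(favicon || '/favicon.ico', pageUrl).href;
}
```

The `URL` constructor resolves relative `href` values against the page address, so the result is always absolute. Whether `/favicon.ico` actually exists still has to be checked with a request.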

score 1 · Answer 4 · answered Jul 18 '12 at 09:50

I would normally suggest that you simply scrape the page of img tags. However, these days, CSS background images are frequently applied to h1/header/div/a/etc. tags to display logos in place of text.

One possible solution is to grab all elements with 'logo' in their ID/class name:

// Select every element whose id or class contains "logo" or "Logo"
var l = $('[id*="logo"],[id*="Logo"],[class*="logo"],[class*="Logo"]');

If this is/contains an img tag, chances are you have the site's logo. Otherwise, if it's a div or other such container, you'll need to dig into the child elements' CSS properties to see if they have a background image.

From this you can build a set of candidate images, which when combined with a heuristic based on (for example) image dimensions, should hopefully spit out a logo every time.
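
A small sketch of how the candidate set might be built, assuming each matched element has been reduced to a plain object with `tagName`, `src` and `backgroundImage` fields. The helper names `backgroundImageUrl` and `collectLogoCandidates` are hypothetical; the dimension heuristic is left out because it depends on the pages being analysed.

```javascript
// Pull the first url(...) out of a CSS background-image value.
// Returns null for values such as "none" that contain no url().
function backgroundImageUrl(cssValue) {
  const m = /url\(\s*(?:"([^"]*)"|'([^']*)'|([^)\s]*))\s*\)/i.exec(cssValue || '');
  if (!m) return null;
  return m[1] ?? m[2] ?? m[3];
}

// Hypothetical helper: gather unique image URLs from element-like objects.
// <img> elements contribute their src; everything else contributes its
// background image, if it has one.
function collectLogoCandidates(elements) {
  const urls = [];
  for (const el of elements) {
    const url = el.tagName === 'IMG'
      ? el.src
      : backgroundImageUrl(el.backgroundImage);
    if (url && !urls.includes(url)) urls.push(url);
  }
  return urls;
}
```

In a browser, the `backgroundImage` field can be filled from `getComputedStyle(el).backgroundImage` for each matched element and its children.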

I hope this helps you on your way!

score 1 · Answer 5 · answered Jul 19 '12 at 13:13

Going for the logo is usually the wrong way. From the point of view of a user of your website, I would rather see no image than the same logo all the time. The same applies to links in Google+ or Facebook: only show images when it really does make sense.

However, finding the corresponding image may not always be trivial if no og:image or rel="image_source" is provided.

An article usually has a title, which is presented in an <h1> or <h2> tag. The nearest image may be the right one. However, the nearest image may also be a logo, so this can go wrong.

I would take a pragmatic approach: fetch the most likely image first and read its EXIF data. A real photograph usually carries EXIF information, whereas a logo, spacer or other layout graphic typically has none and can therefore be treated as not relevant. If the first picture is not the right one, though, I would fetch the next one, and so on.
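
A minimal sketch of the EXIF check, assuming the image bytes have already been downloaded into an array, `Uint8Array` or Node `Buffer`. The function name `hasExif` is hypothetical. It only understands JPEG: it walks the segments at the start of the file and looks for an APP1 segment whose payload begins with `Exif`. Keep in mind that metadata can be stripped from genuine photos, so a negative result is a hint rather than proof.

```javascript
// Hypothetical helper: true when a JPEG byte sequence contains an EXIF segment.
function hasExif(bytes) {
  // JPEG files start with the SOI marker FF D8.
  if (bytes.length < 4 || bytes[0] !== 0xff || bytes[1] !== 0xd8) return false;
  let pos = 2;
  while (pos + 4 <= bytes.length) {
    if (bytes[pos] !== 0xff) return false; // not at a marker: give up
    const marker = bytes[pos + 1];
    if (marker === 0xff) { pos += 1; continue; } // fill byte before a marker
    // Start of scan (image data) or end of image: no EXIF was seen.
    if (marker === 0xda || marker === 0xd9) return false;
    // Big-endian segment length; it counts its own two bytes.
    const length = (bytes[pos + 2] << 8) | bytes[pos + 3];
    if (length < 2) return false; // malformed segment
    if (marker === 0xe1 && length >= 8) {
      // APP1 payloads holding EXIF begin with the ASCII identifier "Exif".
      const id = String.fromCharCode(...bytes.slice(pos + 4, pos + 8));
      if (id === 'Exif') return true;
    }
    pos += 2 + length; // skip the marker plus the whole segment
  }
  return false;
}
```

Because the metadata sits near the start of the file, fetching only the first part of each candidate image may be enough for this test, though how much to fetch is left as an assumption to verify.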

Another clue may be the HTML5 <article> tag, which usually has the post's image nested inside it.
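
A rough sketch of the <article> clue, assuming the HTML is available as a string. The function name `firstArticleImage` is hypothetical, and the regular expressions are a simplification: they take the first <article> element only, do not handle nested articles, and expect a quoted `src` attribute.

```javascript
// Hypothetical helper: absolute URL of the first <img> inside the first
// <article> element, or null when there is no article or no image in it.
function firstArticleImage(html, pageUrl) {
  const article = /<article\b[^>]*>([\s\S]*?)<\/article>/i.exec(html);
  if (!article) return null;
  // Require whitespace before "src" so attributes like data-src are ignored.
  const img = /<img\b[^>]*?\ssrc\s*=\s*(["'])(.*?)\1/i.exec(article[1]);
  return img ? new URL(img[2], pageUrl).href : null;
}
```

Images outside the article, such as a logo in the page header, are ignored by construction, which is the point of using this clue.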

Nevertheless, there are several web designers who do not follow standards, so their pages may not parse nicely.
