Summary

I am looking for the criteria by which I can create a webpage and be [fairly] sure that Firefox will offer Reader View for it, should the user want it.

Some sites have this option and some do not. Some pages with a lot of text lack it, while others with much less text have it. Stack Overflow, for instance, displays only the question in Reader View, without any of the answers.

Question

My Firefox was upgraded from 38.0.1 to 38.0.5, and I have found a new feature called Reader View: a sort of overlay which removes "page clutter" and makes text easier to read. On certain pages, Reader View appears as a clickable icon at the right-hand side of the address bar.

This is fine, but from a programming point of view I want to know how Reader View works and which criteria decide which pages it applies to. I have explored the Mozilla Firefox website without finding clear answers (sod all programming answers of any sort). I have of course Googled / Binged this, but that only came back with references to Firefox add-ons; this is not an add-on but a staple part of the new Firefox version.

I assumed that Reader View used HTML5 and would extract the <article> contents, but this is not the case: it works on Wikipedia, which does not appear to use <article> or similar HTML5 tags. Instead, Reader View extracts certain <div>s and displays them alone. The feature works on some HTML5 pages, such as Wikipedia, but not on others.

If anyone knows how Firefox Reader View actually operates and how website developers can make use of it, can you share? Or, if you know where this information can be found, can you point me in the right direction? I have not been able to find it.

Martin
  • 9
    The source of the library used by Firefox Reader View is on GitHub at https://github.com/mozilla/readability if that helps... – Richard Neish Jun 05 '15 at 15:28
  • thanks @RichardNeish - taking a look at it, it's not clear, it's a stripped down `…` and/or `…` and/or `…` and a few other tags. I'll need to read over it when I'm fresh tomorrow. – Martin Jun 05 '15 at 17:43
  • Could you write up your findings as an answer? I would be interested to hear how it works. – Richard Neish Jun 05 '15 at 19:17
  • 3
    FYI @RichardNeish, reading through the GitHub code this morning, the process is that page elements are listed in order of likelihood, with `<section>`, `<p>`, `<div>`, `<article>` at the top of the list (i.e. most likely), and then each of these "nodes" is given a score based on things such as comma counts and class names that apply to the node. The score value decides if the HTML page can be "page viewed" in Firefox. I am not absolutely clear if the score value is set by Firefox or by the readability function. JavaScript is really not my strong point, so someone else should check over this. – Martin Jun 06 '15 at 12:38
  • 1
    @Martin I think you should consider posting that as an answer (and then not accept it, if you think someone else can do better than you). – svick Jun 06 '15 at 21:40
  • cheers @svick I have done just that. Also cheers for adding the reader-view flag, I wasn't aware the flag existed already! – Martin Jun 06 '15 at 22:44
  • @Martin It didn't, I just created it. (Though I'm not completely sure that was a good call, it may be too unimportant to have its own tag.) – svick Jun 06 '15 at 22:54
  • @svick hah, well it's already got two questions on SO - and somewhat of a lack of developer documentation on Mozilla (I looked yesterday) . – Martin Jun 06 '15 at 22:55
  • 4
    On [webmasters.se]: [How do I make my site compatible with Firefox's Reader View feature](http://webmasters.stackexchange.com/q/83058/17633) – unor Jul 11 '15 at 21:07
  • hahaha @unor like the link swapping there :D – Martin Jul 11 '15 at 22:38

3 Answers

75

You need at least one <p> tag around the text that you want to see in Reader View, and that text needs to contain at least 516 characters in at least 7 words.

For example, this will trigger Reader View:

<body>
<p>
 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
 123456789 123456
</p>
</body>

See my example at https://stackoverflow.com/a/30750212/1069083
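
The thresholds above can be checked roughly before publishing a page. The following is a minimal sketch in Python, assuming the figures from this answer (516 characters, 7 words), which were found by experiment and are not documented constants. The names `ParagraphText` and `meets_thresholds` are made up for illustration, and collapsing whitespace before counting is an assumption about how the text is measured.

```python
from html.parser import HTMLParser

MIN_CHARS = 516  # empirical figure from this answer, not an official constant
MIN_WORDS = 7    # likewise empirical


class ParagraphText(HTMLParser):
    """Collects the text found inside <p> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # number of <p> elements currently open
        self.chunks = []    # raw text pieces seen inside <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def meets_thresholds(html):
    """True if the <p> text reaches both the character and word minimums."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    words = " ".join(parser.chunks).split()
    text = " ".join(words)  # whitespace collapsed to single spaces
    return len(text) >= MIN_CHARS and len(words) >= MIN_WORDS


# Rebuild a sample shaped like the example above: five 99-character lines
# followed by two short words.
line = "1234567890" * 9 + "123456789"
sample = "<body><p>" + "\n".join([line] * 5) + "\n123456789 123456</p></body>"
print(meets_thresholds(sample))
```

The sketch only measures text inside explicitly closed `<p>` elements; it says nothing about how Firefox itself reaches its decision.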

Kadam Parikh
rubo77
  • Thanks for the info. I have an issue with pages which have multiple `<p>` tags where each tag is under the minimum character count, although 3 tags add up to 1455 characters. But it's nice to know the specification numbers I need to work to in order to make Reader View possible on a page. I also note that images in `<div>` tags within the outer `<p>` tags are retained in Reader View. Cheers for your help – Martin Jun 10 '15 at 21:04
  • It would also be interesting to know how the Chrome reader view on Android is triggered. – rubo77 Nov 27 '16 at 17:06
  • How did you figure this out? It's awesome but seems a bit too much like magic :) – icc97 Jan 14 '23 at 15:49
  • I found out by trial and error – rubo77 Jan 19 '23 at 10:15
43

Reading through the GitHub code this morning, the process is that page elements are listed in order of likelihood, with <section>, <p>, <div>, <article> at the top of the list (i.e. most likely).

Then each of these "nodes" is given a score based on things such as comma counts and class names that apply to the node. This is a somewhat multi-faceted process where scores are added for text chunks, but scores are also seemingly reduced for invalid parts or syntax. Scores in sub-parts of a "node" are reflected in the score of the node as a whole, i.e. the parent element contains the scores of all lower elements, I think.

This score value decides if the HTML page can be "page viewed" in Firefox.
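
As a rough illustration of scoring by comma count and class name, here is a minimal Python sketch. It is not the Readability.js code: the word lists, the weight of 25 and the function names `class_weight` and `node_score` are all assumptions chosen for the example.

```python
import re

# Illustrative patterns and weight; assumptions, not the library's own values.
POSITIVE = re.compile(r"article|body|content|entry|main|post|text", re.I)
NEGATIVE = re.compile(r"comment|footer|footnote|sidebar|sponsor|widget", re.I)
CLASS_WEIGHT = 25


def class_weight(class_name):
    """Raise or lower a node's score according to its class name."""
    weight = 0
    if NEGATIVE.search(class_name):
        weight -= CLASS_WEIGHT
    if POSITIVE.search(class_name):
        weight += CLASS_WEIGHT
    return weight


def node_score(text, class_name=""):
    """One point per comma in the text, adjusted by the class-name weight."""
    return text.count(",") + class_weight(class_name)


print(node_score("One, two, three, four.", "post-content"))  # 3 commas + 25
print(node_score("One, two, three, four.", "sidebar"))       # 3 commas - 25
```

The point of the sketch is only that the same text can score very differently depending on the class of the element that holds it.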

I am not absolutely clear if the score value is set by Firefox or by the readability function.

JavaScript is really not my strong point, and I think someone else should check over the link provided by Richard ( https://github.com/mozilla/readability ) and see if they can provide a more thorough answer.

What I expected to see, but did not, was a score based on the amount of text content in a <p>, <div> or other relevant tag.

Any improvements on this question or answer, please share!!

EDIT: Images in <div> or <figure> tags (HTML5) within the <p> element appear to be retained in the Reader View when the page text content is valid.

Martin
39

I followed Martin's link to the Readability.js GitHub repository, and had a look at the source code. Here's what I make of it.

The algorithm works with paragraph tags. First of all, it tries to identify parts of the page which are definitely not content - like forms and so on - and removes them. Then it goes through the paragraph nodes on the page and assigns a score based on content-richness: it gives them points for things like number of commas, length of content, etc. Notice that a paragraph with fewer than 25 characters is immediately discarded.
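
The paragraph step described above can be sketched in a few lines of Python. The 25-character cut-off comes from the description; the weights (one base point, one point per comma, one point per 100 characters capped at 3) and the name `paragraph_score` are assumptions for illustration, not a copy of the library.

```python
MIN_PARAGRAPH_CHARS = 25  # paragraphs shorter than this are discarded


def paragraph_score(text):
    """Return a content score for one paragraph, or None if it is discarded.

    The weights used here are illustrative assumptions.
    """
    text = text.strip()
    if len(text) < MIN_PARAGRAPH_CHARS:
        return None
    score = 1                          # base point for being a paragraph
    score += text.count(",")           # commas hint at real prose
    score += min(len(text) // 100, 3)  # longer text scores more, up to a cap
    return score


print(paragraph_score("Too short."))
print(paragraph_score("A longer paragraph, with commas, clauses, and some length to it."))
```

A paragraph that returns `None` contributes nothing to its ancestors, which is why a page built from many very short paragraphs can end up with no strong candidate.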

Scores then "bubble up" the DOM tree: each paragraph adds part of its score to all of its ancestor nodes. A direct parent gets the full score added to its total, a grandparent only half, a great-grandparent a third, and so on. This allows the algorithm to identify higher-level elements which are likely to be the main content section.
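
The bubbling step can be sketched as follows. This is a simplified Python model, not the library code: the divisors (1, 2, 3, ...) simply follow the description in this answer and may differ from what Readability.js actually uses, and the page structure and the name `bubble_up` are hypothetical.

```python
from collections import defaultdict


def bubble_up(paragraphs):
    """Add each paragraph's score to its ancestors, weakening with distance.

    `paragraphs` is a list of (score, ancestors) pairs, where `ancestors`
    lists ancestor names from the direct parent outwards. The divisor
    (1, 2, 3, ...) is an assumption taken from the description above.
    """
    totals = defaultdict(float)
    for score, ancestors in paragraphs:
        for level, ancestor in enumerate(ancestors):
            totals[ancestor] += score / (level + 1)
    return dict(totals)


# Hypothetical page: two paragraphs in an <article>, one in a sidebar <div>.
page = [
    (6, ["article", "main", "body"]),
    (4, ["article", "main", "body"]),
    (3, ["div.sidebar", "body"]),
]
totals = bubble_up(page)
best = max(totals, key=totals.get)
print(best, totals[best])
```

Here the `article` collects the full score of both of its paragraphs and comes out ahead of `body`, which receives only a fraction of each.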

Though this is just Firefox's algorithm, my guess is if it works well for Firefox, it'll work well for other browsers too.

In order for these Reader View algorithms to work for your website, you want them to correctly identify the content-heavy sections of your page. This means you want the more content-heavy nodes on your page to get high scores in the algorithm.

So here are some rules of thumb to improve the quality of the page in the eyes of these algorithms:

  1. Use paragraph tags in your content! Many people tend to overlook them in favor of <br /> tags. While it may look similar, many content-related algorithms (not only Reader View ones) rely heavily on them.
  2. Use HTML5 semantic elements in your markup, like <article>, <nav>, <section>, <aside>. Even though they're not the only criterion (as you noted in the question), these are very useful to computers reading your page (not just Reader View) to distinguish different sections of your content. Readability.js uses them to guess which nodes are likely or unlikely to contain important content.
  3. Wrap your main content in one container, like an <article> or <div> element. This will receive score points from all the paragraph tags inside it, and be identified as the main content section.
  4. Keep your DOM tree shallow in content-dense areas. If you have a lot of elements breaking your content up, you're only making life harder for the algorithm: there won't be a single element that stands out as being parent of a lot of content-heavy paragraphs, but many separate ones with low scores.
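
Some of these rules of thumb can be checked mechanically. The sketch below is a hypothetical helper in Python, not part of any Reader View tooling: it counts tags and reports two simple hints, for rule 1 (paragraphs versus `<br>`) and rule 2 (semantic containers). The names `MarkupStats` and `readability_hints`, and the exact checks, are assumptions.

```python
from html.parser import HTMLParser


class MarkupStats(HTMLParser):
    """Counts opening tags by name (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        self.counts[tag] = self.counts.get(tag, 0) + 1


def readability_hints(html):
    """Return a list of plain-text hints based on simple tag counts."""
    stats = MarkupStats()
    stats.feed(html)
    stats.close()
    counts = stats.counts
    hints = []
    if counts.get("br", 0) > counts.get("p", 0):
        hints.append("more <br> than <p> tags: wrap text in paragraphs")
    if not any(counts.get(tag) for tag in ("article", "section")):
        hints.append("no semantic container around the main content")
    return hints


print(readability_hints("<div>Line one<br>Line two<br>Line three</div>"))
print(readability_hints("<article><p>First.</p><p>Second.</p></article>"))
```

An empty list only means that none of these coarse checks fired; it is not a guarantee that Reader View will be offered.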
Martin
Sean Bone
  • 3
    I originally wrote an article on my own site about this, figured I'd contribute here instead of just plugging it. – Sean Bone Nov 22 '16 at 17:06
  • 1
    Thanks for your answer. Could you add a date (and a link?) when you wrote this on your site, as the details you've posted here are much more complex than rubo77's or my answers, so I would expect the algorithm has been made more complex with each release of Firefox. – Martin Nov 22 '16 at 17:17
  • 3
    @Martin It was written in November 2016 - here's the link: http://weblog.zumguy.com/read.php?tid=56 – Sean Bone Nov 24 '16 at 20:49
  • 11
    Interestingly enough, this is the answer that appears when I enable Reader View on my firefox. – Chris Jaquez Jan 31 '17 at 03:25
  • 4
    Note -- the article is now under [http://zumguy.com/enabling-reading-mode-on-your-website/](http://zumguy.com/enabling-reading-mode-on-your-website/) – Sean Bone Jan 10 '20 at 09:45