6

https://github.com/mozilla/readability (readability.js creates a reader view for web pages)

How can I apply readability.js to this test web page? The problem is that readability.js deletes the elements of the page that I want to keep and leaves those that should be removed. Is there any documentation on how to use readability.js? I hope someone can help me. Thank you!

<html><head>
<title>Reader View shows only the browser in reader view</title>
    <script src="https://raw.githack.com/mozilla/readability/master/Readability.js"></script>
</head>
<body>
Everything outside the main div tag vanishes in Reader View<br>
<img class="no-print" src="http://dummyimage.com/1024x100/000/ffffff&text=This+banner+should+vanish+in+print+view">
<div>
   <h1>H1 tags outside of a p tag are hidden in reader view</h1>
   <img class="no-print" src="http://dummyimage.com/1024x100/000/ffffff&text=This+banner+is resized+in+print+view">
   <p>
 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
 123456789 123456
</p>
</div>
</body>
    <script>
    var article = new Readability(document).parse();
    </script>
</html>

Source of the test page: Optimize website to show reader view in Firefox

j08691
Marcel

3 Answers

6

You can use DOMPurify and Readability together, as the Readability docs suggest:

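import { Readability } from '@mozilla/readability';
import DOMPurify from 'dompurify';

// Run Readability on the given document and return the parsed article (or null).
function readable(doc) {
  const reader = new Readability(doc);
  return reader.parse();
}

// Parse a clone so Readability does not modify the live page.
const cloneDoc = document.cloneNode(true);
const parsed = readable(cloneDoc);
// parse() returns null when no article content is found.
const markup = parsed ? DOMPurify.sanitize(parsed.content) : '';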

markup will be a sanitized HTML string of the readable content. Try console.log(parsed) to see the other available properties.
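
As a rough guide to what console.log(parsed) shows: the Readability README lists fields such as title, byline, excerpt, textContent, length and content. The sketch below picks a few of them for a one-line summary. The helper name describeArticle is hypothetical and not part of Readability, and it takes a plain object so it can be tried without a DOM.

```javascript
// Hypothetical helper: summarize a Readability result for logging.
// Field names (title, byline, length) are assumed from the Readability README.
function describeArticle(parsed) {
  // parse() returns null when no article content could be extracted.
  if (!parsed) {
    return "No article found";
  }
  // byline may be missing, so only append it when present.
  const byline = parsed.byline ? " by " + parsed.byline : "";
  return parsed.title + byline + " (" + parsed.length + " characters)";
}

// Usage sketch in the browser (assumes Readability is already loaded):
// console.log(describeArticle(new Readability(document.cloneNode(true)).parse()));
```

Checking for null first matters because accessing parsed.content on a failed parse would throw.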

akkhil
3

Did you try this?

From their GitHub page:

"Readability's parse() works by modifying the DOM. This removes some elements in the web page. You could avoid this by passing the clone of the document object while creating a Readability object."

var documentClone = document.cloneNode(true); 
var article = new Readability(documentClone).parse();

By parsing a copy of the DOM, you avoid modifying the real DOM of the page.

bze12
0

Okay....

    // Assumes the page has an element with id="body"; use document.body otherwise.
    document.getElementById("body").innerHTML =
      "<font face='Calibri' size='4'><h1>" + article.title + "</h1>" + article.content + "</font>";
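
The snippet above interpolates article.title straight into an HTML string. A slightly more defensive variant is sketched below, assuming article.title is plain text and article.content is an HTML string (ideally already sanitized, for example with DOMPurify). The names escapeHtml and renderReaderView are hypothetical helpers, not part of Readability.

```javascript
// Escape plain text so it can be placed inside HTML safely.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical helper: build reader-view markup from a Readability result.
function renderReaderView(article) {
  // parse() may return null, so fall back to a short message.
  if (!article) {
    return "<p>No readable content found.</p>";
  }
  // The title is treated as plain text and escaped; content is used as HTML.
  return (
    '<article style="font-family: Calibri, sans-serif">' +
    "<h1>" + escapeHtml(article.title) + "</h1>" +
    article.content +
    "</article>"
  );
}

// Usage sketch in the browser:
// document.body.innerHTML = renderReaderView(article);
```

The sketch uses a style attribute instead of the obsolete font element; the rendered result should be comparable, but that is a design choice rather than something the original answer states.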
Marcel