I'm using Google's puppeteer to read HTML, make some changes to it, and save it to a new HTML file.

Almost everything is working properly, except that Puppeteer is escaping double-quote characters (") as &quot; inside the style attribute.

For example:

style='font-size:11.0pt;font-family:"Arial",sans-serif; color:#D99594'

becomes:

style="font-size:11.0pt;font-family:&quot;Arial&quot;,sans-serif; color:#D99594"

This is affecting not only the output HTML, but some of the processing I'm doing within Puppeteer.

I believe I've ruled out encoding as an issue. Any ideas or fixes?

Thanks!

Seth Wilson
  • When is Puppeteer escaping quotes? In which function? – hardkoded Mar 12 '19 at 12:19
  • This behaviour seems valid, see, for example, https://stackoverflow.com/questions/3752769/how-to-escape-double-quotes-in-title-attribute/3752794 – vsemozhebuty Mar 12 '19 at 14:49
  • Also, if you add `style='font-family:"Arial",sans-serif;` to an element in the DevTools DOM tree tab and then copy outerHTML via the menu, you will get `&quot;Arial&quot;` as well. – vsemozhebuty Mar 12 '19 at 14:55

1 Answer

Problem

Functions that return HTML, such as page.content(), give you the browser's current serialization of the DOM. That serialization can differ from the HTML source you supplied, so what you are seeing is expected behavior.
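For processing inside Puppeteer, it may help to remember that the &quot; only exists in the serialized text. The attribute value held in the DOM still contains a plain double quote. The following is a minimal sketch of that difference, not a definitive implementation: `inspectStyle` is a hypothetical helper name, and the markup in the usage comment is made up for illustration. It only relies on `page.content()` and `page.$eval()`.

```javascript
// Hypothetical helper: compares the serialized page with the live attribute value.
// Assumes `page` is a Puppeteer Page with the document already loaded.
async function inspectStyle(page, selector) {
  // Serialized HTML text: quotes inside attribute values show up as &quot;
  const html = await page.content();
  // Attribute value read from the DOM inside the page: contains a plain " character
  const style = await page.$eval(selector, el => el.getAttribute('style'));
  return { html, style };
}

// Usage sketch (assumes the puppeteer package is installed):
//   const puppeteer = require('puppeteer');
//   const browser = await puppeteer.launch();
//   const page = await browser.newPage();
//   await page.setContent(`<p style='font-family:"Arial",sans-serif'>x</p>`);
//   const { html, style } = await inspectStyle(page, 'p');
//   await browser.close();
```

If your processing reads values through the DOM like this instead of parsing the string returned by page.content(), the escaping should not get in the way.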

To name some examples:

  • Chrome will make <div/> into <div></div>.
  • Chrome will use double quotes for attributes: <div id='a'></div> becomes <div id="a"></div>
  • Chrome will make attributes lower case: <div ID="a"></div> becomes <div id="a"></div>
  • Chrome will try to fix your code: <div><span></div></span> becomes <div><span></span></div>
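
The quote escaping from the question follows the same pattern: when the browser serializes an attribute, it wraps the value in double quotes, so any double quote inside the value has to be escaped. Below is a rough sketch of that rule, assuming only the basic escapes. The function names `escapeAttributeValue` and `serializeAttribute` are hypothetical, and the real browser algorithm covers more cases than this.

```javascript
// Hypothetical helper: approximates how a browser escapes an attribute value
// when it serializes the DOM back to HTML text.
function escapeAttributeValue(value) {
  return value
    .replace(/&/g, '&amp;')       // ampersands first, so later entities are not escaped twice
    .replace(/\u00A0/g, '&nbsp;') // non-breaking spaces
    .replace(/"/g, '&quot;');     // double quotes, because the value is wrapped in them
}

// Hypothetical helper: writes one attribute the way it appears in serialized output.
function serializeAttribute(name, value) {
  return `${name}="${escapeAttributeValue(value)}"`;
}

const styleValue = 'font-size:11.0pt;font-family:"Arial",sans-serif; color:#D99594';
console.log(serializeAttribute('style', styleValue));
// prints: style="font-size:11.0pt;font-family:&quot;Arial&quot;,sans-serif; color:#D99594"
```

Parsing that output again decodes &quot; back to a plain double quote, so both forms describe the same attribute value.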

Try it yourself

To test it yourself, you can use the following code. It puts some markup into the DOM and then reads innerHTML to show what the DOM actually looks like. Enter any markup you want to test in the first input field:

const el = document.querySelector("#domTester");
const output = document.querySelector('#output');

function showResult() {
  const outerElement = document.createElement('div');
  outerElement.innerHTML = el.value;
  output.value = outerElement.innerHTML;
}
el.addEventListener('input', showResult);
showResult();

<p>
  What you give to the browser:<br />
  <input id="domTester" type="text" value="<div id='a &quot; b'/>" style="width:100%" />
</p>
<p>
  What the DOM will be rendered as:<br />
  <input id="output" type="text" readonly="readonly" style="width:100%" />
</p>
Thomas Dondorf