I am generating a client-side HTML redirect like this:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Déjà vu - Wikipedia</title>
    <script type='text/javascript'>
      document.addEventListener('DOMContentLoaded', function () {
        // Build the redirect page as a detached DOM tree.
        var newHTML = document.createElement('html');
        var newHead = document.createElement('head');
        var newMeta = document.createElement('meta');
        var newTitle = document.createElement('title');
        newTitle.text = "Déjà vu - Wikipedia";
        newMeta.httpEquiv = "refresh";
        newMeta.charset = "utf-8";
        newMeta.content = "30;url=https://en.wikipedia.org/wiki/D%C3%A9j%C3%A0_vu";
        var newBody = document.createElement('body');
        var newPar = document.createElement('p');
        var newText = document.createTextNode('Loading Déjà vu - Wikipedia...');
        newPar.appendChild(newText);
        newBody.appendChild(newPar);
        newHead.appendChild(newMeta);
        newHead.appendChild(newTitle);
        newHTML.append(newHead);
        newHTML.append(newBody);

        // Serialize the tree into a blob and download it through a hidden link.
        var tempAnchor = window.document.createElement('a');
        var HTMLBlob = new Blob([newHTML.outerHTML], {type: 'text/html; charset=UTF-8'});
        tempAnchor.href = window.URL.createObjectURL(HTMLBlob);
        tempAnchor.download = "example-redirect.html";
        tempAnchor.style.display = 'none';
        document.body.appendChild(tempAnchor);
        tempAnchor.click();
        document.body.removeChild(tempAnchor);
      });
    </script>
  </head>
  <body>
  </body>
</html>

However, the charset attribute of the meta element is lost when I do so. The output looks like this:

<html><head><meta http-equiv="refresh" content="30;url=https://en.wikipedia.org/wiki/D%C3%A9j%C3%A0_vu"><title>Déjà vu - Wikipedia</title></head><body><p>Loading Déjà vu - Wikipedia...</p></body></html>

Without a declared encoding, my browser guesses the wrong one and does not display the accented characters correctly:

Loading DÃ©jÃ  vu - Wikipedia...
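
To illustrate why the accents break, here is a minimal sketch that encodes the text as UTF-8 and then decodes the same bytes with a different encoding. It assumes the browser falls back to windows-1252 when no encoding is declared; the real fallback depends on the browser and its locale, so treat this as an illustration rather than a description of what any particular browser does.

```javascript
// The page text, written with escapes so the source file encoding does not matter.
var text = 'D\u00e9j\u00e0 vu';

// A blob stores string parts as UTF-8, so these are the bytes written to disk.
var utf8Bytes = new TextEncoder().encode(text);

// Assumption: with no charset declared, the bytes are read as windows-1252.
var misread = new TextDecoder('windows-1252').decode(utf8Bytes);

// With the charset declared, the same bytes are read as UTF-8.
var correct = new TextDecoder('utf-8').decode(utf8Bytes);

console.log(misread); // each accented letter turns into two unrelated characters
console.log(correct); // Déjà vu
```

Each accented letter occupies two bytes in UTF-8, and a single-byte encoding shows each of those bytes as its own character.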

This, on the other hand, properly shows the accents:

<html><head><meta http-equiv="refresh" charset="utf-8" content="30;url=https://en.wikipedia.org/wiki/D%C3%A9j%C3%A0_vu"><title>Déjà vu - Wikipedia</title></head><body><p>Loading Déjà vu - Wikipedia...</p></body></html>

Loading Déjà vu - Wikipedia...

I've reduced it to as minimal an example as I can, and the problem still occurs.

<!DOCTYPE html>
<html lang="en">

<head>
  <meta charset="utf-8">
  <title>title</title>
  <script type='text/javascript'>
    document.addEventListener('DOMContentLoaded', function() {
      var newHTML = document.createElement('html');
      var newHead = document.createElement('head');
      var newMeta = document.createElement('meta');
      newMeta.charset = "utf-8";
      newHead.appendChild(newMeta);
      newHTML.append(newHead);
      var tempAnchor = window.document.createElement('a');
      var HTMLBlob = new Blob([newHTML.outerHTML], {
        type: 'text/html; charset=UTF-8'
      });
      tempAnchor.href = window.URL.createObjectURL(HTMLBlob);
      tempAnchor.download = "minimal-output.html";
      tempAnchor.style.display = 'none';
      document.body.appendChild(tempAnchor);
      tempAnchor.click();
      document.body.removeChild(tempAnchor);

    });
  </script>
</head>

<body>
</body>

</html>

Here is the output:

<html><head><meta></head></html>

This occurs in both Firefox 63.0 and Chromium 70.0. Here is a link to the Git repo:

https://github.com/nbeaver/stackoverflow_question_2018-11-07

How can I preserve the charset attribute of an HTML blob?

2 Answers

The DOM interface for HTML <meta> elements (HTMLMetaElement) currently has no property that reflects the charset attribute. See the specification: https://www.w3.org/TR/html5/document-metadata.html#the-meta-element.

newMeta.charset = "utf-8"; only adds your own arbitrary charset property to the newMeta JavaScript object. This arbitrary property has no effect on the charset HTML attribute of the <meta> element.

You need to set the charset attribute like this: newMeta.setAttribute("charset", "utf-8");
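
As a minimal sketch of that fix, the helper below creates the element and sets the attribute with setAttribute. The names createCharsetMeta and doc are hypothetical and not part of the question's code; doc stands for any DOM Document, which in a browser would be the global document.

```javascript
// Hypothetical helper: build a <meta> element whose charset attribute
// survives serialization through outerHTML.
function createCharsetMeta(doc, charset) {
  var meta = doc.createElement('meta');
  // setAttribute writes the HTML content attribute itself, unlike
  // `meta.charset = ...`, which only adds a plain JavaScript property.
  meta.setAttribute('charset', charset);
  return meta;
}

// Browser-only usage, guarded so the snippet also loads outside a browser.
if (typeof document !== 'undefined') {
  var head = document.createElement('head');
  head.appendChild(createCharsetMeta(document, 'utf-8'));
  console.log(head.outerHTML); // <head><meta charset="utf-8"></head>
}
```

In the redirect page from the question, it is probably cleaner to give the charset its own meta element and keep the http-equiv="refresh" element separate, since the HTML specification expects a meta element to carry only one of those attributes.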

Petr Srníček

According to this answer to "Set charset meta tag with JavaScript":

You can't set the charset content attribute by setting the charset property because they don't reflect each other. In fact there is no property that reflects the charset content attribute. [...] The character set is established by the parser, so constructing the meta element in JavaScript after the HTML has been parsed will have no effect on the character set of the document at all.

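However, in your case, prepending a UTF-8 BOM (byte order mark) to the blob might do the trick, because a browser that finds a BOM at the start of the file uses it to determine the encoding.

var HTMLBlob = new Blob(["\ufeff", newHTML.outerHTML], {type: 'text/html; charset=UTF-8'});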
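
To check what the BOM approach puts at the start of the file, here is a small sketch that encodes a BOM-prefixed string with TextEncoder, which always produces UTF-8 (the same encoding Blob uses for string parts). The helper name withUtf8Bom and the sample markup are made up for illustration.

```javascript
// Hypothetical helper: prefix text with U+FEFF so that its UTF-8 encoding
// starts with the byte order mark.
function withUtf8Bom(text) {
  return '\ufeff' + text;
}

// Sample markup, assumed for illustration only.
var html = '<html><head></head><body><p>D\u00e9j\u00e0 vu</p></body></html>';

// Encode to UTF-8 and look at the first three bytes.
var bytes = new TextEncoder().encode(withUtf8Bom(html));
var firstThree = Array.from(bytes.slice(0, 3)).map(function (b) {
  return b.toString(16);
});

console.log(firstThree.join(' ')); // ef bb bf
```

Those three bytes are the UTF-8 byte order mark, so the downloaded file identifies its own encoding even when no meta charset is present.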
Eray Balkanli
Dan D.