
Why does the following code produce two different array buffers? How would the text representation of a blob need to be encoded or processed so that a blob can be converted to text and back again via its text representation?

  const input = new Uint16Array([512, 591, 26, 123, 158, 0, 189, 123, 5]);
  const originalBlob: Blob = new Blob([input]);
  const blobFromText = new Blob([await originalBlob.text()]);

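  // View both buffers as 16-bit integers, as in the output below
  console.log("Original:", new Int16Array(await originalBlob.arrayBuffer()));
  console.log("Copied:", new Int16Array(await blobFromText.arrayBuffer()));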

Output

Original:
Int16Array(9)  [512,591,26,123,158,0,189,123,5]
Copied:
Int16Array(11) [512,591,26,123,-16401,189,0,-16401,189,123,5]
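The two extra elements can be traced byte by byte. Assuming `Blob.text()` decodes as UTF-8 (which is what the web `Blob` spec requires, and what this output is consistent with), the low bytes 0x9E (158) and 0xBD (189) are not valid UTF-8 on their own, so each is replaced by U+FFFD. Re-encoding U+FFFD yields the three bytes EF BF BD, which read back as int16 0xBFEF = -16401 followed by 0x00BD = 189. The sketch below reproduces this without `Blob`, using `TextDecoder`/`TextEncoder` as stand-ins for what `text()` and the `Blob` constructor are assumed to do with strings; it also assumes a little-endian platform (x86, ARM).

```typescript
// Sketch: reproduce the Blob round trip with TextDecoder/TextEncoder,
// assuming Blob.text() decodes as UTF-8 and new Blob([string]) encodes as UTF-8.
const input = new Uint16Array([512, 591, 26, 123, 158, 0, 189, 123, 5]);

// The same memory viewed as raw bytes (platform byte order, little-endian assumed).
const bytes = new Uint8Array(input.buffer);

// Step 1, like originalBlob.text(): 0x9E and 0xBD are not valid UTF-8,
// so the lenient decoder replaces each of them with U+FFFD.
const text = new TextDecoder("utf-8").decode(bytes);

// Step 2, like new Blob([text]): each U+FFFD becomes EF BF BD,
// so the buffer grows from 18 to 22 bytes.
const roundTripped = new TextEncoder().encode(text);

// slice() copies into a fresh, exactly sized buffer before reinterpreting as int16.
const copied = new Int16Array(roundTripped.slice().buffer);

console.log(bytes.byteLength, roundTripped.byteLength); // 18 22
console.log(Array.from(copied).join(", "));
// 512, 591, 26, 123, -16401, 189, 0, -16401, 189, 123, 5
```

The result matches the "Copied" line above element for element, which suggests the UTF-8 decode inside `text()` is where the data changes.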

The Node.js source indicates that a UCS-2 encoding is used, which should be close to UTF-16:

https://github.com/nodejs/node/blob/7919ced0c97e9a5b17e6042e0b57bc911d23583d/lib/internal/blob.js#L216-L218
https://github.com/nodejs/node/blob/7919ced0c97e9a5b17e6042e0b57bc911d23583d/lib/internal/encoding.js#L426
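For comparison, this is roughly what a genuine UTF-16LE ("UCS-2"-like) text round trip would look like. `TextEncoder` only produces UTF-8, so the sketch encodes back by hand with `charCodeAt`; the helper `utf16leEncode` is hypothetical. It assumes a little-endian platform, and it is only lossless here because the input has no unpaired surrogates (0xD800 to 0xDFFF), which a spec-conforming decoder would also replace with U+FFFD, and no odd trailing byte.

```typescript
// Sketch: a UTF-16LE text round trip, assuming a little-endian platform
// so the Uint16Array memory is already laid out as UTF-16LE.
const input = new Uint16Array([512, 591, 26, 123, 158, 0, 189, 123, 5]);

// ignoreBOM: true keeps a leading 0xFEFF code unit instead of stripping it.
const text = new TextDecoder("utf-16le", { ignoreBOM: true }).decode(input);

// Hypothetical helper: write each UTF-16 code unit back out unchanged.
function utf16leEncode(s: string): Uint16Array {
  const out = new Uint16Array(s.length);
  for (let i = 0; i < s.length; i++) {
    out[i] = s.charCodeAt(i);
  }
  return out;
}

const restored = utf16leEncode(text);
console.log(text.length, Array.from(restored).join(", "));
// 9 512, 591, 26, 123, 158, 0, 189, 123, 5
```

Because this round trip preserves all nine values while the question's output grows to eleven, the observed behaviour looks more like UTF-8 decoding than UCS-2.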

I am asking this question in connection with "Decode XMLHTTPResponseText into dataUrl without base encoding on server side".


Edit: Adding a `TextEncoder` to convert the text to UTF-8 does not solve the problem either:

  const originalBlob: Blob = new Blob([new Uint16Array([512, 591, 26, 123, 158, 0, 189, 123, 5])]);
  const originalBlobAsText = await originalBlob.text();

  const te = new TextEncoder();
  const blobFromText = new Blob([te.encode(originalBlobAsText)]);
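The edit's result is consistent with the information being lost when the bytes are decoded, not when they are re-encoded: by the time `TextEncoder` sees the string, 0x9E and 0xBD are already U+FFFD and nothing records which bytes they replaced. The sketch below makes that visible with a `fatal` decoder, again using `TextDecoder` as an assumed stand-in for `Blob.text()`.

```typescript
// Sketch: the lossy step is decoding, not encoding.
const bytes = new Uint8Array(new Uint16Array([512, 591, 26, 123, 158, 0, 189, 123, 5]).buffer);

// A fatal UTF-8 decoder rejects these bytes instead of substituting U+FFFD.
let isValidUtf8 = true;
try {
  new TextDecoder("utf-8", { fatal: true }).decode(bytes);
} catch {
  isValidUtf8 = false; // TextDecoder throws a TypeError on malformed input
}

// The lenient decoder substitutes U+FFFD; TextEncoder then encodes those
// replacement characters faithfully as EF BF BD.
const text = new TextDecoder().decode(bytes);
const reencoded = new TextEncoder().encode(text);

console.log(isValidUtf8, reencoded.byteLength); // false 22
// Any further UTF-8 round trip is stable: the damage was done once, on decode.
console.log(new TextDecoder().decode(reencoded) === text); // true
```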

Kilian
  • The [Node.js documentation](https://nodejs.org/api/all.html#buffer_class_blob) doesn't say how it treats strings in the `Blob` constructor, but I think we can reasonably assume it means to align with the [web standard `Blob`](https://developer.mozilla.org/en-US/docs/Web/API/Blob/Blob), which handles strings in the `Blob` constructor as UTF-8, not UCS-2 or UTF-16. So I suspect the code goes wrong at `new Blob([await originalBlob.text()])`. – T.J. Crowder May 15 '21 at 12:01
  • @T.J.Crowder I already tried using the `TextEncoder` (UTF-8) and every combination of encoding/decoding with UTF-8 and UTF-16, and simply cannot figure out how to make it all fit together. – Kilian May 15 '21 at 12:25
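Building on the comment above, one lossless way to carry arbitrary bytes as text is a "binary string": one UTF-16 code unit per byte, each in the range 0 to 255 (the same representation `atob()` returns and `btoa()` accepts). Assuming the web `Blob` semantics described in the comment, such a string has to be turned back into bytes before it reaches the `Blob` constructor, because the constructor would UTF-8-encode every character at or above 0x80 as two bytes. The helper names below are hypothetical, not part of any `Blob` API.

```typescript
// Sketch: lossless bytes <-> "binary string" conversion (one char per byte).
// Hypothetical helpers; assumes a little-endian platform for the Uint16Array view.
function bytesToBinaryString(bytes: Uint8Array): string {
  let s = "";
  for (let i = 0; i < bytes.length; i++) {
    s += String.fromCharCode(bytes[i]);
  }
  return s;
}

function binaryStringToBytes(s: string): Uint8Array {
  const out = new Uint8Array(s.length);
  for (let i = 0; i < s.length; i++) {
    const code = s.charCodeAt(i);
    // Anything above 0xFF cannot have come from bytesToBinaryString.
    if (code > 0xff) throw new RangeError(`not a binary string at index ${i}`);
    out[i] = code;
  }
  return out;
}

const input = new Uint16Array([512, 591, 26, 123, 158, 0, 189, 123, 5]);
const text = bytesToBinaryString(new Uint8Array(input.buffer));

// Convert back to bytes first; new Blob([text]) would re-encode as UTF-8.
const restoredBytes = binaryStringToBytes(text);
const restored = new Uint16Array(restoredBytes.buffer);

console.log(text.length, Array.from(restored).join(", "));
// 18 512, 591, 26, 123, 158, 0, 189, 123, 5
// A Blob built as new Blob([restoredBytes]) would then hold the original bytes.
```

The trade-off is that the string is twice as large in memory as the data (one 16-bit code unit per byte), and it only survives transports that preserve code units 0x00 to 0xFF unchanged.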
