
I am following the solutions from here: "How can I return a JavaScript string from a WebAssembly function" and here: "How to return a string (or similar) from Rust in WebAssembly?"

However, when reading from memory I am not getting the desired results.

AssemblyScript file, helloWorldModule.ts:

    export function getMessageLocation(): string {
        return "Hello World";
    }

index.html:

  <script>
    fetch("helloWorldModule.wasm").then(response =>
      response.arrayBuffer()
    ).then(bytes =>
      WebAssembly.instantiate(bytes, {imports: {}})
    ).then(results => {
        var linearMemory = results.instance.exports.memory;
        var offset = results.instance.exports.getMessageLocation();
        var stringBuffer = new Uint8Array(linearMemory.buffer, offset, 11);

        let str = '';
        for (let i=0; i<stringBuffer.length; i++) {
            str += String.fromCharCode(stringBuffer[i]);
        }
        debugger;
    });
  </script>

This returns an offset of 32 and ultimately yields a string that starts too early and has spaces between each letter of "Hello World".

However, if I change the array to an Int16Array and add 8 to the offset (which was 32), making an offset of 40, like so:

  <script>
    fetch("helloWorldModule.wasm").then(response =>
      response.arrayBuffer()
    ).then(bytes =>
      WebAssembly.instantiate(bytes, {imports: {}})
    ).then(results => { 
        var linearMemory = results.instance.exports.memory;
        var offset = results.instance.exports.getMessageLocation();
        var stringBuffer = new Int16Array(linearMemory.buffer, offset+8, 11);

        let str = '';
        for (let i=0; i<stringBuffer.length; i++) {
            str += String.fromCharCode(stringBuffer[i]);
        }
        debugger;
    });
  </script>

Then we get the correct result.

Why does the first set of code not work like it's supposed to in the links I provided? Why do I need to change it to an Int16Array to get rid of the space between "H" and "e", for example? Why do I need to add 8 bytes to the offset?

In summary, what on earth is going on here?

Edit: Another clue: if I use a TextDecoder on the Uint8Array, decoding as UTF-16 looks more correct than decoding as UTF-8.
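
Roughly, the comparison was as follows (a minimal sketch; `linearMemory` and `offset` are the same variables as in the snippets above, and the 22-byte length is my choice to cover 11 UTF-16 code units):

    // Decode the same region of memory two ways and compare the output.
    var rawBytes = new Uint8Array(linearMemory.buffer, offset, 22);
    console.log(new TextDecoder("utf-8").decode(rawBytes));    // looks wrong
    console.log(new TextDecoder("utf-16le").decode(rawBytes)); // looks closer to "Hello World"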

  • It looks like you found the answer to your question. You should consider adding what you discovered as a self answer. – Clint Oct 17 '18 at 19:14
  • I will do so once I figure out why using a 16-bit array means I need to add 8 to the offset returned by the function – k29 Oct 17 '18 at 19:20

1 Answer


AssemblyScript uses UTF-16: https://github.com/AssemblyScript/assemblyscript/issues/43

Additionally, AssemblyScript stores the length of the string in the first 32 or 64 bits.

That's why my code behaves differently. The examples in the links at the top of this post were for C++ and Rust, which handle string encoding differently.
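
For reference, here is a minimal sketch of reading such a string from JavaScript under that assumption. The helper name and the 8-byte header are mine; the exact header size may depend on the AssemblyScript version:

    // Hypothetical helper: read an AssemblyScript string given the pointer
    // returned by the wasm function. Assumes the length is stored as a
    // 32-bit value at the pointer and the UTF-16LE character data starts
    // `headerBytes` past it (8 in my build, judging by the offsets above).
    function readAssemblyScriptString(memory, ptr, headerBytes) {
        var length = new Uint32Array(memory.buffer, ptr, 1)[0];  // length in UTF-16 code units
        var codeUnits = new Uint16Array(memory.buffer, ptr + headerBytes, length);
        return new TextDecoder("utf-16le").decode(codeUnits);
    }

    // Usage with the variables from the question:
    // var str = readAssemblyScriptString(linearMemory, offset, 8);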
