
I have an ArrayBuffer whose content needs to be converted to a hex string and sent in a JSON message. The data can be up to 2 GB. For this I would like a string that works as a dynamic array, so that I can efficiently push_back individual characters. In C++ we have std::string (or std::vector), and I think Java has a StringBuilder. What is the way of doing this in JavaScript?

I tried `+=`, but it starts to take minutes once the string reaches 128 MB. I guess this is because it results in an O(n^2) algorithm.

Note: I also need to do the inverse operation.

user877329
  • You could build an array (each index being a single two-character string) and then `.join()` it, but 2 gigabytes (well, 8 once it's turned into a hex string, which really means 10 because the buffer is still there) is going to be pretty taxing on the machine. – Pointy Jul 09 '23 at 13:24
  • Seems like a streaming solution would be a much better idea. – Pointy Jul 09 '23 at 13:24
  • `+=` normally is optimised by JS engines into a constant average operation, but I guess this might break down when working with humongous strings. As Pointy suggested, do not build the message as a single string value but rather write it into a stream. – Bergi Jul 09 '23 at 13:39
  • Related: *[JavaScript does not need a StringBuilder](https://josephmate.github.io/java/javascript/stringbuilder/2020/07/27/javascript-does-not-need-stringbuilder.html)*, *[Does JavaScript have a built-in stringbuilder class?](https://stackoverflow.com/questions/2087522/does-javascript-have-a-built-in-stringbuilder-class)*, and *[How can I build/concatenate strings in JavaScript?](https://stackoverflow.com/questions/31845895/how-to-build-concatenate-strings-in-javascript)* – Peter Mortensen Jul 21 '23 at 08:43
  • @PeterMortensen The answer to that question is *probably*. However to make a robust interface that does not expose problems with variable byte length may be difficult. But, having an efficient StringBuilder causes people to think about time complexity, which by itself is good for developing programming skills. – user877329 Jul 23 '23 at 14:04

2 Answers

0

Well, why not use a temporary Uint8Array, like so:

function to_hex_string(bytes)
{
    // Two ASCII hex digits per input byte.
    let tmp = new Uint8Array(2*bytes.length);
    for(let k = 0; k != bytes.length; ++k)
    {
        let value = bytes[k];
        let msb = value >> 4;
        let lsb = value & 0xf;
        // 48 is '0' and 65 is 'A', so the digits come out in uppercase.
        tmp[2*k] = msb < 10 ? msb + 48 : (msb - 10) + 65;
        tmp[2*k + 1] = lsb < 10 ? lsb + 48 : (lsb - 10) + 65;
    }

    // Decode the ASCII bytes into a string in one step.
    return new TextDecoder().decode(tmp);
}

This outperforms joining an array. However, decoding the temporary array to a string fails for a 512 MiB input. I guess there is some hardcoded limit on string length. Maybe a JS process may not use more than 4 GiB because of a 32-bit VM.
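JavaScript engines cap the length of a single string, and the exact limit differs between engines, so one way around the failure is to follow the streaming suggestion from the comments and produce the hex text in pieces. The following is only a minimal sketch under that assumption: `hexChunks` and its `chunkBytes` parameter are hypothetical names, and the consumer is assumed to accept the hex text as a sequence of chunks rather than as one string.

```javascript
// Sketch: yield uppercase hex text for `bytes` one chunk at a time, so that
// no single string has to hold the whole result.
// `chunkBytes` (input bytes per chunk) is an assumed tuning knob.
function* hexChunks(bytes, chunkBytes = 1 << 20)
{
    const decoder = new TextDecoder();
    // ASCII codes of the 16 hex digits, indexed by nibble value.
    const digits = new TextEncoder().encode("0123456789ABCDEF");
    // Scratch buffer, reused for every chunk.
    const tmp = new Uint8Array(2*chunkBytes);

    for(let start = 0; start < bytes.length; start += chunkBytes)
    {
        const end = Math.min(start + chunkBytes, bytes.length);
        for(let k = start; k != end; ++k)
        {
            const value = bytes[k];
            tmp[2*(k - start)] = digits[value >> 4];
            tmp[2*(k - start) + 1] = digits[value & 0xf];
        }
        // decode() returns a fresh string, so reusing tmp is safe.
        yield decoder.decode(tmp.subarray(0, 2*(end - start)));
    }
}
```

Each yielded chunk could then be written to whatever sink carries the JSON message, for example with `for (const chunk of hexChunks(bytes)) { ... }`. Whether that sink can accept a string value in pieces depends on the transport, which the question does not specify.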

user877329
-3

To efficiently convert a large ArrayBuffer into a hexadecimal string in JavaScript, you can use a Uint8Array view to access the individual bytes of the buffer and then build the hex string dynamically. Instead of string concatenation (+=), which can be slow for large strings because JavaScript strings are immutable, you can push the two-character hex value of each byte onto an array and then join the array into a string.

function arrayBufferToHexString(buffer) {
  const bytes = new Uint8Array(buffer);
  const hexChars = [];

  for (let i = 0; i < bytes.length; i++) {
    const hex = bytes[i].toString(16).padStart(2, '0');
    hexChars.push(hex);
  }

  return hexChars.join('');
}

In this code, we iterate over each byte of the ArrayBuffer using a for loop. We convert each byte to its hexadecimal representation using toString(16), which produces lowercase digits. The padStart(2, '0') ensures that each byte is represented by two characters, even if the value is less than 16 (e.g., '0f' instead of just 'f'). We push each hex value into the hexChars array.

Finally, we use join('') to concatenate all the hex values into a single string and return it.
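As a small illustration of the per-byte step described above, the sketch below applies the same toString(16) and padStart(2, '0') calls to a few sample values. `byteToHex` is a hypothetical helper name used only for this example.

```javascript
// Hypothetical helper: format one byte (0..255) as two lowercase hex digits.
function byteToHex(byte) {
  return byte.toString(16).padStart(2, '0');
}

// Values below 16 get a leading zero; toString(16) emits lowercase digits.
console.log([0, 15, 16, 255].map(byteToHex).join(' ')); // prints "00 0f 10 ff"
```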

To perform the inverse operation and convert the hexadecimal string back to an ArrayBuffer, you can use the following function:

function hexStringToArrayBuffer(hexString) {
  const bytes = [];

  for (let i = 0; i < hexString.length; i += 2) {
    const hexByte = hexString.slice(i, i + 2);
    const byte = parseInt(hexByte, 16);
    bytes.push(byte);
  }

  return new Uint8Array(bytes).buffer;
}

In this code, we iterate over the input hex string in steps of 2 to extract each byte's hexadecimal representation. We convert each hex byte to its numeric value using parseInt(hexByte, 16) and push it into the bytes array.

Finally, we create a new Uint8Array from the bytes array and obtain its underlying buffer using .buffer. This buffer represents the reconstructed ArrayBuffer.

These functions avoid the performance issues associated with repeated string concatenation, though for very large ArrayBuffers the intermediate arrays still need a lot of memory.
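For the inverse direction, one possible variant writes straight into a preallocated Uint8Array instead of growing a plain array, since the output size is known up front (half the string length). This is a minimal sketch, not a tuned implementation: `hexDigitValue` and `hexStringToBytes` are hypothetical names, and rejecting odd-length or non-hex input with a RangeError is an assumption about the desired error handling.

```javascript
// Hypothetical helper: map the char code of one hex digit to its value 0..15.
function hexDigitValue(code) {
  if (code >= 48 && code <= 57) return code - 48;   // '0'..'9'
  if (code >= 65 && code <= 70) return code - 55;   // 'A'..'F'
  if (code >= 97 && code <= 102) return code - 87;  // 'a'..'f'
  throw new RangeError('invalid hex digit');
}

// Sketch: decode a hex string into a Uint8Array allocated once at full size.
function hexStringToBytes(hexString) {
  if (hexString.length % 2 !== 0) {
    throw new RangeError('hex string must have an even length');
  }
  const bytes = new Uint8Array(hexString.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    const hi = hexDigitValue(hexString.charCodeAt(2 * i));
    const lo = hexDigitValue(hexString.charCodeAt(2 * i + 1));
    bytes[i] = (hi << 4) | lo;
  }
  return bytes;
}
```

Calling `hexStringToBytes(hexString).buffer` gives the ArrayBuffer if that is the type needed. The sketch accepts both uppercase and lowercase digits, so it can read the output of either answer.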

  • My computer is broken, I can't use pycharm now and check my answer, but I believe my answer could run well – gales thibeault Jul 09 '23 at 13:28
  • This approach is quite similar to how you would do it in Python, though in Python it is the string that joins the array and not the array that is joined by a string. – user877329 Jul 09 '23 at 14:15
  • OK, hope it could help you – gales thibeault Jul 09 '23 at 14:17
  • Interestingly, I get an out-of-memory exception already at 512 MiB, but my system has 32 GiB. But it works better than `+=`. – user877329 Jul 09 '23 at 14:25
  • Maybe other processes/programs are taking up memory, when I begin to learn Machine Learning, I always saw this error message... – gales thibeault Jul 09 '23 at 14:30
  • I think array elements take up some space so instead of two bytes (or four for UTF-16) per element in bytes, it will occupy at least 8 on a 64 bit machine. This will never work. – user877329 Jul 09 '23 at 15:26
  • There's absolutely nothing efficient about converting a large buffer from raw 8-bit values into **32 bit** hex strings (2 characters, 16 bits per character). The correct thing for the OP to do is stream the raw buffer to the server, and let the server inflate it however it wants. A 2GB buffer will be 8GB once converted, and then it will take twice as long to transmit as the raw buffer. – Pointy Jul 09 '23 at 17:58
  • Possible ChatGPT answer, see other answers by this user – Jan Doggen Jul 09 '23 at 19:13
  • Welcome to Stack Overflow, gales thibeault! All four of your answers here so far appear likely to be entirely or partially written by AI (e.g., ChatGPT). Please be aware that [posting AI-generated content is not allowed here](//meta.stackoverflow.com/q/421831). If you used an AI tool to assist with any answer, I would encourage you to delete it. We do hope you'll stick around and be a valuable part of our community by posting *your own* quality content. Thanks! – NotTheDr01ds Jul 21 '23 at 02:02
  • **Readers should review this answer carefully and critically, as AI-generated information often contains fundamental errors and misinformation.** If you observe quality issues and/or have reason to believe that this answer was generated by AI, please leave feedback accordingly. – NotTheDr01ds Jul 21 '23 at 02:02