Converting a larger byte array to a string

Question

When N is set to 125K the following works

let N = 125000
let x = [...Array(N)].map(( xx,i) => i)
let y = String.fromCodePoint(...x)
console.log(y.length)

When N is set to 128K that same code breaks:

Uncaught RangeError: Maximum call stack size exceeded

This is a common operation: what is the optimal way to achieve the conversion?

Note that I did look at this related Q&A . https://stackoverflow.com/a/3195961/1056563 We should not depend on node.js and also the approaches with the fromCharCode.apply are failing. Finally that answer is nearly ten years old.

So what is an up to date performant way to handle this conversion?

Interesting. Avoiding the `...` (in the call to `String.fromCodePoint`?) will probably get around that issue. That is, I suspect it's related to passing too many arguments: https://stackoverflow.com/a/22747272/2864740 — user2864740, Jul 21 '20 at 18:09
oh ! that's a good point! Feel free to make an answer. It's not really a _full_ answer but it points the right direction and I would upvote — WestCoastProjects, Jul 21 '20 at 18:15
It still won't work as fromCodePoint doesn't take an array :( Just trying to really isolate the observed issue. The larger question is still there: "How to [efficiently] create a string from a very large sequence of code points?" — user2864740, Jul 21 '20 at 18:15
Yes it will not - but you are correct to realize that the parameter passing will not work either. I am looking into alternatives — WestCoastProjects, Jul 21 '20 at 18:16
Does using ‘call’ instead of the ‘...’ change the behavior? That might be an implement detail, depending on how such code unifies internally. — user2864740, Jul 21 '20 at 20:13

user2864740 · Accepted Answer · 2020-07-21T22:36:31.180

The problem is caused because implementations have limits to the number of parameters accepted. This results in an exception being raised when too many parameters (over ~128k in this case) are supplied to the String.fromCodePoint functions via the spread operator.

One way to solve this problem relatively efficiently, albeit with slightly more code, is to batch the operation across multiple calls. Here is my proposed implementation, which fixes what I perceive as issues relating to scaling performance ~~and the handling of surrogate pairs~~ (that's incorrect: fromCodePoint doesn't care about surrogates, making it preferable to fromCharCode in such cases).

let N = 500 * 1000;
let A = [...Array(N)].map((x,i) => i); // start with "an array".

function codePointsToString(cps) {
  let rs = [];
  let batch = 32767; // Supported 'all' browsers
  for (let i = 0; i < cps.length; ){
    let e = i + batch;
    // Build batch section, defer to Array.join.
    rs.push(String.fromCodePoint.apply(null, cps.slice(i, e)));
    i = e;
  }
  return rs.join('');
}

var result = codePointsToString(A);
console.log(result.length);

Also, I wanted a trophy. The code above should run in O(n) time and minimize the amount of objects allocated. No guarantees on this being the 'best' approach. A benefit of the batching approach, and why the cost of apply (or spread invocation) is subsumed, is that there are significantly less calls to String.fromCodePoint and intermediate strings. YMMV - especially across environments.

Here is an online benchmark. All tests have access to, and use, the same generated "A" array of 500k elements.

What is the runtime of this? I rewrote with a pre-allocated array and was in the 20-30 milisecond range (as opposed to 19 *seconds* for other solutions that repetetively concatenated arrays). I see the `apply` in the loop so I'm honestly skeptical. `apply` is really slow — WestCoastProjects, Jul 21 '20 at 21:45

WestCoastProjects · Answer 2 · 2020-07-21T21:49:27.580

The given answers are of poor performance: i measured 19 seconds on one of them and the others are similar (*). It is necessary to preallocate the output array. The following is 20 to 40 milli seconds. Three orders of magnitude faster.

function wordArrayToByteArray(hash) {
    var result = [...Array(hash.sigBytes)].map(x => -1)
    let words = hash.words
        //map each word to an array of bytes
        .map(function (v) {
            // create an array of 4 bytes (less if sigBytes says we have run out)
            var bytes = [0, 0, 0, 0].slice(0, Math.min(4, hash.sigBytes))
                // grab that section of the 4 byte word
                .map(function (d, i) {
                    return (v >>> (8 * i)) % 256;
                })
                // flip that
                .reverse()
            ;
            // remove the bytes we've processed
            // from the bytes we need to process
            hash.sigBytes -= bytes.length;
            return bytes;
        })
    words.forEach((w,i) => {
        result.splice(i * 4, 4, ...w)
    })
    result = result.map(function (d) {
        return String.fromCharCode(d);
    }).join('')
    return result
}

(*) With the possible exception of @User2864740 - we are awaiting his numbers. But his solution also uses apply() inside the loop which leads to believe it will also be slow.

Which is why I provided them. The 19 seconds is one-for-one replacement with one of the other solutions. The key components of the slowdown are the repetitive application of `concat` and the usage of `apply`. Your solution avoids the first part but afaict not the second. I'd be interested to hear your numbers. — WestCoastProjects, Jul 21 '20 at 21:59
FWIW, this uses `fromCharCode`. That might have been an accidental change, as I did it too :( - Here are is an updated benchmark covering *just* the conversion process. https://jsben.ch/hutLb — user2864740, Jul 21 '20 at 22:32
Yours is probably of similar performance. In any case it has the awarded. — WestCoastProjects, Jul 21 '20 at 22:35

score 0 · Answer 3 · edited Apr 02 '21 at 14:01

0

"Old fashion" JavaScript:

var N=125000;
var y="";
for(var i=0; i<N; i++)
  y+=String.fromCharCode(i);
console.log(y.length);

Worked with N=1000000

edited Apr 02 '21 at 14:01

Dharman

30,962
25
85
135

answered Jul 21 '20 at 18:26

iAmOren

2,760
2
11
23

I expect this to have horrid performance characteristics without, perhaps, specific implementation optimizations. Conceptually this creates a new string each loop and copies over the original for O(n^2) bounds!! While it is possible that such can be optimized in a specific implementation, I would be hard pressed to suggest this approach in general. It may also not be correct for surrogate pairs. – user2864740 Jul 21 '20 at 20:05
“It’s not (just) you, it’s me.” - I’m also skeptical of the other answer. – user2864740 Jul 21 '20 at 20:17

Converting a larger byte array to a string

3 Answers3