24

I have a JSON file with 914 MB of JSON data. I load the file with fs-extra and parse it, but when I parse it I get the error

cannot create a string longer than 0x1fffffe8 characters

Below is the code:

        const fs = require('fs-extra');
        const rawdata = fs.readFileSync('src/raw.json');
        const data = JSON.parse(rawdata);

I am running the project with npm; to run it I have the below command in package.json.

"scripts": {
    "start:dev": "cross-env NODE_OPTIONS='--max-old-space-size=4096' ts-node -r tsconfig-paths/register ./src --env=development"
  }
TechChain
    Your server process doesn't have enough memory. Operating systems impose limits on resources single process can consume. There are usually ways to instruct the OS that your process should be given more memory, but without knowing anything else about what you're doing it's impossible to provide more information. – Pointy Jul 02 '21 at 18:48
  • @Pointy That error message sounds more like a hard-coded limitation in the implementation, not something coming from the OS. It's probably related to the way the length of strings is represented. – Barmar Jul 02 '21 at 18:51
  • @Barmar if you google the error message, various different lengths come up. I suspect you're basically right, but it probably depends on the OS in various ways. *edit* and clearly the most important question is why you'd have a 900 megabyte JSON file. – Pointy Jul 02 '21 at 18:54
  • @Pointy Maybe they use JSON to backup their database. :) – Barmar Jul 02 '21 at 19:00
  • @Pointy Well, I need to test my code with heavy data, that's why – TechChain Jul 02 '21 at 19:58
  • I've worked with very large databases and processing a 900MB JSON file is never something I've had to do. For one thing, JSON is a *really* inefficient storage format. – Pointy Jul 02 '21 at 20:17
  • @Pointy I just need it for some purpose, really. Can you help? – TechChain Jul 04 '21 at 17:55
  • [See this Node changelist.](https://github.com/v8/v8/commit/ea56bf5513d0cbd2a35a9035c5c2996272b8b728). Node has a maximum string length of about 512MB, and that cannot be changed. It is part of the Node architecture. – Pointy Jul 04 '21 at 19:13
  • Try using a streaming JSON parser instead. – Bergi Jul 04 '21 at 19:19
  • Yes, a streaming parser would be good, though I would not be surprised if another memory limit might get involved once the code starts assembling the data structure itself. Again, knowing the application details might allow people to provide suggestions for an *architectural* change. – Pointy Jul 04 '21 at 19:28
  • "It is part of the Node architecture.", it's actually a v8 on 64bits systems limit, [Chrome is also concerned.](https://stackoverflow.com/questions/61271613/chrome-filereader-api-event-target-result). You'd need a stream-parser – Kaiido Jul 06 '21 at 02:33
  • @TechChain Can you provide a follow up? This question has been getting a lot of visibility, and you would help the community by sharing your solution and why it worked for you. If mine worked, you ought to mark it as `accepted`. If nothing has worked you ought to provide more details about what you tried. – Inigo Feb 08 '22 at 19:03

2 Answers

25

0x1fffffe8 is 536,870,888, i.e. just shy of 512 Mi (2^29) characters.

The commenters are correct: you are bumping up against a hard limit. I agree with @Pointy that it is most likely the Node (V8) string length limit; fs-extra has nothing to do with it.

In any case, you're going to have to process that JSON in chunks. Below are different ways to do this.

A: Use a SAX-style JSON parser

You have many parser options. To get you started, here are a couple I found on NPM:

  • BFJ has a walk function that does SAX-style parsing. BFJ is archived, but still has millions of weekly downloads.

  • stream-json
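For illustration, here is a minimal SAX-style sketch using stream-json's low-level token parser (assuming `npm install stream-json`; the `keyValue` check inside the listener is just an example of reacting to individual parse events):

    // Stream the file and handle individual parse tokens instead of
    // building one giant string or object in memory.
    const fs = require('fs');
    const { parser } = require('stream-json');

    const tokens = fs.createReadStream('src/raw.json').pipe(parser());

    tokens.on('data', (token) => {
      // Each token is a small object such as { name: 'startObject' }
      // or { name: 'keyValue', value: 'address' }.
      if (token.name === 'keyValue') {
        // react to the keys/values you care about here
      }
    });
    tokens.on('end', () => console.log('done'));
    tokens.on('error', (err) => console.error(err));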

B: Implement a Node Streams pipeline

Almost certainly your massive JSON data is an array at the root level. This approach uses a parser that can asynchronously process each element in that array individually, or in batches, whichever makes sense. It is based on the very powerful and flexible Node Streams API.

ℹ️ If your data isn't a JSON array but a stream of concatenated JSON objects, then it probably conforms to the JSON Streaming protocol. See option D below.

  • JSONStream lets you filter by path or pattern in its streaming parse. It is archived, but still has millions of weekly downloads.

  • BFJ - in addition to the SAX-style walk function mentioned above, it supports selective object-level streaming:

    match returns a readable, object-mode stream and asynchronously parses individual matching items from an input JSON stream.

  • stream-json has a Pick pipeline operator that can pick desired items out of a stream, ignoring the rest. It offers many other operators as well.

  • jsonparse
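As a rough sketch of such a pipeline with stream-json (assuming the root of raw.json is a JSON array; `processRecord` is a placeholder for whatever you do with each element):

    const fs = require('fs');
    const StreamArray = require('stream-json/streamers/StreamArray');

    // StreamArray.withParser() parses the root-level array and emits its
    // elements one at a time, so the whole file never exists as one string.
    const pipeline = fs.createReadStream('src/raw.json')
      .pipe(StreamArray.withParser());

    pipeline.on('data', ({ key, value }) => {
      // key is the array index, value is one fully parsed element
      processRecord(value);
    });
    pipeline.on('end', () => console.log('all records processed'));
    pipeline.on('error', (err) => console.error(err));

If the array is nested under a key instead of sitting at the root, stream-json's Pick filter can select it before StreamArray takes over.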

C: Manual chunking

This will likely be the most efficient if your data supports it.

This option is like B, except instead of employing a streaming parser, you do the chunking yourself. This is easy to do if the elements of the JSON data array are very regular, e.g. each element occupies exactly N lines. You can easily extract them without parsing.

For example, if your data looked like this:

{
  data: [
    { name: ...,
      address: ... },
    { name: ...,
      address: ... },
    { name: ...,
      address: ... },
    { name: ...,
      address: ... }
  ]
}

Your process would be something like this:

  1. Use a buffered reader to read the file. (DO NOT synchronously read it all into memory)
  2. Discard the first two lines
  3. Read the file in chunks, two lines at a time
  4. If a chunk starts with {, remove any trailing comma and parse each individual {name:..., address:...} record.
  5. If it doesn't, you have reached the end of the array. Discard the rest of the file or hand it off to some other process if you expect some other data there.

The details will depend on your data.
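Here is a rough sketch of those steps (assuming, as in the sample above, that every record occupies exactly two lines and that the real file contains valid JSON with quoted keys; adjust the line counts and comma handling to your actual layout):

    const fs = require('fs');
    const readline = require('readline');

    async function run() {
      const rl = readline.createInterface({
        input: fs.createReadStream('src/raw.json'),  // buffered, line-by-line read
        crlfDelay: Infinity,
      });

      let skipped = 0;
      let pending = [];
      for await (const line of rl) {
        if (skipped < 2) { skipped++; continue; }    // step 2: drop "{" and "data: ["
        pending.push(line);
        if (pending.length < 2) continue;            // step 3: two lines per record
        const chunk = pending.join('\n').trim();
        pending = [];
        if (!chunk.startsWith('{')) break;           // step 5: end of the array
        const record = JSON.parse(chunk.replace(/,\s*$/, ''));  // step 4
        // ...handle one { name, address } record here...
      }
    }

    run().catch(console.error);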

D: Use a JSON Streaming protocol parser

The JSON Streaming protocol is simply multiple JSON documents concatenated into one stream. If that's what you have, use a parser that supports this protocol.
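For the common line-delimited variant (one JSON object per line, often called NDJSON or JSON Lines), you don't even need an extra library; here is a minimal sketch, assuming the file really is one object per line:

    const fs = require('fs');
    const readline = require('readline');

    async function run() {
      const rl = readline.createInterface({
        input: fs.createReadStream('src/raw.jsonl'),  // hypothetical file name
        crlfDelay: Infinity,
      });

      for await (const line of rl) {
        if (!line.trim()) continue;     // skip blank lines
        const obj = JSON.parse(line);   // each line is one small JSON document
        // ...handle obj...
      }
    }

    run().catch(console.error);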

Inigo
  • I have tried most of the options but they don't work. – TechChain Jul 09 '21 at 04:08
  • What do you mean they don't work? Please explain in detail. I can't help you with a comment like that. – Inigo Jul 09 '21 at 07:58
  • Your case is a very common programming problem. It's not special at all. You are definitely doing something wrong. Are you STILL trying to load it into memory as ONE string? If you are, then you aren't understanding what we've all said, nor do you understand my solution for you, and you will keep getting the same error no matter what you try. You need to PROCESS it in chunks, NOT load it into a `string` using an alternate JSON parser. – Inigo Jul 09 '21 at 08:00
  • Tried "most" of them? `stream-json` linked in this answer works with huge JSON files. – Zed Jul 09 '21 at 23:55
  • stream-json works. The API is confusing though – Kira Apr 11 '23 at 02:58
  • 0x1fffffe8 bytes is neither 512 MB nor 512 MiB. It’s some bytes shy of 512 MiB. – Константин Ван May 25 '23 at 12:18
  • And also, it’s about the length of a `string`, which is counted in UTF-16 units, not the byte count. – Константин Ван May 25 '23 at 12:25
1
import NodeBuffer from "node:buffer";

NodeBuffer.constants.MAX_STRING_LENGTH

From the Node.js documentation for `buffer.constants.MAX_STRING_LENGTH`:

Represents the largest length that a string primitive can have, counted in UTF-16 code units.

This value may depend on the JS engine that is being used.

The limitation imposed on a string is not about the byte count. It’s 0x1FFFFFE8 (2²⁹-24) UTF-16 units in my Node.js 20.1.0.
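To see where a given file stands relative to that limit, a quick check along these lines works (the file path is just an example):

    import { constants } from "node:buffer";
    import { statSync } from "node:fs";

    const bytes = statSync("src/raw.json").size;
    console.log("file size in bytes:", bytes);
    console.log("max string length (UTF-16 units):", constants.MAX_STRING_LENGTH);
    // For mostly-ASCII JSON, bytes ≈ UTF-16 units, so a ~914 MB file cannot
    // fit into a single string no matter how large --max-old-space-size is.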