
I have a JSON file and its structure is as follows:

{
  "orders": [
    {
      "id": 876876876,
      "app_id": 580714,
      "client_details": {},
      "discount_codes": [{}],
      "line_items": [
        {
          "id": 466157049,
          ...
        }
      ],
      ...
    },
    {
      "id": 47844583,
      "app_id": 580714,
      "client_details": {},
      "discount_codes": [{}],
      "line_items": [
        {
          "id": 466157049,
          ...
        }
      ],
      ...
    },
    {...},
    {...},
    {...}
  ]
}

This array can contain more than 10 lakh (1 million) objects. Currently I need to:

  • find an order object by its order id
  • get the total number of orders
  • get orders by order id, limited to a given number of results

I am using the following code:

return new Promise((resolve, reject) => {
    var orders = []
    var getStream = function () {
        var stream = fs.createReadStream(file_path, { encoding: 'utf8' }),
            parser = JSONStream.parse('*');
        return stream.pipe(parser);
    };

    getStream()
        .pipe(es.mapSync(function (data) {
            orders = data
        }))
        .on('end', function () {
            resolve(orders)
        })
})

But it makes the system hang. I have also tried running it with the following command:

 node --max-old-space-size=8192 index.js

But that did not work either. Can anyone please help me with processing such a big JSON file?

Edited: The file size is now 850 MB and I am using the following code:

return new Promise((resolve, reject) => {
    var data = ''
    var reader_stream = fs.createReadStream(file_path)
    reader_stream.setEncoding('UTF8')

    reader_stream.on('data', function (chunk) {
        data += chunk
    })

    reader_stream.on('end', function () {
        try {
            const orders_result = JSON.parse(data)
            var order_count = orders_result.orders

            resolve({
                "count": order_count.length
            })
        } catch (err) {
            console.log(err)
        }
    })

    reader_stream.on('error', function (err) {
        console.log(err.stack)
        reject(err.stack)
    })
})

and I am getting the following error:

Uncaught Exception: RangeError: Invalid string length

Deep Kakkar
  • Storing it in a database is probably a good idea when you have such a large amount of data. From it you can query just what you need. – Sandsten Aug 26 '21 at 06:42
  • I know @Sandsten but a DB is not an option here. – Deep Kakkar Aug 26 '21 at 06:44
  • What is "not working" ? – Thibaud Aug 26 '21 at 06:44
  • @DeepKakkar - 10GB of JSON needs >10GB of RAM for just your node process - I take it your node process is 64-bit in a 64-bit OS, right? – Bravo Aug 26 '21 at 06:45
  • `DB is not an option` - I think you'll find it's the ONLY option that will actually work with any useful speed - even if you could store a 10GB object in your process, access speed would be glacial compared to what a database can do for you – Bravo Aug 26 '21 at 06:48
  • Ohh @Bravo, I only have 8GB of RAM in my system. Let me decrease the data and get back to you – Deep Kakkar Aug 26 '21 at 06:48
  • You only have 8GB, and you use `--max-old-space-size=8192`? You know, other processes need memory too!! Decrease it to 1GB - it'll still probably be so slow as to be pointless – Bravo Aug 26 '21 at 06:49
  • @DeepKakkar, I think you are looking for this [question](https://stackoverflow.com/q/11874096/14032355), and this is probably a duplicate of it. – ikhvjs Aug 26 '21 at 07:10

1 Answer


JSON.parse needs to read the whole file into memory, including the parts that your application does not need. One approach would be to use a SAX-like parser like clarinet. These parsers don't read the whole file into memory, they generate events during the parsing process. You need to handle these events to check whether the data is of interest and only store the information that you actually need.

This will reduce the amount of memory required for the parsing process, but it is not as convenient. Your operations sound like you don't need all the data, so maybe you are lucky and a stripped-down version fits into memory.
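For example, counting the orders could look roughly like the sketch below. It is only an illustration, not tested against your data: it uses clarinet's createStream and its openobject/closeobject/openarray/closearray events (documented in the clarinet README), and the simple depth counter assumes the structure shown in the question, where the order objects sit directly inside the top-level "orders" array and there is no other top-level array. countOrders and file_path are placeholder names.

// Rough sketch: count the order objects without building the whole JSON in memory.
const fs = require('fs');
const clarinet = require('clarinet');

function countOrders(file_path) {
    return new Promise((resolve, reject) => {
        const parser = clarinet.createStream();
        let depth = 0;       // current nesting depth across objects and arrays
        let orderCount = 0;  // order objects seen so far

        parser.on('openobject', function () {
            depth += 1;
            // depth 1 = root object, 2 = the "orders" array, 3 = one order object
            if (depth === 3) orderCount += 1;
        });
        parser.on('closeobject', function () { depth -= 1; });
        parser.on('openarray', function () { depth += 1; });
        parser.on('closearray', function () { depth -= 1; });

        parser.on('error', reject);
        parser.on('end', function () { resolve(orderCount); });

        // Stream the file through the parser instead of reading it all at once.
        fs.createReadStream(file_path).pipe(parser);
    });
}

The same handlers can be extended with the key/value events to remember just the fields you need (for example the id of each order) instead of keeping every full order object in memory.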

Marcus Riemer
  • I have edited my question. The file size is now just 850 MB, but I am getting the error RangeError: Invalid string length – Deep Kakkar Aug 26 '21 at 06:58
  • @DeepKakkar This answer should still address the problem. Strings have size limits, so you are going to have to either use a real database, split your data into smaller files (sort them into multiple folders?), or use a parser, like the one in this answer. – theusaf Aug 26 '21 at 07:02
  • Without knowing your exact setup, a single 850MB string may still be more than your application can handle. This also depends on your version of Node.js; see the answer (and especially the comment) at: https://stackoverflow.com/a/47781288/431715 – Marcus Riemer Aug 26 '21 at 07:02
  • I am using Node.js version v14.15.0 – Deep Kakkar Aug 26 '21 at 07:03
  • I am not finding example code to use so that I can get an object for processing, e.g. var stream = require("clarinet").createStream(options); what is options there? – Deep Kakkar Aug 26 '21 at 08:13
  • https://github.com/dscape/clarinet#arguments If you have issues with clarinet itself, please accept this answer and ask a new question. – Marcus Riemer Aug 26 '21 at 08:52