
I currently have a 700 MB JSON file and I always hit a memory limit when I try to read it (purpose: importing the data into Firestore using the Firestore Node.js SDK).

I tried the stream-json library with the following code:


const fs = require('fs');
const { parser } = require('stream-json');
const { streamArray } = require('stream-json/streamers/StreamArray');

let totalSetCount = 0;

function importJson(file) { // hypothetical name; the header was not in the original snippet
  return fs.createReadStream(file)
    .pipe(parser())
    .pipe(streamArray())
    .on('data', async (row) => {
      // delete row.key;
      if (row.value && typeof row.value === 'object') {
        ++totalSetCount;
      }
    })
    .on('end', async () => {
      // Final batch commit and completion message.
      // await batchCommit(false);
      console.log(args.dryRun // args comes from the CLI parser elsewhere in the script
        ? 'Dry-Run complete, Firestore was not updated.'
        : 'Import success, Firestore updated!'
      );
      console.log(`Total documents written: ${totalSetCount}`);
    });
}

Here is my error:

<--- Last few GCs --->

[63298:0x102682000]    66318 ms: Mark-sweep 1365.8 (1441.3) -> 1353.1 (1441.8) MB, 470.6 / 0.0 ms  (average mu = 0.212, current mu = 0.069) allocation failure scavenge might not succeed
[63298:0x102682000]    66796 ms: Mark-sweep 1366.4 (1442.3) -> 1352.1 (1443.3) MB, 446.4 / 0.0 ms  (average mu = 0.152, current mu = 0.065) allocation failure scavenge might not succeed


<--- JS stacktrace --->

==== JS stack trace =========================================

    0: ExitFrame [pc: 0xd54cf6dbe3d]
Security context: 0x364a2419e6e1 <JSObject>
    1: exec [0x364a24189231](this=0x364a321029a1 <JSRegExp <String[50]: [^\"\\]{1,256}|\\[bfnrt\"\\\/]|\\u[\da-fA-F]{4}|\">>,0x364aa7402201 <Very long string[65536]>)
    2: _processInput [0x364a32102a09] [/Users/mac-clement/Documents/projets/dpas/gcp/import-data/json-import/node_modules/stream-json/Parser.js:~107] [pc=0xd54cf9bb37b](this=0x364ac032ea19 <Tran...

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 0x10003b125 node::Abort() [/usr/local/bin/node]
 2: 0x10003b32f node::OnFatalError(char const*, char const*) [/usr/local/bin/node]
 3: 0x1001a8e85 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/usr/local/bin/node]
 4: 0x1005742a2 v8::internal::Heap::FatalProcessOutOfMemory(char const*) [/usr/local/bin/node]
 5: 0x100576d75 v8::internal::Heap::CheckIneffectiveMarkCompact(unsigned long, double) [/usr/local/bin/node]
 6: 0x100572c1f v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [/usr/local/bin/node]
 7: 0x100570df4 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/usr/local/bin/node]
 8: 0x10057d68c v8::internal::Heap::AllocateRawWithLigthRetry(int, v8::internal::AllocationSpace, v8::internal::AllocationAlignment) [/usr/local/bin/node]
 9: 0x10057d70f v8::internal::Heap::AllocateRawWithRetryOrFail(int, v8::internal::AllocationSpace, v8::internal::AllocationAlignment) [/usr/local/bin/node]
10: 0x10054d054 v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationSpace) [/usr/local/bin/node]
11: 0x1007d4f24 v8::internal::Runtime_AllocateInNewSpace(int, v8::internal::Object**, v8::internal::Isolate*) [/usr/local/bin/node]
12: 0xd54cf6dbe3d
[1]    63298 abort      firestore-migrator i /Users/mac-clement/Downloads/wetransfer-ff44eb/5000.json

If you have any advice, I'd appreciate it.

Clement Montois
  • **1st**: you are not doing anything `async` inside the `on data` and `on end` listeners, so the `async` keyword is not needed unless you are. **2nd**: How are you delegating/storing the incoming data? – ambianBeing Aug 14 '19 at 13:47
  • I deleted my async code to debug what was causing the memory limit. I don't want to store the incoming data; I just want to read each row, put it directly into Firestore, and then handle the next row (no need to keep the previous one). – Clement Montois Aug 14 '19 at 14:16
  • Okay! So each chunk `row` at `on("data", (row) => {})` should be a document that you want to store in the database, isn't it? You can collect those and trigger a save, say, every 500-1000 objects. See the sketch below. – ambianBeing Aug 14 '19 at 15:31
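
A minimal sketch of that batching idea (the `users` collection name and the admin-SDK setup are illustrative assumptions, not from the question): pause the stream while each batch commits, so rows stop accumulating in memory while Firestore catches up.

const fs = require('fs');
const admin = require('firebase-admin');
const { parser } = require('stream-json');
const { streamArray } = require('stream-json/streamers/StreamArray');

admin.initializeApp(); // assumes credentials via GOOGLE_APPLICATION_CREDENTIALS
const db = admin.firestore();

const BATCH_SIZE = 500; // Firestore's limit per batched write

function importFile(file) {
  let batch = db.batch();
  let pending = 0;

  const pipeline = fs.createReadStream(file)
    .pipe(parser())
    .pipe(streamArray());

  pipeline.on('data', ({ value }) => {
    batch.set(db.collection('users').doc(), value); // 'users' is a placeholder
    if (++pending === BATCH_SIZE) {
      pipeline.pause(); // stop reading while the batch commits
      batch.commit().then(() => {
        batch = db.batch();
        pending = 0;
        pipeline.resume();
      });
    }
  });

  pipeline.on('end', () => {
    if (pending > 0) batch.commit(); // flush the final partial batch
  });
}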

2 Answers


You should probably use a SAX strategy and read the file piece by piece. The DOM strategy means decoding the entire JSON file into an in-memory tree. With a SAX strategy, you instead get an event for each separate value and its key, so you can process it and move on without holding the whole file in memory.

Wohlstand
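
To illustrate the point (a sketch, not part of the original answer): stream-json's parser is already SAX-style, emitting one small token event at a time instead of building a tree, so memory use stays flat regardless of file size. The filename is taken from the command in the question.

const fs = require('fs');
const { parser } = require('stream-json');

// Tokens such as startObject, keyValue, and stringValue arrive as
// individual events; the full document is never held in memory.
fs.createReadStream('5000.json')
  .pipe(parser())
  .on('data', (token) => {
    if (token.name === 'keyValue') console.log('key:', token.value);
  });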

It looks like adding a `return null;` to your `on('data')` event handler would fix it. Your library is likely accumulating unresolved promises.

bknights
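
As a side note, a tiny sketch of why `async` data handlers pile up (generic Node streams behavior, not specific to stream-json): the emitter ignores the promise an async listener returns, so it never waits before emitting the next row.

const { Readable } = require('stream');

const src = Readable.from([1, 2, 3]);
src.on('data', async (n) => {
  // The stream does not await this promise; all three handlers start
  // immediately, so slow async work accumulates instead of throttling input.
  await new Promise((resolve) => setTimeout(resolve, 1000));
  console.log('processed', n);
});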