
I'm building a Node.js app using Express 4 + Sequelize + a PostgreSQL database. I'm using Node v8.11.3.

I wrote a script to load data into my database from a JSON file. I tested the script with a sample of ~30 entities to seed. It works perfectly.

In reality, the complete JSON file contains around 100,000 entities to load. My script reads the JSON file and tries to populate the database asynchronously (i.e. all 100,000 entities at the same time).
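Roughly speaking, the script does something along these lines (a sketch only, since the exact code isn't posted; the model name Entity and the file name are placeholders):

const fs = require('fs');
const entities = JSON.parse(fs.readFileSync('./seed.json', 'utf8'));

// Fires every insert at once: 100,000 pending queries, their promises
// and their stack traces all stay in memory until they settle.
Promise.all(entities.map(entity => Entity.create(entity)))
  .then(() => console.log('seeding done'))
  .catch(err => console.error(err));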

After a few minutes, the result is:

<--- Last few GCs --->

[10488:0000018619050A20]   134711 ms: Mark-sweep 1391.6 (1599.7) -> 1391.6 (1599.7) MB, 1082.3 / 0.0 ms  allocation failure GC in old space requested
[10488:0000018619050A20]   136039 ms: Mark-sweep 1391.6 (1599.7) -> 1391.5 (1543.7) MB, 1326.9 / 0.0 ms  last resort GC in old space requested
[10488:0000018619050A20]   137351 ms: Mark-sweep 1391.5 (1543.7) -> 1391.5 (1520.2) MB, 1311.5 / 0.0 ms  last resort GC in old space requested


<--- JS stacktrace --->

==== JS stack trace =========================================

Security context: 0000034170025879 <JSObject>
    1: split(this=00000165BEC5DB99 <Very long string[1636]>)
    2: attachExtraTrace [D:\Code\backend-lymo\node_modules\bluebird\js\release\debuggability.js:~775] [pc=0000021115C5728E](this=0000003CA90FF711 <CapturedTrace map = 0000033AD0FE9FB1>,error=000001D3EC5EFD59 <Error map = 00000275F61BA071>)
    3: _attachExtraTrace(aka longStackTracesAttachExtraTrace) [D:\Code\backend-lymo\node_module...

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
 1: node_module_register
 2: v8::internal::FatalProcessOutOfMemory
 3: v8::internal::FatalProcessOutOfMemory
 4: v8::internal::Factory::NewFixedArray
 5: v8::internal::HashTable<v8::internal::SeededNumberDictionary,v8::internal::SeededNumberDictionaryShape>::IsKey
 6: v8::internal::HashTable<v8::internal::SeededNumberDictionary,v8::internal::SeededNumberDictionaryShape>::IsKey
 7: v8::internal::StringTable::LookupString
 8: v8::internal::StringTable::LookupString
 9: v8::internal::RegExpImpl::Exec
10: v8::internal::interpreter::BytecodeArrayRandomIterator::UpdateOffsetFromIndex
11: 0000021115A043C1

In the end, some entities were created, but the process clearly crashed. I understand that this error is due to running out of memory.

My question is: why doesn't Node pace the work so that everything gets done without overshooting memory? Is there a "queue" to limit such explosions?

I identified some workarounds:

  • Segment the seed into several JSON files
  • Use more memory using --max_old_space_size=8192 option
  • Proceed sequentially (using sync calls; see the sketch after this list)
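
A rough sketch of that sequential/batched approach, with Entity standing in for the actual Sequelize model and the chunk size chosen arbitrarily:

const fs = require('fs');

async function seed() {
  const entities = JSON.parse(fs.readFileSync('./seed.json', 'utf8'));
  const chunkSize = 500; // placeholder value, to be tuned

  for (let i = 0; i < entities.length; i += chunkSize) {
    const chunk = entities.slice(i, i + chunkSize);
    // one bulk INSERT per chunk, so only ~500 rows are in flight at once
    await Entity.bulkCreate(chunk);
  }
}

seed().catch(err => console.error(err));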

None of these solutions is really satisfying to me, though. It makes me worried about the future of my app, which is supposed to handle occasionally long operations in production.

What do you think?

Benjamin D.
  • If you want help, you will probably have to show us your code. If you're launching 100,000 database operations at once in a loop, that would be too many. We can only advise you specifically if you show us YOUR specific code. – jfriend00 Aug 01 '18 at 16:45
  • Node does what you tell it. If you instruct it to do something that takes a gigantic amount of memory, that's what it attempts to do. Chances are that it is YOUR code that needs to be fixed. Welcome to the demands of programming and writing good code. – jfriend00 Aug 01 '18 at 16:46
  • @jfriend00 Thanks for your answer. Actually, I don't want you to debug my code; that's why I didn't post it. My point was to understand how Node.js deals with a large number of async computations/IO operations. You answered my question: "Node does what you tell it. If you instruct it to do something that takes a gigantic amount of memory, that's what it attempts to do." So I know that I have to restrict the async calls to a limit. – Benjamin D. Aug 02 '18 at 07:29

2 Answers


Node.js just does what you tell it. If you go into some big loop and start up a lot of database operations, then that's exactly what node.js attempts to do. If you start so many operations that you consume too many resources (memory, database resources, files, whatever), then you will run into trouble. Node.js does not manage that for you. It has to be your code that manages how many operations you keep in flight at the same time.

On the other hand, node.js is particularly good at having a bunch of asynchronous operations in flight at the same time and you will generally get better end-to-end performance if you do code it to have more than one operation going at a time. How many you want to have in flight at the same time depends entirely upon the specific code and exactly what the asynchronous operation is doing. If it's a database operation, then it will likely depend upon the database and how many simultaneous requests it does best with.
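
For illustration, a minimal hand-rolled limiter might look like the sketch below (the helper name runWithLimit is made up for this example; libraries such as the async module, p-limit, or Bluebird's Promise.map with a concurrency option package up the same idea):

// Run an async worker over a list, keeping at most `limit` calls in flight.
async function runWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each runner repeatedly pulls the next unprocessed index until none remain.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  // Start `limit` runners in parallel and wait for all of them to drain.
  await Promise.all(Array.from({ length: limit }, () => runner()));
  return results;
}

// Usage: at most 10 database inserts in flight at any moment.
// runWithLimit(entities, 10, entity => Entity.create(entity));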

Here are some references that give you ideas for ways to control how many operations are going at once, including some code examples:

Make several requests to an API that can only handle 20 request a minute

Promise.all consumes all my RAM

Javascript - how to control how many promises access network in parallel

Fire off 1,000,000 requests 100 at a time

Nodejs: Async request with a list of URL

Loop through an api get request with variable URL

Choose proper async method for batch processing for max requests/sec

If you showed your code, we could advise more specifically which technique might fit best for your situation.

jfriend00
  • Thank you, I was more looking for such an answer actually, not a code review. Since the day I asked, I succeeded in creating my entries in my DB by limiting the number of in-flight requests at a time. It works perfectly. Thank you for the explanation and the links. – Benjamin D. Aug 09 '18 at 09:18

Use async.eachOfLimit to run at most X operations at the same time:

var async = require("async");

var myBigArray = [];
var X = 10; // 10 operations in same time at max

async.eachOfLimit(myBigArray, X, function(element, index, callback){

    // insert element
    MyCollection.insert(element, function(err){
       return callback(err);
    });

}, function(err, result){

    // all finished
    if(err){
       // do stg
    }
    else
    {
       // do stg
     }

});
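
Since the question uses Sequelize, whose create method returns a promise rather than taking a callback, the same pattern could be adapted roughly like this (the model name Entity is an assumption):

var async = require("async");

async.eachOfLimit(myBigArray, 10, function(element, index, callback){

    // Entity.create returns a promise; bridge it to the callback API
    Entity.create(element)
        .then(function(){ callback(); })
        .catch(callback);

}, function(err){

    if(err){
        console.error("Seed failed", err);
    }
    else {
        console.log("Seed finished");
    }

});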
Daphoque
  • Thanks for this answer @Daphoque. It doesn't match my code, but you couldn't know that since I didn't post it. I will build such a "queue" artificially to restrict memory usage at any given moment. – Benjamin D. Aug 02 '18 at 07:34