
I am trying to migrate counter tables: reading every row from the source table and writing the result to a new table. When writing millions of rows, the Node.js process runs out of heap memory. The code is sequential (each write is awaited) to make sure a row is written before the next write starts. While pinning down the problem by observing the heap, it seems that heap usage rises when using cassandra-driver to execute the UPDATE query.

```js
const query = `UPDATE keyspace.table_name
  SET counter1 = counter1 + ${x1}, counter2 = counter2 + ${x2}
  WHERE partition_key = ? AND clustering_key = ?`;

await cassandraDestination.execute(query, params, { prepare: true });
```

How can I avoid the memory leak in this situation?


```
==== JS stack trace =========================================

    0: ExitFrame [pc: 0x1317672cfc5d]
Security context: 0x192500f1d971 <JSObject>
    1: borrowNextConnection [0x2ea411f96091] [/home/asen.chekov/projects/stats/node_modules/cassandra-driver/lib/request-handler.js:~64] [pc=0x131767db1395](this=0x2ea411f95da1 <JSFunction RequestHandler (sfi = 0x2e8feb1ef231)>,0x21125933f379 <Object map = 0x1ddd59acbdb1>,0x21125933f439 <Object map = 0x1ddd59a82521>,0x110a33490991 <ProfileManager map...

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory

Writing Node.js report to file: report.20191016.122316.23174.001.json
Node.js report completed
 1: 0x953b10 node::Abort() [node]
 2: 0x9547f4 node::OnFatalError(char const*, char const*) [node]
 3: 0xb32bee v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
 4: 0xb32e24 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
 5: 0xf32452  [node]
 6: 0xf32558 v8::internal::Heap::CheckIneffectiveMarkCompact(unsigned long, double) [node]
 7: 0xf3ec78 v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [node]
 8: 0xf3f78b v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
 9: 0xf424c1 v8::internal::Heap::AllocateRawWithRetryOrFail(int, v8::internal::AllocationSpace, v8::internal::AllocationAlignment) [node]
10: 0xf0c6f4 v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationSpace) [node]
11: 0x11c2b3e v8::internal::Runtime_AllocateInNewSpace(int, v8::internal::Object**, v8::internal::Isolate*) [node]
12: 0x1317672cfc5d 
Aborted (core dumped)
```
high_breed

2 Answers


This fixed the problem: the counter values must be passed as bound parameters instead of being interpolated into the query string.

The correct query would be:

```js
const query = `UPDATE keyspace.table_name
  SET counter1 = counter1 + ?, counter2 = counter2 + ?
  WHERE partition_key = ? AND clustering_key = ?`;

await cassandraDestination
  .execute(query, [ counter1, counter2, key1, key2 ], {
    prepare: true,
  });
```
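To see why this works: with bind markers, the query text is identical for every row, so a driver that caches prepared statements by query string prepares it once and reuses it. Below is a minimal, runnable sketch of the migration loop (the `rows` data and the mocked `execute` are hypothetical stand-ins for the real source table and the cassandra-driver client):

```js
const query = `UPDATE keyspace.table_name
  SET counter1 = counter1 + ?, counter2 = counter2 + ?
  WHERE partition_key = ? AND clustering_key = ?`;

// Mock client standing in for cassandra-driver's Client, so the sketch
// runs standalone. It records each distinct query string it is asked
// to prepare, the way the real driver's prepared-statement cache would.
const seenQueries = new Set();
const cassandraDestination = {
  async execute(q, params, options) {
    seenQueries.add(q);
  },
};

async function migrate(rows) {
  for (const row of rows) {
    // Sequential awaits: one write completes before the next starts.
    await cassandraDestination.execute(
      query,
      [row.counter1, row.counter2, row.partition_key, row.clustering_key],
      { prepare: true },
    );
  }
}

const rows = [
  { counter1: 5, counter2: 7, partition_key: 'a', clustering_key: 1 },
  { counter1: 2, counter2: 3, partition_key: 'b', clustering_key: 2 },
];

migrate(rows).then(() => console.log(seenQueries.size)); // prints 1
```

Every row reuses the same query string, so the cache holds a single entry no matter how many rows are written.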
high_breed

Note that the signature of the execute() method is:

```js
execute(query: string, params: Array, options: object)
```

Also, you should not hardcode parameters in prepared statements.

The correct usage is:

```js
const query = `UPDATE keyspace.table_name
  SET counter = counter + 1, counter1 = counter1 + 1
  WHERE partition_key = ? AND clustering_key = ?`;

await cassandraDestination.execute(query, [ key1, key2 ], { prepare: true });
```
jorgebg
  • Yes, the query in the example was wrong, but I had modified it to simplify the question. The trouble is a memory leak at that execute statement that causes heap usage to rise constantly. – high_breed Oct 16 '19 at 12:01
  • Use query markers instead of hardcoded parameters; prepared statements are cached. If you use hardcoded parameters, you will cache one prepared statement per distinct parameter value. – jorgebg Oct 17 '19 at 08:06
  • I'm using query markers in the original code, this is not the problem. – high_breed Oct 17 '19 at 08:44
  • Update the code samples and try to include the rest of the code; otherwise it will be very hard to understand how this could be happening. – jorgebg Oct 17 '19 at 09:02
  • I've got it: I was hardcoding the counter values in the query as well. You helped remotely by asking me to check the query, and I thought, why not add the counter values to the params array as well. Now the heap stays at around 55 MB. Thanks anyway. – high_breed Oct 17 '19 at 09:21
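jorgebg's point about caching can be demonstrated without a live cluster. The sketch below simulates a prepared-statement cache keyed by query text (the `preparedCache` map is a hypothetical stand-in for the driver's internal cache, not the actual cassandra-driver API): interpolated values produce a new query string per row, so the cache grows without bound, while bind markers keep it at a single entry.

```js
// Stand-in for a driver-internal prepared-statement cache, keyed by query text.
const preparedCache = new Map();

function prepare(queryText) {
  // Each previously unseen query string adds a new cache entry.
  if (!preparedCache.has(queryText)) {
    preparedCache.set(queryText, { id: preparedCache.size });
  }
  return preparedCache.get(queryText);
}

// Hardcoded values: 1000 rows yield 1000 distinct query strings.
for (let x = 0; x < 1000; x++) {
  prepare(`UPDATE keyspace.table_name SET counter1 = counter1 + ${x} WHERE partition_key = ?`);
}
console.log(preparedCache.size); // prints 1000

// Bind markers: 1000 rows yield one constant query string.
preparedCache.clear();
for (let x = 0; x < 1000; x++) {
  prepare(`UPDATE keyspace.table_name SET counter1 = counter1 + ? WHERE partition_key = ?`);
}
console.log(preparedCache.size); // prints 1
```

This matches the observed behavior: with markers the heap stabilizes, because nothing new is prepared after the first write.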