3

I'd like to insert millions of records from MongoDB into Aerospike as a batch process. I followed the documentation and found this page: http://www.aerospike.com/docs/client/nodejs/usage/kvs/write.html but with it only one record can be inserted at a time.

Please help me understand how I can insert millions of key-value pairs at one time. Any suggestions for optimizing the write operation are welcome.

Vikalp

2 Answers

4

Every record write into Aerospike is a single-record write, since Aerospike has a record-level lock. I don't see how you could write a million records in one operation. Records for a given namespace are distributed evenly across the Aerospike cluster based on a hash of their set name and record key, so writes to the cluster from the client side have to be individual record writes.

pgupta
  • For a deeper understanding of the parallelism on the server side, read the post on Aerospike internals that Ronen links in his answer below. – pgupta Aug 29 '17 at 16:08

4

Aerospike is a multi-node, multi-core, multi-threaded distributed key-value database. If you want to do a large number of write operations in as short a time as possible, you need to leverage this fact and do your writes in parallel. As Piyush pointed out, each object is written as a single write, so you should split your work across multiple clients and multiple threads in those clients. This is how tools such as aerospike/aerospike-loader and asrestore work.

I've described how it works inside each node in a separate post about Aerospike internals.

Ronen Botzer
  • Thanks @ronen, I am using only one client connection, and when I loop over put requests I get the error below: { AerospikeError: Max node/event loop BB993AA892C9B0E async connections would be exceeded: 300 message: 'Max node/event loop BB993AA892C9B0E async connections would be exceeded: 300', code: -7, func: 'as_event_get_connection', file: 'src/main/aerospike/as_event.c', line: 447 } – Vikalp Aug 31 '17 at 12:04
  • That 300 number is the max number of concurrent async operations. It's controlled by setting the [Config.maxConnsPerNode](http://www.aerospike.com/apidocs/nodejs/Config.html) of the Client. If you're only using one client connection you'll be throttling at the client-side, not the server-side. Async from one thread is still limited compared to the number of cores and threads the cluster can use to execute those write operations. – Ronen Botzer Aug 31 '17 at 16:33
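
One way to avoid the "connections would be exceeded" error from a single client is to cap how many writes are in flight at once, rather than firing every put in a plain loop. The following is only a minimal sketch of that idea, not the client's own mechanism: `writeAll` and `putRecord` are hypothetical names introduced for illustration, and `putRecord` is assumed to be a function you supply that returns a promise (for example, a thin wrapper around the client's put call for one record).

```javascript
// Write every record while keeping at most `limit` writes in flight.
// `putRecord` is a hypothetical, caller-supplied async function that
// writes one record (e.g. a wrapper around the Aerospike client's put).
async function writeAll(records, putRecord, limit) {
  let next = 0;        // index of the next record to be claimed by a worker
  let written = 0;     // count of successful writes
  const failures = []; // failed records, kept so they can be retried later

  // Each worker repeatedly claims the next record and awaits its write.
  // Node.js runs this on one thread, so `next++` needs no locking.
  async function worker() {
    while (next < records.length) {
      const record = records[next++];
      try {
        await putRecord(record);
        written++;
      } catch (err) {
        failures.push({ record, err });
      }
    }
  }

  // Start `limit` workers (fewer if there are fewer records) and wait.
  const workers = [];
  const count = Math.min(limit, records.length);
  for (let i = 0; i < count; i++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return { written, failures };
}
```

In practice `limit` would be chosen below the client's `maxConnsPerNode` value (300 in the error above), and the MongoDB documents would be fed in chunks rather than as one in-memory array of millions of records. As the answer notes, a single process is still bounded by one event loop, so running several such processes over disjoint slices of the data is what lets the cluster's parallelism be used.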