I am trying to import 10 billion records. I started by testing an import of about 1 billion records, and the import keeps getting slower as more records are inserted. Here are the configuration and stats.

MongoDB version: 3.4
Documents: 1,226,592,923
Routers (m4.xlarge): 2
Config servers: 3

Shard nodes (i3.large, 15GB NVMe SSD)   Import time (hh:mm:ss)
5                                       14:30:00
10                                      8:10:00
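For scale, the two timings above correspond to the following average throughput. This is plain division over the 1,226,592,923 documents (14:30:00 = 52,200 s, 8:10:00 = 29,400 s); it averages over the whole run and so hides the slowdown as the collection grows.

```latex
\begin{aligned}
\text{5 nodes:}\quad & \frac{1\,226\,592\,923}{52\,200\ \text{s}} \approx 23\,500\ \text{docs/s}\quad(\approx 4\,700\ \text{per node})\\
\text{10 nodes:}\quad & \frac{1\,226\,592\,923}{29\,400\ \text{s}} \approx 41\,700\ \text{docs/s}\quad(\approx 4\,170\ \text{per node})\\
\text{speedup:}\quad & \frac{52\,200}{29\,400} \approx 1.78\quad(\text{vs. } 2 \text{ for linear scaling})\\
10^{10}\ \text{docs at 10 nodes:}\quad & \frac{10^{10}}{41\,700\ \text{docs/s}} \approx 2.4 \times 10^{5}\ \text{s} \approx 67\ \text{h}
\end{aligned}
```

Since the rate drops as more records are inserted, the 67 h figure for 10 billion documents is best read as an optimistic lower bound, not a prediction.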

Each document has around 7 fields, and the shard key is on 3 fields. I followed all the recommendations at https://docs.mongodb.com/v3.4/reference/ulimit/#recommended-ulimit-settings.

Import options

--writeConcern '{ w: 0, j: false }'
--numInsertionWorkers 8

I even tried disabling journaling (--nojournal), but it made little difference.

I'm not sure whether this is the expected import time. Is there anything else I can do to improve the ingestion rate?

  • Is the collection indexed? What format is the data, and how was it created (mongoexport / mongodump)? – robjwilkins Dec 11 '17 at 15:55
  • you may get some specific/better help over at https://dba.stackexchange.com/ – user3788685 Dec 11 '17 at 16:00
  • Did you set up the splits on the shards before inserting the 10b docs? Nice blog on the performance pick-up here: https://blog.zawodny.com/2011/03/06/mongodb-pre-splitting-for-faster-data-loading-and-importing/ – Buzz Moschetti Dec 11 '17 at 16:19
  • @robjwilkins I tried both with and without indexing, but there was not much difference. The data is in JSON format, generated from Spark. – karthik kalletla Dec 11 '17 at 17:19
  • I agree with @BuzzMoschetti, you should definitely pre-split the collection before starting your insertions. If you don't, the balancer will do it afterwards and it will take forever :/ – Henri-Maxime Ducoulombier Dec 11 '17 at 20:14

1 Answer

Here are some of the factors that made a big improvement to import speed:

  1. Pre-splitting
  2. Sorting data
  3. Disabling the balancer (sh.stopBalancer())
  4. Turning off auto-split during the load (sh.disableAutoSplit(), or restart the mongos with --noAutoSplit)
  5. Creating indexes only after all the data has been loaded

References:

  1. https://blog.zawodny.com/2011/03/06/mongodb-pre-splitting-for-faster-data-loading-and-importing/
  2. https://stackoverflow.com/a/19672303