
I have two MongoDB databases: 1. staging and 2. production. In staging we have around 5 collections of seed data, on which we run batch jobs that populate roughly 3 more collections. These 8 collections then become the seed data for production, which holds user information plus this seed data.

Are there any better patterns for managing the data push into staging and from staging to production? Right now we mongoexport all the collections, tar.gz them, archive the result on a network drive at each stage, and then mongoimport them.

It's very painful and slow to export, import, and archive; even gzipped, the data is around 1.5 GB. Are there any good patterns to solve this problem?
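Roughly, the current flow at each stage looks like the sketch below (the host, database, and collection names are placeholders, not our real ones):

```bash
# Export each collection to JSON, bundle it, and archive it on the network drive.
for c in seed1 seed2 seed3 seed4 seed5 derived1 derived2 derived3; do
  mongoexport --host staging-host --db stagingdb --collection "$c" --out "dump/$c.json"
done
tar -czf "seed-$(date +%F).tar.gz" dump/
cp "seed-$(date +%F).tar.gz" /mnt/network-drive/archive/

# ...and on the next stage, untar the archive and mongoimport each file again.
```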

vinothkr
  • Well, you may not need the file in the middle. Piping mongoexport to mongoimport is a pretty fast way of doing things. See http://stackoverflow.com/questions/10624964/whats-the-fastest-way-to-copy-a-collection-within-the-same-database/10627056#10627056 – AD7six Oct 26 '12 at 11:42
  • I would like to archive them. For example, if the batch job had errors I would like to rerun it on the same seed data, or even revert back to an older run's data. – vinothkr Oct 26 '12 at 11:45
  • Then do both steps at once, e.g. use [tee](http://en.wikipedia.org/wiki/Tee_(command)); there's no need to do things in serial (see the sketch after these comments). – AD7six Oct 26 '12 at 11:51
  • Btw, what are "CD pipelines"? – Asya Kamsky Oct 30 '12 at 05:09
  • It's continuous delivery pipelines, like the ones in ThoughtWorks Go. – vinothkr Oct 30 '12 at 11:48
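
Putting the two comment suggestions together, here is a rough sketch of streaming each collection from staging to production while tee keeps an archived copy for re-runs and rollbacks (host, database, and collection names are placeholders):

```bash
ARCHIVE=/mnt/network-drive/archive/run-$(date +%F)
mkdir -p "$ARCHIVE"

for c in seed1 seed2 seed3; do
  # mongoexport writes JSON to stdout; tee saves a copy of the stream
  # while mongoimport reads the same stream on the production side.
  mongoexport --host staging-host --db stagingdb --collection "$c" \
    | tee "$ARCHIVE/$c.json" \
    | mongoimport --host prod-host --db proddb --collection "$c"
done

gzip "$ARCHIVE"/*.json   # compress the archived copies afterwards
```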

1 Answer


'mongoimport' and 'mongoexport' are meant for exchanging data with outside systems: all data is translated into plain JSON and then back again into BSON.

If you use 'mongodump' and 'mongorestore' you should see much better performance, as both work with BSON directly, which is more compact to store and avoids the two translations (once to JSON and once back from JSON).
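For example (a minimal sketch; the host names and paths are placeholders):

```bash
# Dump the whole staging database as BSON into a dated folder on the network drive.
mongodump --host staging-host --db stagingdb --out /mnt/network-drive/archive/run-2012-10-26

# Restore that same snapshot into production (or back into staging to re-run a batch job).
mongorestore --host prod-host --db proddb /mnt/network-drive/archive/run-2012-10-26/stagingdb
```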

Asya Kamsky