
While MongoDB doesn't require a fixed schema, there are times when we'd like to migrate from one structure to another.

I was dealing with a small dataset (~200K documents) recently, and decided to loop over the existing data, transform the data model, and insert into new collections. It turned out our VPS wasn't that powerful; using the PHP driver I could only get to about ~300 insertions/sec, even after having ensured the following:

  • no indexes before insertion.
  • use batch inserts as much as possible.

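A minimal sketch of that kind of loop with the legacy PHP driver looks like this (the collection names and `transformToNewModel()` are placeholders, not my actual code):

```php
<?php
// Minimal migration loop sketch with the legacy PHP "mongo" driver.
// Collection names and transformToNewModel() are placeholders.
$client = new MongoClient('mongodb://localhost:27017');
$db     = $client->selectDB('mydb');
$source = $db->selectCollection('old_collection');
$target = $db->selectCollection('new_collection');

$batch = array();
foreach ($source->find() as $doc) {
    $batch[] = transformToNewModel($doc); // reshape into the new data model

    if (count($batch) >= 1000) {          // flush in chunks
        $target->batchInsert($batch);
        $batch = array();
    }
}
if (!empty($batch)) {
    $target->batchInsert($batch);
}

// create indexes only after all inserts are done
$target->ensureIndex(array('some_field' => 1));
```
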
I wonder if I have simply picked the wrong migration path, or if there are best practices for dealing with schema changes in MongoDB?


After taking in some suggestions, I changed the write concern to 0 during the migration, and this is what I have observed:

  • Insertions are still not as fast as expected, maxing out at ~500 insertions/sec.
  • After the insertions completed, indexing went through very quickly, probably because ensureIndex is fire-and-forget with w=0?
  • The remaining updates took a while to start, probably because the indexing operations are blocking? They then appeared to run at a varying speed (previously they ran consistently slower), again perhaps because indexing was still taking place.
  • CPU and IO were fine: the CPU was mostly about 90% idle, and IO wait was less than 10%.

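For reference, the write-concern change amounted to something like this (a sketch with the same placeholder names as above; `w` can be set per connection or per operation):

```php
// Default write concern for the whole connection...
$client = new MongoClient('mongodb://localhost:27017', array('w' => 0));

// ...or unacknowledged writes for a single batch insert
$target->batchInsert($batch, array('w' => 0));
```
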
Besides not using our PHP ORM for the migration, are there more possibilities for optimization?

bitinn
  • Can you just change your schema using a .update() from the shell itself? Dump the contents of the collection into another collection and then perform the update query? – user602525 Jan 02 '14 at 18:42
  • What WriteConcern are you using? In a migration script like this you should be able to get away with "Unacknowledged" which should give you a pretty good performance boost if you're currently using something safer like "Replica Acknowledged". – skelly Jan 02 '14 at 18:57
  • Is it too difficult to at least run a small test with another driver? I'm not very familiar with the PHP driver, but I've done something similar using Python on a Mac (a really good one, anyway) and was able to update/insert 10,000 documents in 5 seconds without having to optimize my scripts much. 2,000 a second seems like very good performance for a script written in a couple of hours. – Maximiliano Rios Jan 02 '14 at 20:38
  • The first step should be determining whether you're being held back by CPU utilization (PHP itself) or write throughput (the MongoDB server). You can watch the PHP process with something like `top`, and `mongostat` can be used to watch server activity and the write lock. Anecdotally, I can say that PHP is certainly capable of pushing 10k+ updates/second using an unacknowledged write concern (`w=0`). Using the default write concern (`w=1`), round trips between database ops could certainly be a limiting factor here. – jmikola Jan 02 '14 at 22:08
  • Thanks all for your suggestions, `w=0` was an oversight. Though after changing the options, it still appears to be slower than most would suggest: certainly not over 1k/sec, and our VPS CPU and disk appear to be doing fine during the test. I wonder if this is simply overhead in our PHP application (which uses an ORM to interface with MongoDB). – bitinn Jan 03 '14 at 07:21
  • When you say "CPU and IO were fine" do you mean client-side, where your app is running? Or server-side, where mongo is running? – skelly Jan 06 '14 at 14:44

1 Answer


Transmitting and serializing everything to and from the PHP client is probably adding a lot of overhead. Running migrations from the shell is going to be the fastest. Write them as an update, or use a cursor with forEach to iterate and make calls to save.
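
Roughly like this (collection and field names here are placeholders): a simple structural change can be a single multi-update, and anything needing per-document logic can use a cursor.

```javascript
// A simple structural change as a single multi-update, run in the mongo shell
db.mycoll.update(
    { oldField: { $exists: true } },
    { $rename: { oldField: "newField" } },
    { multi: true }
);

// Or iterate with a cursor when per-document logic is needed
db.old_collection.find().forEach(function (doc) {
    doc.newField = doc.a + doc.b;   // whatever the new model requires
    delete doc.a;
    delete doc.b;
    db.new_collection.save(doc);    // write into the target collection
});
```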

See an example of using cursors in MongoDB update multiple records of array (towards the bottom).

Be aware of snapshot issues with cursors. You probably want an idempotent update, or use snapshot() if the collection isn't sharded.
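
In the shell, snapshot mode is just a method on the cursor, e.g. (same placeholder names as above):

```javascript
// snapshot() prevents a document that grows and moves on disk
// from being returned by the cursor more than once
db.old_collection.find().snapshot().forEach(function (doc) {
    // ... same per-document transformation as above
});
```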

Wes