
I'm using MongoDB from the command line to loop through the documents matching a particular condition, copying each one from one collection to another and removing it from the original collection:

db.coll1.find({'status' : 'DELETED'}).forEach(
    function(e) {db.deleted.insert(e);  db.coll1.remove({_id:e._id});  });

This works, however I need to script it so it moves all the matching documents in coll1 to the deleted collection every day (or every hour) via a cron script. I'm using PHP, so I figured I would write a script using the Mongo PHP library:

$db->execute('db.coll1.find({"status": "DELETED"}).forEach(
    function(e) {  db.deleted.insert(e); db.coll1.remove({_id:e._id});  })');

This works, but unlike the Mongo command line, $db->execute() is eval'd on the server, which takes a lock until the execution block is finished and holds off all writes to the collection. I can't do that in my production environment.

Is there a way to execute this from a PHP script (without manually logging into Mongo and running the command) that doesn't lock?

If I use:

db->selectCollection('coll1')->find(array('status' => 'DELETED')) 

and iterate through that, I can select the documents, save them to the deleted collection and delete them from coll1. However, this seems like a lot of bandwidth to pull everything to the client and save it back to the server.
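For reference, that client-side loop might look like the sketch below with the legacy Mongo PHP driver (MongoClient). The database name 'mydb' is a placeholder; collection names follow the question, and error handling is omitted.

```php
<?php
// Client-side move, one document at a time (legacy Mongo PHP driver).
$mongo = new MongoClient();
$db    = $mongo->selectDB('mydb');   // 'mydb' is a placeholder name

$coll1   = $db->selectCollection('coll1');
$deleted = $db->selectCollection('deleted');

$cursor = $coll1->find(array('status' => 'DELETED'));

foreach ($cursor as $doc) {
    // Copy first, then remove: if the script dies between the two calls,
    // you end up with a duplicate in 'deleted', not a lost document.
    $deleted->insert($doc);
    $coll1->remove(array('_id' => $doc['_id']));
}
```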

Any suggestions?

marko.vujo
1 Answer


Is there a way (without manually logging into Mongo and running the command) and executing it via a PHP script without locking?

As you stated, the best thing is to do it client-side. As for the bandwidth: unless you have a pre-90s network, it will most likely be a very small amount compared to what you already use for everything else, replica-set traffic included.

What you could do instead is warehouse your deletes at the moment of their actual deletion (in your app) rather than once a day; then, once a day, go back through your original collection and remove all deleted rows. That way the bandwidth is spread throughout the day, and when it comes time to clean your production collection you just run a single delete command.
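A sketch of that approach, assuming a hypothetical deleteDocument() helper in your application code: the copy happens per delete, and the nightly cron job shrinks to a single remove().

```php
<?php
// Hypothetical application-side delete: copy the document to the
// warehouse collection the moment it is flagged, spreading the
// bandwidth out over the day.
function deleteDocument(MongoDB $db, $id)
{
    $doc = $db->coll1->findOne(array('_id' => $id));
    if ($doc === null) {
        return;
    }
    $doc['status'] = 'DELETED';
    $db->deleted->insert($doc);
    $db->coll1->update(array('_id' => $id),
                       array('$set' => array('status' => 'DELETED')));
}

// The daily cron then needs only one command:
// $db->coll1->remove(array('status' => 'DELETED'));
```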

Another alternative would be to use a map/reduce (MR) job and have its output written to that collection.
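A minimal map/reduce sketch via the mapreduce database command, with an identity-style map and the output merged straight into the deleted collection; collection and field names follow the question. One caveat: MR output documents take the { _id, value } shape, so the warehoused documents end up wrapped rather than stored verbatim.

```php
<?php
// Map/reduce whose output is directed at the 'deleted' collection.
// The map emits each matching document keyed by _id; since _id keys are
// unique, reduce is effectively never needed, so an identity-style
// reduce suffices.
$map    = new MongoCode('function () { emit(this._id, this); }');
$reduce = new MongoCode('function (key, values) { return values[0]; }');

$result = $db->command(array(
    'mapreduce' => 'coll1',
    'map'       => $map,
    'reduce'    => $reduce,
    'query'     => array('status' => 'DELETED'),
    'out'       => array('merge' => 'deleted'),
));
```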

Though in general, warehousing deletes in this manner is normally more work than it is worth. It is usually better to keep them in your main collection and work your queries around the deleted flag (as you presumably already do, since you don't warehouse them immediately).
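That is, filter the flagged documents out in the query itself instead of moving them anywhere, e.g.:

```php
<?php
// Keep everything in coll1 and exclude flagged documents per query.
$live = $db->coll1->find(array('status' => array('$ne' => 'DELETED')));
```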

Sammaye