
I want to update some data in MongoDB. My logic is as follows:

# find the specific document with "md5, time, size"
if collection.find({"src_md5": file_md5, "src_time": file_time, "src_size": file_size}).count() == 0:
    # not found:
    # look for an existing idx; if no idx exists yet, start at 1
    if collection.find({}, {"idx": 1}).count() == 0:
        idx = 1
    # if idx values already exist, sort by idx descending and take the biggest one
    else:
        idx = collection.find({}, {"idx": 1}).sort('idx', -1).limit(1)[0]['idx']
        idx = idx + 1

    # insert the file info together with the new idx
    if not self.insertFileInfo(collection, file_obj, file_md5, file_time, file_size, long(idx)):
        return None
# the document with "md5, time, size" was found
else:
    # just get the idx stored with that specific md5/time/size
    idx = collection.find({"src_md5": file_md5, "src_time": file_time, "src_size": file_size}, {"idx": 1})[0]['idx']
    return None

I will run the above code on 4 machines, which means 4 processes will update MongoDB almost simultaneously. How can I ensure the operations are atomic? My record schema is

{"src_md5":"djapijfdakfiwqjfkasdj","src_size":2376498,"src_time":1338179291,"idx":1}
{"src_md5":"jdfipajkoijjipjefjidwpj","src_size":234876323,"src_time":1338123873,"idx":2}
{"src_md5":"djapojfkdasxkjipkjkf","src_size":3829874,"src_time":1338127634,"idx":3}

It's not a simple auto-increment key: it should be increased only when md5, size, and time change, and it should be inserted together with them as one record. I created a compound unique index on {"src_md5", "src_time", "src_size"} and a unique index on {"idx"}, but before I insert new info I have to get the idx that already exists and then increase it. There are two situations: (1) if an idx for the specific md5/size/time already exists, just return that idx; (2) if it does not exist, increase the idx by 1.
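
For reference, a minimal sketch of creating those two unique indexes with pymongo (the host, database, and collection names here are placeholders; adjust them to your setup):

    import pymongo

    # hypothetical connection details
    client = pymongo.MongoClient("localhost", 27017)
    collection = client["mydb"]["files"]

    # compound unique index on (src_md5, src_time, src_size)
    collection.create_index(
        [("src_md5", pymongo.ASCENDING),
         ("src_time", pymongo.ASCENDING),
         ("src_size", pymongo.ASCENDING)],
        unique=True)

    # separate unique index on idx
    collection.create_index([("idx", pymongo.ASCENDING)], unique=True)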


1 Answer


A similar problem is discussed in this question.

What you want is similar to having a unique, monotonically increasing key, which you would keep in its own collection and increment using $inc, as described in the linked question.
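
A minimal sketch of that counter pattern, assuming a separate collection named counters and a counter document with _id "file_idx" (both names are just illustrative). find_and_modify increments and returns the counter in a single server-side operation, so no two processes can receive the same value; in pymongo 3+ the equivalent call is find_one_and_update with return_document=ReturnDocument.AFTER:

    def get_next_idx(db):
        # atomically increment the counter and return the new value;
        # upsert=True creates the counter document the first time
        doc = db.counters.find_and_modify(
            query={"_id": "file_idx"},
            update={"$inc": {"seq": 1}},
            upsert=True,
            new=True)
        return doc["seq"]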

This will ensure that you never try to use the same idx twice. There is still a possibility that two threads will try to insert a new combination of (md5, size, time) with two different idx keys, but the second one will fail because of the unique index you have on (md5, size, time).

The only race condition left is that when the second thread fails to insert because of the unique index, you end up with an unused idx value (i.e. every time that happens, the sequence of idx values skips one). How big of a problem is that for you? If it matters, you would have to either enforce locking in your application code or change the structure of your schema to deal with this case.
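
Putting it together, one hedged sketch of how the caller could react to that duplicate-key failure: reserve an idx first, attempt the insert, and if the unique index on (md5, size, time) rejects it, fall back to reading the idx the other process already stored. get_next_idx is the counter helper sketched above, and the sketch assumes acknowledged writes so DuplicateKeyError is actually raised (with pymongo 3+ you would use insert_one instead of insert):

    from pymongo.errors import DuplicateKeyError

    def get_or_create_idx(db, collection, file_md5, file_time, file_size):
        spec = {"src_md5": file_md5, "src_time": file_time, "src_size": file_size}

        existing = collection.find_one(spec, {"idx": 1})
        if existing is not None:
            return existing["idx"]

        idx = get_next_idx(db)  # atomically reserved via $inc
        try:
            doc = dict(spec)
            doc["idx"] = idx
            collection.insert(doc)
            return idx
        except DuplicateKeyError:
            # another process inserted the same (md5, time, size) first;
            # its idx wins and the value we reserved is simply skipped
            return collection.find_one(spec, {"idx": 1})["idx"]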

Asya Kamsky