3

i'm using mongodb and have some trouble with the speed. My collections got bigger and now contains about 7.000.000 items. As a result, the findAndModify query takes about 3 seconds. I have a index on the queried field (in my case "links", which is an array). Does anybody see a big failure or inefficient code (see below).

public Cluster findAndLockWithUpsert(String url, String lockid) {
   Query query = Query.query(Criteria.where("links").in(Arrays.asList(url)));
   Update update = new Update().push("lock", lockid).push("links", url);
   FindAndModifyOptions options = new FindAndModifyOptions();
   options.remove(false);
   options.returnNew(true);
   options.upsert(true);
   Cluster result = mongo.findAndModify(query, update, options, Cluster.class, COLLECTION);

   return result;
}

thank you in advance!

Kai Schlegel
  • 183
  • 2
  • 11
  • Have you tried to run the query being created in the console? Just to see whether it's the query taking so long or the update. Also try to run the entire command via the console and see how the numbers differ. How many objects do you expect to get back? – Oliver Drotbohm Apr 19 '13 at 11:41
  • the query alone is processed very quick, because it uses the index. i only expect one result item. is mongodb really so slow, when a item from a huge collection is updated? maybe because of the write-lock? – Kai Schlegel Apr 19 '13 at 13:25
  • Do you see the slow findAndModify operation in your log file? It should log if over 100ms (unless you have changes slowms settings in profiling). As an alternative, can you enable profiling for slow operations and provide the system.profiles record generated? This will give us information on whether an index was efficiently used, the # of records modified and the time spent waiting on locks. See the following for more in profiling: http://docs.mongodb.org/manual/tutorial/manage-the-database-profiler/ – James Wahlin Apr 19 '13 at 13:33
  • Growing objects can be expensive. Since you are pushing to an embedded array, MongoDB might have to move the object. What is the `paddingFactor` of the collection? Could you post `db.collection.stats`? How long does the update operation take if you perform it using an `_id` query instead? (just to confirm it's the update that is slow). – mnemosyn Apr 19 '13 at 13:35
  • the log file says: "{ findandmodify: "cluster", query: { links: { $in: [ "http://www.bbc.co.uk/music/artists/c9dd033c-0270-463c-b3ec-f4b7712486fe#artist" ] } }, update: { $push: { lock: "3494c56a-ff92-4dc5-be2d-1cf9e96f5ab1" } }, new: true } update: { $push: { lock: "3494c56a-ff92-4dc5-be2d-1cf9e96f5ab1" } } ntoreturn:1 nscanned:1 nmoved:1 nupdated:1 keyUpdates:0 locks(micros) w:256809 reslen:2665 256ms" – Kai Schlegel Apr 19 '13 at 14:46
  • db.cluster.stats() { "ns" : "balloon.cluster", "count" : 7197441, "size" : 3093158016, "avgObjSize" : 429.75802316406623, "storageSize" : 3922726912, "numExtents" : 23, "nindexes" : 4, "lastExtentSize" : 1021419520, "paddingFactor" : 1.9890000010386042, "systemFlags" : 1, "userFlags" : 0, "totalIndexSize" : 3193087744, "indexSizes" : { "_id_" : 210491120, "sameAs_idx" : 1372619584, "links_1" : 1433759712, "lock" : 176217328 }, "ok" : 1 } – Kai Schlegel Apr 19 '13 at 14:48
  • i wonder why a simple remove and update is also in the logs: Fri Apr 19 16:50:53.619 [conn38] remove balloon.sameAs query: { _id: ObjectId('51630ecfe4b0794ae578c6e2') } ndeleted:1 keyUpdates:0 locks(micros) w:517 719ms Fri Apr 19 16:50:53.620 [conn50] update balloon.cluster query: { _id: ObjectId('51692a71010dc5bb9717281c') } update: { $pull: { lock: "476b4af1-97f8-42ba-8793-0fd5eb8248b0" } } nscanned:1 nupdated:1 keyUpdates:1 locks(micros) w:719 718ms i'm really thankful for your help! – Kai Schlegel Apr 19 '13 at 14:52
  • The log example you provided shows a run time of 256ms so way under 3 seconds. Looks like it is waiting on a write lock for most of that time. Do you see any entries that take the 3 seconds you are seeing from the application? – James Wahlin Apr 19 '13 at 15:07
  • @JamesWahlin: i think the write lock comes from another collection. i'm investigating this in detail. thank you! i get many log entries like this one: Fri Apr 19 17:12:22.002 [conn46] balloon.sameAs warning: cursor loc 2:51b7850 does not match byLoc position 1:3171d10 ! can you explain what it means? – Kai Schlegel Apr 19 '13 at 15:16
  • 1
    @mnemosyn: thank you for your hint. i uses the collMod function (http://docs.mongodb.org/manual/reference/command/collMod/#usePowerOf2Sizes) to overcome this problem. maybe this will help to speedup the database – Kai Schlegel Apr 19 '13 at 15:21
  • @KaiSchlegel - as per the code that generates the cursor loc warning, it is likely harmless (see https://github.com/mongodb/mongo/blob/master/src/mongo/db/clientcursor.cpp#L292). You could run validate (with full set to true) on your your collection to confirm that there are no issues with your data or indexes. Note that this is resource intensive, and if on a primary you may want to step down first. Details are here: http://docs.mongodb.org/manual/reference/command/validate/#dbcmd.validate – James Wahlin Apr 19 '13 at 15:32
  • If you've resolved the problem, it would be really helpful if you could post back about whatever it was. I was thinking that, if the document were really complex, a good portion of that 3 seconds was subsumed in serialization / deserialization; but whatever it was, it would help future seekers whose issues might have overlapped with yours... – Curt Jun 28 '13 at 19:44
  • i couldn't solve the problem entirely, but i improved the result by correcting a wrong index. we are thinking about using sharding to overcome this problem. – Kai Schlegel Jun 30 '13 at 12:05

0 Answers0