
I'm using mongodb's capped collections + tailable cursors as a pubsub engine, using pymongo. (This approach is described here). My environment includes several topics, several publishers, and a few dozen subscribers.
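For reference, the subscriber side follows the standard tailable-cursor pattern from the pymongo docs, along these lines (a minimal sketch, not the exact production code; the db/collection names match the SSCCE below):

import time, pymongo
from pymongo import CursorType

client = pymongo.MongoClient()
coll = client['test5']['abc']  # the capped collection acting as a topic

# Tail the capped collection; AWAIT makes the server block briefly for new docs.
cursor = coll.find(cursor_type=CursorType.TAILABLE_AWAIT)
while cursor.alive:
    try:
        msg = cursor.next()
        # ... dispatch msg to this subscriber's handler ...
    except StopIteration:
        time.sleep(0.1)  # no new documents yet; keep tailing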

This works great, except that every once in a while I encounter insert operations which take a surprisingly long time to complete.

My understanding is that mongodb can support insertion of >10K docs/sec, which is why I am surprised that a single insert operation can take 85ms. (I also observed >1sec a few times in my production environment, but couldn't reproduce it using this SSCCE).

To keep delays to a minimum, journaling is disabled, and I am using the lowest write-concern, which helped, but didn't solve the problem.

What causes this and how can I fix it?


Here is an SSCCE. I run the following script twice in parallel:

import time, pymongo, pandas as pd
DBNAME = 'test5'
COLLNAME = 'abc'
mongo = pymongo.MongoClient()
#mongo[DBNAME].drop_collection(COLLNAME)
if DBNAME not in mongo.database_names():
    mongo[DBNAME].create_collection(COLLNAME,
                                    capped = True, size = 50 * 1024 * 1024, max = 1024 * 1024,
                                    fsync = False,  # no delays please (FYI, journaling is disabled)
                                    #autoIndexId = False,
                                    )
coll = mongo[DBNAME][COLLNAME]  # defined whether or not this run created the collection

tarr = []
for _ in range(1000):
    msg = dict.fromkeys([ 'f%d' % j for j in range(30) ], 'foo')
    t1 = time.time()
    coll.insert(msg)  # <-- INSERT() HERE
    tarr.append(time.time() - t1)
    time.sleep(1./100)  # 100 docs/sec. surely not too much load...

print(pd.Series(tarr).describe())

Output:

RUN1:
count    1000.000000
mean        0.001555
std         0.002712
min         0.000654
25%         0.001021
50%         0.001084
75%         0.001151
max         0.065006  <-- 65ms!! 
dtype: float64

RUN2:
count    1000.000000
mean        0.001733
std         0.004013
min         0.000717
25%         0.001043
50%         0.001069
75%         0.001120
max         0.084803  <-- 84ms!! 
dtype: float64

Index information:

mongo[DBNAME][COLLNAME].index_information()
> {u'_id_': {u'key': [(u'_id', 1)], u'ns': u'xxxxx', u'v': 1}}

mongodb info:

db version v3.2.0-rc0
git version: bf28bd20fa507c4d8cc5919dfbbe87b7750ae8b0
OpenSSL version: OpenSSL 1.0.1f 6 Jan 2014
allocator: tcmalloc
modules: none
build environment:
    distmod: ubuntu1404
    distarch: x86_64
    target_arch: x86_64

"storageEngine" : { "name" : "mmapv1", "supportsCommittedReads" : false }

journaling DISABLED
pymongo version = 3.0.3
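
For reference, this information can be pulled via pymongo roughly as follows (a sketch; the exact fields depend on the server version):

import pymongo

client = pymongo.MongoClient()
print(client.server_info()['version'])          # build info, e.g. '3.2.0-rc0'
status = client.admin.command('serverStatus')   # same data as db.serverStatus() in the shell
print(status.get('storageEngine'))              # e.g. {u'name': u'mmapv1', u'supportsCommittedReads': False}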

Output of iostat -xtm 5 during the experiment:

*** BEFORE ***
11/08/2015 09:05:02 AM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           6.11    0.95    0.49    0.10    0.00   92.35

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00    10.20    0.00    6.80     0.00     0.08    24.71     0.14   20.12    0.00   20.12   2.82   1.92
dm-0              0.00     0.00    0.00   17.00     0.00     0.08     9.88     0.24   13.84    0.00   13.84   1.13   1.92
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    7.80     0.00     0.03     8.41     0.04    4.92    0.00    4.92   2.05   1.60
dm-3              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00


*** DURING ***
11/08/2015 09:05:07 AM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           9.25    0.26    0.55    0.26    0.00   89.67

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00    13.60    0.00   40.60     0.00     0.59    29.52     4.84  119.15    0.00  119.15   1.60   6.48
dm-0              0.00     0.00    0.00   54.20     0.00     0.59    22.11     5.00   92.18    0.00   92.18   1.20   6.48
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    4.20     0.00     0.02     9.14     0.01    2.29    0.00    2.29   2.29   0.96
dm-3              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

11/08/2015 09:05:12 AM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.02    0.26    0.41    0.44    0.00   90.88

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00    19.00    0.00  109.80     0.00     0.63    11.82    22.95  209.01    0.00  209.01   1.84  20.24
dm-0              0.00     0.00    0.00  128.80     0.00     0.63    10.07    25.55  198.37    0.00  198.37   1.57  20.24
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    4.40     0.00     0.02     8.00     0.02    4.73    0.00    4.73   2.18   0.96
dm-3              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

11/08/2015 09:05:17 AM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.81    0.27    0.39    0.16    0.00   93.37

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     8.40    0.00   24.80     0.00     0.29    24.19     0.26   10.55    0.00   10.55   1.94   4.80
dm-0              0.00     0.00    0.00   33.20     0.00     0.29    18.07     0.31    9.35    0.00    9.35   1.45   4.80
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    3.40     0.00     0.01     8.94     0.01    2.35    0.00    2.35   2.35   0.80
dm-3              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00


*** AFTER ***
11/08/2015 09:05:22 AM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.85    0.40    0.46    0.00    0.00   93.28

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     3.80    0.00    0.40     0.00     0.02    84.00     0.00    6.00    0.00    6.00   6.00   0.24
dm-0              0.00     0.00    0.00    4.20     0.00     0.02     8.00     0.03    7.81    0.00    7.81   0.57   0.24
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    1.60     0.00     0.01    11.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-3              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

UPDATE: we switched mongodb to run in-memory (along the lines of what's described here). The machine has enough memory to sustain it. Surprisingly, it didn't help! We're still seeing the same "spikes" in insertion times. Something weird is happening. I have no idea what's causing it.

UPDATE: we also upgraded to mongodb V3.2, in-memory. Same results.

shx2
  • Try running the query with explain(), e.g. db.COLLNAME.explain().insert() from the mongo shell (I am not a Python programmer), to get more info; using explain(true) will offer additional info. – Hughzi Nov 04 '15 at 12:27
  • @Hughzi I couldn't find how to `explain` an insert operation... There's already an unanswered question about that (actually, about update): http://stackoverflow.com/q/18001717/2096752 – shx2 Nov 04 '15 at 12:46
  • Apologies, I didn't realize there was no explain on insert operations. One thing could be indexes; can you run db.COLLNAME.getIndexes()? They slow down writes more than any other variable. Which storage engine are you using? – Hughzi Nov 04 '15 at 13:19
  • It is worth noting here that you are running a MongoDB 3.2 release candidate (noted by the "-rc0" in the db version) rather than a stable release. It may be worth trying 3.0.7 (the latest stable release). If release candidate testing is what you want, I would at a minimum upgrade to 3.2.0-rc1 or 3.2.0-rc2 which should be released in the near future. – James Wahlin Nov 04 '15 at 13:39
  • @JamesWahlin, right. I forgot to mention I also tried the same thing using 3.0.7. Same behavior. I switched to v3.2.0-rc0 to check if it helps. – shx2 Nov 04 '15 at 14:25
  • @Hughzi I added index information. re storage engine: I don't know. how do I check? – shx2 Nov 04 '15 at 14:32
  • db.serverStatus() will tell you. I'm guessing you are using MMAPv1 as this is the default. Since you seem to be inserting a lot at once, it could be that you are committing what's in memory to disk and reallocating virtual memory to populate collections; this constant reallocation will mean some slower writes. Do smaller tests and see if this reduces the maximum time. – Hughzi Nov 04 '15 at 15:17
  • @Hughzi mmapv1 indeed. I added the info to the post. – shx2 Nov 04 '15 at 15:38
  • did you try adding less at once? – Hughzi Nov 04 '15 at 15:43
  • @Hughzi, the slow messages are infrequent, so adding less would probably just cause them to appear less frequently. Anyhow, mongodb is supposed to be able to process many thousands of messages per second, and I am only inserting 200. It's 1 or 2 orders of magnitude less. – shx2 Nov 04 '15 at 16:15
  • @shx2 remove the max 1024 * 1024, tell me the result please. – Hughzi Nov 04 '15 at 16:31
  • @shx2 sorry, remove the max 1024^2 as that's optional; you are asking for ~1,000,000 documents. You don't need to specify this, and it may be causing strange allocation issues. – Hughzi Nov 04 '15 at 16:59
  • @Hughzi - if this replicaset / cluster was created using 3.2.0-rc0 then the default would be WiredTiger (which is the new default under 3.2) – James Wahlin Nov 04 '15 at 19:10
  • One more thing I would add is that disk is often the bottleneck when it comes to write performance. It may be worth running iostat -xtm 1 during this test to evaluate (for the duration of the test). – James Wahlin Nov 04 '15 at 19:16
  • @Hughzi, I tried dropping the `max` argument and recreating the collection. No change in behavior. – shx2 Nov 08 '15 at 08:11
  • @JamesWahlin I added the output of `iostat` to the post. I'm not used to looking at that, so I'm not sure if that's revealing anything. – shx2 Nov 08 '15 at 08:12
  • The await / w_await values during your test are suspicious. The second reading shows a w_await value of 200ms. This means it is taking 200ms for your disk to satisfy a write operation and is extremely slow. If you can run mongod on a faster disk it is likely you will see better performance. – James Wahlin Nov 09 '15 at 13:53
  • @JamesWahlin, please see the update in my question. We're now running mongodb in-memory, and surprisingly experiencing the same behavior... – shx2 Mar 22 '16 at 12:24
  • @Hughzi, please see the update in my question. We're now running mongodb in-memory, and surprisingly experiencing the same behavior... – shx2 Mar 22 '16 at 12:24
  • @shx2 Hi again, four months on and you're still having the same issue. We recently upgraded our RAM and that improved our DB performance; I don't know anything about your hardware, but I doubt this will be your issue. You have the one default index, which should be fine. – Hughzi Mar 22 '16 at 16:30
  • @shx2 Run "top" as you run your test; this will show the % of memory used (and CPU, but you have given us that already, and as expected it's low). It could be overloading memory; it's worth looking at while I think on it, though I am doubtful. It's a relatively quick test. – Hughzi Mar 22 '16 at 16:33

1 Answer


Looks like the problem is that you're only using fully non-blocking writes when creating the collection (calling create_collection), not when inserting.

You can disable write acknowledgement at the MongoClient level as:

mongo = pymongo.MongoClient(w=0)

Or for just the insert calls as:

coll.insert(msg, w=0)

Either way, the max time drops to 0.001 when I try this locally on a normal disk-backed 3.2 MongoDB instance using WiredTiger storage.
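
If you'd rather avoid the deprecated legacy insert, the same effect can be expressed in the pymongo 3 API with an explicit WriteConcern (a sketch, untested against the exact setup in the question):

from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

client = MongoClient()
coll = client['test5']['abc']

# Unacknowledged (w=0) writes through a separate collection handle;
# the original handle keeps its default, acknowledged write concern.
fast_coll = coll.with_options(write_concern=WriteConcern(w=0))
fast_coll.insert_one({'f0': 'foo'})

Either form keeps the driver from waiting on a server acknowledgement for each insert.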

JohnnyHK
  • This certainly reduces the time the writer is blocked for, but unfortunately I still have the same underlying problem, which is that the insertion takes too much time on mongodb's side. – shx2 Mar 27 '16 at 10:11
  • @shx2 Ok, so how are you measuring that? – JohnnyHK Mar 27 '16 at 15:07