I'm using mongodb's capped collections + tailable cursors as a pubsub engine, using pymongo. (This approach is described here). My environment includes several topics, several publishers, and a few dozen subscribers.
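For context, each subscriber tails the capped collection roughly like this (a minimal sketch; the topic filter and handle() are illustrative, not my actual code):
import time, pymongo
coll = pymongo.MongoClient()['test5']['abc']
cursor = coll.find({'topic': 'some_topic'},  # hypothetical topic filter
                   cursor_type=pymongo.CursorType.TAILABLE_AWAIT)
while cursor.alive:
    for msg in cursor:
        handle(msg)      # hypothetical message handler
    time.sleep(0.1)      # nothing new yet; poll again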
This works great, except that every once in a while I encounter insert operations that take a surprisingly long time to complete.
My understanding is that mongodb can sustain insertion rates of >10K docs/sec, which is why I am surprised that a single insert operation can take 85ms. (I also observed >1sec a few times in my production environment, but couldn't reproduce it using this SSCCE.)
To keep delays to a minimum, journaling is disabled, and I am using the lowest write-concern, which helped, but didn't solve the problem.
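By "lowest write-concern" I mean unacknowledged writes (w=0), i.e. something along these lines (a sketch; the exact client options in production may differ):
import pymongo
# w=0 => unacknowledged writes: the driver does not wait for the server's reply.
client = pymongo.MongoClient(w=0)
coll = client['test5']['abc']
print(coll.write_concern)   # WriteConcern(w=0)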
What causes this and how can I fix it?
Here is an SSCCE. I run the following script twice in parallel:
import time, pymongo, pandas as pd

DBNAME = 'test5'
COLLNAME = 'abc'

mongo = pymongo.MongoClient()
#mongo[DBNAME].drop_collection(COLLNAME)
if DBNAME not in mongo.database_names():
    coll = mongo[DBNAME].create_collection(COLLNAME,
        capped = True, size = 50 * 1024 * 1024, max = 1024 * 1024,
        fsync = False,  # no delays please (FYI, journaling is disabled)
        #autoIndexId = False,
    )
else:
    coll = mongo[DBNAME][COLLNAME]  # already created by the other parallel run

tarr = []
for _ in range(1000):
    msg = dict.fromkeys([ 'f%d' % j for j in range(30) ], 'foo')
    t1 = time.time()
    coll.insert(msg)  # <-- INSERT() HERE
    tarr.append(time.time() - t1)
    time.sleep(1./100)  # 100 docs/sec. surely not too much load...
print(pd.Series(tarr).describe())
Output:
RUN1:
count 1000.000000
mean 0.001555
std 0.002712
min 0.000654
25% 0.001021
50% 0.001084
75% 0.001151
max 0.065006 <-- 65ms!!
dtype: float64
RUN2:
count 1000.000000
mean 0.001733
std 0.004013
min 0.000717
25% 0.001043
50% 0.001069
75% 0.001120
max 0.084803 <-- 84ms!!
dtype: float64
Index information:
mongo[DBNAME][COLLNAME].index_information()
> {u'_id_': {u'key': [(u'_id', 1)], u'ns': u'xxxxx', u'v': 1}}
mongodb info:
db version v3.2.0-rc0
git version: bf28bd20fa507c4d8cc5919dfbbe87b7750ae8b0
OpenSSL version: OpenSSL 1.0.1f 6 Jan 2014
allocator: tcmalloc
modules: none
build environment:
distmod: ubuntu1404
distarch: x86_64
target_arch: x86_64
"storageEngine" : { "name" : "mmapv1", "supportsCommittedReads" : false }
journaling DISABLED
pymongo version = 3.0.3
Output of iostat -xtm 5 during the experiment:
*** BEFORE ***
11/08/2015 09:05:02 AM
avg-cpu: %user %nice %system %iowait %steal %idle
6.11 0.95 0.49 0.10 0.00 92.35
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 10.20 0.00 6.80 0.00 0.08 24.71 0.14 20.12 0.00 20.12 2.82 1.92
dm-0 0.00 0.00 0.00 17.00 0.00 0.08 9.88 0.24 13.84 0.00 13.84 1.13 1.92
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.00 0.00 7.80 0.00 0.03 8.41 0.04 4.92 0.00 4.92 2.05 1.60
dm-3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
*** DURING ***
11/08/2015 09:05:07 AM
avg-cpu: %user %nice %system %iowait %steal %idle
9.25 0.26 0.55 0.26 0.00 89.67
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 13.60 0.00 40.60 0.00 0.59 29.52 4.84 119.15 0.00 119.15 1.60 6.48
dm-0 0.00 0.00 0.00 54.20 0.00 0.59 22.11 5.00 92.18 0.00 92.18 1.20 6.48
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.00 0.00 4.20 0.00 0.02 9.14 0.01 2.29 0.00 2.29 2.29 0.96
dm-3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
11/08/2015 09:05:12 AM
avg-cpu: %user %nice %system %iowait %steal %idle
8.02 0.26 0.41 0.44 0.00 90.88
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 19.00 0.00 109.80 0.00 0.63 11.82 22.95 209.01 0.00 209.01 1.84 20.24
dm-0 0.00 0.00 0.00 128.80 0.00 0.63 10.07 25.55 198.37 0.00 198.37 1.57 20.24
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.00 0.00 4.40 0.00 0.02 8.00 0.02 4.73 0.00 4.73 2.18 0.96
dm-3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
11/08/2015 09:05:17 AM
avg-cpu: %user %nice %system %iowait %steal %idle
5.81 0.27 0.39 0.16 0.00 93.37
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 8.40 0.00 24.80 0.00 0.29 24.19 0.26 10.55 0.00 10.55 1.94 4.80
dm-0 0.00 0.00 0.00 33.20 0.00 0.29 18.07 0.31 9.35 0.00 9.35 1.45 4.80
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.00 0.00 3.40 0.00 0.01 8.94 0.01 2.35 0.00 2.35 2.35 0.80
dm-3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
*** AFTER ***
11/08/2015 09:05:22 AM
avg-cpu: %user %nice %system %iowait %steal %idle
5.85 0.40 0.46 0.00 0.00 93.28
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 3.80 0.00 0.40 0.00 0.02 84.00 0.00 6.00 0.00 6.00 6.00 0.24
dm-0 0.00 0.00 0.00 4.20 0.00 0.02 8.00 0.03 7.81 0.00 7.81 0.57 0.24
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.00 0.00 1.60 0.00 0.01 11.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
UPDATE: we switched mongodb to run in-memory (along the lines of what's described here). The machine has enough memory to sustain it. Surprisingly, it didn't help! We're still seeing the same "spikes" in insertion times. Something weird is happening. I have no idea what's causing it.
UPDATE: we also upgraded to mongodb V3.2, in-memory. Same results.