
Updated for clarity: I need advice on insert/append performance for a capped collection. I have two Python scripts running:

(1) Tailing the cursor.

while WSHandler.cursor.alive:
    try:
        doc = WSHandler.cursor.next()
        self.render(doc)
    except StopIteration:
        continue    # nothing new yet; the tailable cursor stays alive
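For reference, the tailing loop in (1) can be completed so it runs standalone. The `FakeCursor` class below is a stand-in of my own so the sketch executes without a live MongoDB; with pymongo the cursor would come from a tailable `find()` on the capped collection:

```python
import time

class FakeCursor:
    """Stand-in for a pymongo tailable cursor (assumption: the real
    code would open one on a capped collection)."""
    def __init__(self, docs):
        self._docs = iter(docs)
        self.alive = True

    def next(self):
        try:
            return next(self._docs)
        except StopIteration:
            self.alive = False
            raise

def tail(cursor, render):
    """Consume documents as they arrive; on an empty read, back off
    briefly instead of busy-spinning the CPU."""
    while cursor.alive:
        try:
            doc = cursor.next()
            render(doc)
        except StopIteration:
            time.sleep(0.1)  # no new data yet; avoid a hot loop

seen = []
tail(FakeCursor([{"id": 1}, {"id": 2}]), seen.append)
```

The `time.sleep` in the exception branch matters on a small instance: without it, an empty capped collection turns the loop into a CPU-bound spin.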

(2) Inserting like so:

def on_data(self, data):                      # Tweepy callback
    if len(data) > 5:
        data = json.loads(data)
        coll.insert(data)                     # insert into MongoDB
        # print(coll.count())
        # print(data)

and it runs fine for a while (at ~50 inserts/second). Then, after 20-60 seconds, it stumbles, hits the CPU ceiling (though it was running at 20% before), and never recovers. My mongostat numbers take a dive (shown below).

Mongostat output: (screenshot)

The CPU is now choked by the process doing the insertion (at least according to htop).

When I run the Tweepy lines above with print(data) instead of inserting into the db (coll.insert(data)), everything runs along fine at 15% CPU use.

What I see in mongostat:

  • res keeps climbing. (Though clogging may start at 40m just as easily as things run fine at 100m.)
  • flushes do not seem to interfere.
  • locked % is stable at 0.1%. Would this lead to clogging eventually?

(I'm running an AWS micro instance; pymongo.)
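One knob worth trying on the insert path above (my suggestion, not something from the original setup) is to buffer tweets and write them in batches, since each insert call carries per-call overhead. A minimal sketch with a pluggable flush function so it runs without MongoDB; with pymongo the flush would be a bulk insert such as `insert_many`:

```python
class BatchBuffer:
    """Accumulate documents and flush them in groups, cutting the
    number of round-trips to the database (hypothetical helper)."""
    def __init__(self, flush_fn, batch_size=50):
        self.flush_fn = flush_fn      # e.g. coll.insert_many with pymongo
        self.batch_size = batch_size
        self._buf = []

    def add(self, doc):
        self._buf.append(doc)
        if len(self._buf) >= self.batch_size:
            self.flush()

    def flush(self):
        if self._buf:
            self.flush_fn(self._buf)
            self._buf = []

batches = []
buf = BatchBuffer(lambda docs: batches.append(list(docs)), batch_size=2)
for i in range(5):
    buf.add({"n": i})
buf.flush()  # push out the final partial batch
# batches is now [[{'n': 0}, {'n': 1}], [{'n': 2}, {'n': 3}], [{'n': 4}]]
```

The trade-off is latency: documents sit in the buffer until a batch fills, so a consumer tailing the capped collection sees them slightly later.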

knutole
  • What sort of performance do you get elsewhere? I've never heard anything positive about performance on AWS micros. – Sean McSomething Oct 02 '12 at 21:57
  • Haven't checked elsewhere, and not currently in position to do so. It's running along fine most of the time, actually, but clogging up every now and then - and when it does, it's plumbing time and nothing else to do. :/ – knutole Oct 02 '12 at 22:01
  • Perhaps your collection is missing an index. What's the output of db.coll.stats()? – William Z Oct 02 '12 at 23:05
  • It's not indexed as it's a capped collection (used for tailable cursors). – knutole Oct 02 '12 at 23:16
  • Are you performing other operations on database beside those inserts? Maybe adding lots of data with update statement or something? – grizwako Oct 02 '12 at 23:30
  • Nothing but tailing the cursor (`while WSHandler.cursor.alive: doc = WSHandler.cursor.next()`), no other inserts nor updates. (Although I'm tailing with an output of 15-30 items/second.) – knutole Oct 02 '12 at 23:40
  • Even though we already have our answer I would like to expand on the mongostat output. 1) res is resident memory, this should climb as you insert and query. I have instances singing along with gigabytes in resident 2) locked % is per second, so 50% would mean the db/collection was write locked for 500ms. db or collection locking depends on the version of mongo. – Scott Oct 03 '12 at 17:20
  • Thanks Scott! Learning a lot from this! :) – knutole Oct 03 '12 at 17:31

1 Answer


I would suggest using mongostat while running your tests. There are many things that could be wrong, but mongostat will give you a good indication of where to look.

http://docs.mongodb.org/manual/reference/mongostat/

The first two things I would look at are the lock percentage and the data throughput. On dedicated machines with reasonable hardware I typically see 1,000-2,000 updates/inserts per second before suffering any degradation. This has been the case for several large production deployments I have worked with.
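To put a client-side number on throughput alongside what mongostat reports, a quick timing harness helps. This sketch uses a stand-in insert function (`store.append`) that I've made up so it runs anywhere; pointing `insert_fn` at the real `coll.insert` would measure the actual round-trip rate:

```python
import time

def inserts_per_second(insert_fn, docs):
    """Time a run of inserts and return the achieved rate (docs/sec)."""
    start = time.perf_counter()
    for doc in docs:
        insert_fn(doc)
    elapsed = time.perf_counter() - start
    return len(docs) / elapsed if elapsed > 0 else float("inf")

store = []  # stand-in for a real collection
rate = inserts_per_second(store.append, [{"i": n} for n in range(1000)])
```

Comparing this number with and without the database in the loop is one way to separate client-side cost (Tweepy, JSON parsing) from server-side cost.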

Scott
  • Thanks, I've added the mongostats. To my surprise it seems to run rather smoothly, but maybe I'm not reading it right? – knutole Oct 03 '12 at 08:13
  • Interestingly, the inserts peak at 180 inserts/sec, then fall back to 15-20 inserts per second - though with almost no locking during peak. Peak throughput is around 500k. I'm wondering if it's actually Tweepy that's clogging the CPU at that throughput? – knutole Oct 03 '12 at 08:17
  • Also, the `res` in mongostat keeps increasing. Could it be that I'm hitting a 'roof' and thus the clogging starts? – knutole Oct 03 '12 at 08:54
  • I swapped out Tweepy for another client - seems to have fixed it. Thanks. – knutole Oct 03 '12 at 11:29
  • Good to hear! I don't know too much about Python or that particular client, not to mention I was sleeping =) So I'm glad you were able to figure that out. – Scott Oct 03 '12 at 14:10
  • This ended up being due to the AWS micro-instance [capping policy](http://docs.amazonwebservices.com/AWSEC2/latest/UserGuide/concepts_micro_instances.html). – knutole Oct 08 '12 at 00:57