I asked this same question on the mongodb-user list: http://groups.google.com/group/mongodb-user/browse_thread/thread/b3470d6a867cd24
I was hoping someone on this forum might have some insight...
I've run a simple experiment comparing the performance of cursor iteration in Python vs. Java and found that the Python implementation is about 10x slower. Is this difference expected, or am I doing something clearly inefficient on the Python side?
The benchmark is simple: it performs a query, iterates over the cursor, and inspects the same field in each document. The Python version inspects about 22k documents per second; the Java version inspects about 220k documents per second.
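A minimal sketch of the kind of timing loop described above, assuming the cursor is any iterable of dict-like documents (with pymongo it would be the result of something like `collection.find(query)`). The function name `docs_per_second` and the field names are hypothetical placeholders, not the original benchmark code.

```python
import time
from typing import Any, Iterable, Mapping, Tuple


def docs_per_second(cursor: Iterable[Mapping[str, Any]], field: str) -> Tuple[int, float]:
    """Iterate over `cursor`, read `field` from each document, and
    return (document count, documents per second)."""
    count = 0
    start = time.perf_counter()
    for doc in cursor:
        doc.get(field)  # inspect the same field in every document
        count += 1
    elapsed = time.perf_counter() - start
    rate = count / elapsed if elapsed > 0 else float("inf")
    return count, rate


if __name__ == "__main__":
    # Hypothetical stand-in for a pymongo cursor: any iterable of dicts works.
    fake_docs = [{"name": "doc%d" % i} for i in range(100_000)]
    n, rate = docs_per_second(fake_docs, "name")
    print("%d documents, %.0f docs/sec" % (n, rate))
```

Because the function only needs an iterable, the same loop can be pointed at a real cursor or at an in-memory list, which keeps the two measurements comparable.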
I've seen a few similar questions about Python performance, and following the advice there, I've confirmed that I'm using the C extensions:
    >>> import pymongo
    >>> pymongo.has_c()
    True
    >>> import bson
    >>> bson.has_c()
    True
Finally, I don't believe the discrepancy is due to fundamental differences between Python and Java, at least at the level of my test code. For example, if I store the queried documents in a Python list, I can iterate over that list very quickly. In other words, an inefficient Python for-loop does not account for the difference. I also get nearly identical performance from Java and Python when inserting documents.
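The list-vs-cursor comparison above can be made explicit by timing the two phases separately. This is a sketch under the assumption that the cursor is any iterable of dict-like documents; `time_fetch_then_iterate` is a hypothetical helper name. With pymongo, draining the cursor into a list is where batches are fetched from the server and decoded from BSON, while the second loop touches only in-memory dicts.

```python
import time
from typing import Any, Iterable, List, Mapping, Tuple


def time_fetch_then_iterate(
    cursor: Iterable[Mapping[str, Any]], field: str
) -> Tuple[float, float, int]:
    """Return (fetch_seconds, iterate_seconds, count).

    Phase 1 drains the cursor into a list (for a real cursor this
    includes network round trips and BSON decoding). Phase 2 loops over
    the in-memory list and reads `field`, isolating the cost of the
    plain Python for-loop.
    """
    t0 = time.perf_counter()
    docs: List[Mapping[str, Any]] = list(cursor)
    t1 = time.perf_counter()
    for doc in docs:
        doc.get(field)
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, len(docs)
```

If the second phase is fast and the first dominates, the time is going into the driver (fetching and decoding) rather than into the Python loop itself.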
Here are a few more details about the query:
- Both the Python and Java implementations use the same query on the same collection and run on the same machine.
- The collection contains about 20 million documents.
- The query returns about 2 million documents, i.e., I'm retrieving about 10% of the collection.
- Each document contains three simple fields: a date and two strings.
- The query is indexed, and the time spent in the query itself is negligible for both the Python and Java implementations. It's the cursor iteration that accounts for the runtime.
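One way to check the claim that query time is negligible: a pymongo cursor is lazy, so `find()` does not contact the server, and the query runs only when the first document is requested. Timing the first `next()` separately therefore roughly captures the query plus the first batch, and the remainder is pure cursor iteration. A minimal sketch, assuming any iterable of documents; `time_first_vs_rest` is a hypothetical helper name.

```python
import time
from typing import Any, Iterable, Mapping, Tuple

_SENTINEL = object()  # marks an exhausted iterator without relying on None


def time_first_vs_rest(cursor: Iterable[Mapping[str, Any]]) -> Tuple[float, float, int]:
    """Return (seconds to first document, seconds for the rest, total count)."""
    it = iter(cursor)
    t0 = time.perf_counter()
    first = next(it, _SENTINEL)  # for a lazy cursor, this is when the query runs
    t1 = time.perf_counter()
    if first is _SENTINEL:
        return t1 - t0, 0.0, 0
    count = 1
    for _ in it:  # remaining documents: cursor iteration only
        count += 1
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, count
```

If the first number is small relative to the second, that supports the observation that iteration, not the indexed query, dominates the runtime.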