
I am trying to access the documents in a collection through a particular index, using Python 3.5 with PyMongo 3.3.0 and MongoDB 3.2. I created a field called sequence in each document that contains a number defining the load order of the documents. I created an index on this field and gave the index the name 'sequence' using the following command:

db.pages.create_index([('pages.sequence', pymongo.ASCENDING)], name='sequence')

The code used to load the documents is:

with MongoClient(tz_aware=True) as client:
    db = Database(client, name)
    for doc in db['pages'].find().hint('sequence').limit(1):
        ...

The document returned in doc does not contain the sequence number 1 and is not the first document loaded into MongoDB. How should I ensure that documents are returned in ascending order based on the value in the 'sequence' field in each document?

Edit: Sort cannot be used as the collection is too big, roughly 16GB in size. The documentation seems to state that using an index will return data in the order of the keys in the index. If this is not so, is it not possible to use an index to define the order of retrieval of documents in a collection?

Jonathan
  • If you want results returned in a specific order, you should specify `sort()` criteria and direction rather than hinting, e.g. `find().sort('sequence', pymongo.ASCENDING).limit(1)`. In general, hinting should be unnecessary unless you have multiple candidate indexes for a given query shape and an incorrect index is being consistently chosen by the query optimizer. – Stennie Aug 15 '16 at 01:08
  • I tried sort but the data is too large > 32MB. This is a large database. – Jonathan Aug 15 '16 at 02:38
  • You need an index to avoid the 32MB in-memory sort limitation; see [Use Indexes to Sort Query Results](https://docs.mongodb.com/manual/tutorial/sort-results-with-indexes/). Make sure your index definition matches the sort spec. It looks like your index is on "pages.sequence" (which would be a `sequence` field inside a `pages` subdocument) but your description suggests that `sequence` may be a top-level field. Depending on your schema either the index should be on `sequence`, or your sort spec should be on `pages.sequence`. You can use `explain()` to debug the query planning and index usage. – Stennie Aug 15 '16 at 05:55
  • 'pages' is the name of the collection. Within the collection are individual top-level documents with many sub-documents. The 'sequence' field is in the top level document and so far as I know the index is on that field. It says so when I display the index structure. Thus to me it seems that I am doing what you suggest. How then should I be specifying the index creation? – Jonathan Aug 15 '16 at 10:48
  • I am unable to understand how to use 'explain()'. As I understand it, 'find()' returns a cursor, as do functions like 'hint()' and 'limit()'. However, 'explain()' returns an 'explain' document and thus does not fit into the function chain. When I try to use 'explain()' independently, it says that I have a dictionary, not a cursor, and that 'explain()' is not part of my document, which is reasonable. So how would I use 'explain()'? – Jonathan Aug 15 '16 at 10:54
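
To illustrate the two suggestions in the comments above (sort on the indexed field, and call `explain()` on the cursor), here is a minimal sketch. It assumes `sequence` really is a top-level field, so the index key is `sequence` rather than `pages.sequence`. The helper names `first_by_sequence`, `winning_plan` and `run_against_server`, and the `db_name` parameter, are made up for this example. `explain()` goes last in the chain because it is a method of the cursor and returns a plain dict, not another cursor. The layout of that dict (`queryPlanner` / `winningPlan`) is what MongoDB 3.x servers report as far as I know, so treat those keys as an assumption.

```python
ASCENDING = 1  # same value as pymongo.ASCENDING


def first_by_sequence(pages, n=1):
    """Return the first n documents ordered by the top-level 'sequence' field."""
    # With an index on 'sequence', the server can walk the index in key
    # order instead of sorting in memory, which avoids the 32MB sort limit.
    cursor = pages.find().sort("sequence", ASCENDING).limit(n)
    return list(cursor)


def winning_plan(pages):
    """Return the plan the server chose for the sorted query."""
    # explain() is called on the cursor itself, as the last call in the
    # chain; it returns a dict describing the plan, not the documents.
    plan = pages.find().sort("sequence", ASCENDING).limit(1).explain()
    # Assumed layout of the explain document (MongoDB 3.x style).
    return plan["queryPlanner"]["winningPlan"]


def run_against_server(db_name):
    """Hypothetical driver; needs PyMongo and a running mongod on localhost."""
    from pymongo import MongoClient

    with MongoClient(tz_aware=True) as client:
        pages = client[db_name]["pages"]
        # Index key is the field path inside each document, not
        # '<collection>.<field>'. The index name is arbitrary.
        pages.create_index([("sequence", ASCENDING)], name="sequence")
        print(winning_plan(pages))
        for doc in first_by_sequence(pages):
            print(doc["sequence"])
```

If the printed plan contains an `IXSCAN` stage and no `SORT` stage, the ordering is coming from the index. If an existing index already uses the name 'sequence' with a different key, it would presumably have to be dropped before `create_index` can succeed with the new key.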

0 Answers