
Please look at the following lines of code and the results:

import pymongo

d1 = {'p': 0.5, 'theta': 100, 'sigma': 20}
d2 = {'theta': 100, 'sigma': 20, 'p': 0.5}

I get the following results:

d1 == d2  # Returns True

collectn.find({'goods.H': d1}).count()  # Returns 33

collectn.find({'goods.H': d2}).count()  # Returns 2

where collectn is a MongoDB collection object.

Is there a setting or a way to query so that I obtain the same results for the above two queries?

The two dictionaries are essentially the same (in the sense that d1 == d2 is True). What I am trying to do is this: before inserting a record into the database, I check whether a record with exactly the same combination of values already exists. If so, I don't want to create a new record. But because of the behavior shown above, the check can report that the record does not exist even when it does, and a duplicate gets added to the database (with a different _id, of course, but with all other values the same, which I would prefer to avoid).

Thank you in advance for your help.

Curious2learn
  • What indexes do you have on the collection? – mayhewr Jan 14 '13 at 19:00
  • I have not explicitly made any indexes. I am using Mongolab for hosting, and under indexed fields I see: `{ "_id" : 1}` – Curious2learn Jan 14 '13 at 19:19
  • Well I'm sure it has something to do with the fact that mongodb stores everything as BSON objects which are inherently ordered and python dictionaries are unordered. It seems like maybe mongodb is trying to optimize the lookup on the subdocument and its internal equality operator is not satisfied without an order match. Sorry I can't be more helpful at the moment. – mayhewr Jan 14 '13 at 19:36
  • Can you change the question to insert data into the collection, so that solutions can be tested before being posted? I suspect the answer lies with dictionaries being unordered. For the two finds referenced in your question I get 0 as the answer. – Keith John Hutchison Jan 14 '13 at 19:51

4 Answers


The issue you are having is explained in the MongoDB documentation here. It comes down to the fact that Python dictionaries are unordered, while MongoDB stores documents as BSON objects, whose fields are ordered.

The relevant quote being,

Equality matches within subdocuments select documents if the subdocument matches exactly the specified subdocument, including the field order.
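
To see why d1 == d2 can be True while the two queries differ, here is a small sketch that needs no database. It uses collections.OrderedDict as a stand-in for an ordered BSON sub-document; that stand-in is my assumption for illustration, not a description of how pymongo encodes data internally.

```python
from collections import OrderedDict

# Same three fields, written in two different orders (values from the question).
o1 = OrderedDict([('p', 0.5), ('theta', 100), ('sigma', 20)])
o2 = OrderedDict([('theta', 100), ('sigma', 20), ('p', 0.5)])

# Plain dict comparison ignores field order, like d1 == d2 in the question.
plain_equal = dict(o1) == dict(o2)

# Comparing two OrderedDicts is order-sensitive, which is analogous to
# an exact sub-document match that includes the field order.
ordered_equal = o1 == o2

print(plain_equal, ordered_equal)
```

The first comparison mirrors what Python reports for the two dictionaries; the second mirrors what an exact, order-sensitive match would conclude about the same fields.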

I think you would be better off querying the three properties individually, as dotted sub-fields of the main document, rather than as one sub-document matched whole. That way the key order the Python interpreter happens to produce for the dictionary is not forced into the query.

For instance...

d1 = {'goods.H.p': 0.5, 'goods.H.theta': 100, 'goods.H.sigma': 20}
d2 = {'goods.H.theta': 100, 'goods.H.sigma': 20, 'goods.H.p': 0.5}

collectn.find(d1).count()
collectn.find(d2).count()

...may yield more consistent results.

Finally, a way to do this while changing less code:

collectn.find({'goods.H.' + k:v for k,v in d1.items()})
collectn.find({'goods.H.' + k:v for k,v in d2.items()})
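
The same conversion can be wrapped in a small function so it can be checked without a database. This is only a sketch: dotted_query is a hypothetical helper name, not part of pymongo.

```python
def dotted_query(prefix, sub):
    """Hypothetical helper: turn {'p': 0.5} into {'goods.H.p': 0.5}."""
    return {prefix + '.' + key: value for key, value in sub.items()}

d1 = {'p': 0.5, 'theta': 100, 'sigma': 20}
d2 = {'theta': 100, 'sigma': 20, 'p': 0.5}

q1 = dotted_query('goods.H', d1)
q2 = dotted_query('goods.H', d2)

# Both queries carry the same three field conditions, so passing either
# one to collectn.find(...) asks the server the same question.
print(q1 == q2)
```

One caveat for the duplicate check: a dot-notation query tests each field on its own, so it also matches documents whose goods.H holds extra fields beyond these three.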
mayhewr
  • You can automate key conversion like in `find({'goods.H'+'.'+k:v for k, v in d1.items()})` – georg Jan 14 '13 at 20:07
  • @Curious2learn No problem. Last thing I would add, since you mentioned not having any secondary indexes, is that it would be a good idea to at least read the [Indexing Strategies](http://docs.mongodb.org/manual/applications/indexes/) page on the MongoDB documentation site. – mayhewr Jan 14 '13 at 21:04

I can only think of two things to do:

  1. Structure your query as this: collectn.find({'goods.H.p':0.5, 'goods.H.theta':100, 'goods.H.sigma':20}).count(). That will find the correct number of documents...

  2. Restructure your data: if you look at "MongoDB : Indexes order and query order must match?", you will see that you can index on p, sigma, theta so that the terms in the query can appear in any order and still give the correct result. In my brief tests (I am no expert) I was not able to index in a way that produces the same effect with your current structure.

IamAlexAlright

I think your problem is mentioned in the MongoDB docs:

The field must match the sub-document exactly, including order....

Look at the documentation here; there is an example with a sub-document.

Fields in a sub-document have to be in the same order as in the query to be matched.

Michal

I think you're looking for the $where operator.

This works in Node:

var myCursor = coll.find({$where: function () {return obj.goods.H == d1}});
myCursor.count(function (err, myCount) {console.log(myCount)});

In Python I believe you'll need to pass in a BSON code object.

The documentation warns that the $where operator should be used as a last resort since it comes with a performance penalty, and can't use indexes.

It seems like it may be worthwhile to establish an ordering of the sub properties, and enforce it if possible on insert or as a post process.
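
A minimal sketch of that idea, assuming the driver writes fields in the mapping's iteration order (which I believe pymongo does for ordered mappings such as collections.OrderedDict or bson.son.SON). The canonical function is a hypothetical helper, not a pymongo API.

```python
from collections import OrderedDict

def canonical(doc):
    """Hypothetical helper: copy doc with keys sorted at every dict level.

    Only nested dicts are handled here; lists and other values are
    returned unchanged.
    """
    if isinstance(doc, dict):
        return OrderedDict((key, canonical(doc[key])) for key in sorted(doc))
    return doc

d1 = {'p': 0.5, 'theta': 100, 'sigma': 20}
d2 = {'theta': 100, 'sigma': 20, 'p': 0.5}

# After canonicalizing, both sub-documents list their fields in one order.
c1 = canonical(d1)
c2 = canonical(d2)
print(list(c1.items()) == list(c2.items()))
```

The helper would have to be applied both to documents before insert and to the sub-document used in the query; records already stored in another order would still need the one-off post-processing mentioned above.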

mjhm