12

I'm in progress with estimation of MongoDB for our customers. Per requirements we need associate with some entity ent variable set of name-value pairs.

db.ent.insert({'a':5775, 'b':'b1'})
db.ent.insert({'c':'its a c', 'b':'b2'})
db.ent.insert({'a':7557, 'c':'its a c'})

After this I need intensively query ent for presence of fields:

db.ent.find({'a':{$exists:true}})
db.ent.find({'c':{$exists:false}})

Per MongoDB docs:

$exists is not very efficient even with an index, and esp. with {$exists:true} since it will effectively have to scan all indexed values.

Can experts there provide more efficient way (even with shift the paradigm) to deal fast with vary name-value pairs

Dewfy
  • 23,277
  • 13
  • 73
  • 121
  • look at: http://www.mongodb.org/display/DOCS/Using+Multikeys+to+Simulate+a+Large+Number+of+Indexes – Dewfy Jan 25 '12 at 23:57
  • I'm by no means an expert but if you know the type, wouldn't it be way faster to do `{ a: { $type: "number" } }` instead? – ProblemsOfSumit Nov 15 '20 at 19:50
  • @ProblemsOfSumit It was old question, but anyway: field may be any type, not only number, so checking a type not a case at all. – Dewfy Nov 16 '20 at 08:17

4 Answers4

9

You can redesign your schema like this:

{
  pairs:[
  {k: "a", v: 5775},
  {k: "b", v: "b1"},
  ]
}

Then you indexing your key:

db.people.ensureIndex({"pairs.k" : 1})

After this you will able to search by exact match:

db.ent.find({'pairs.k':"a"})

In case you go with Sparse index and your current schema, proposed by @WesFreeman, you will need to create an index on each key you want to search. It can affect write performance or will be not acceptable if your keys are not static.

Eve Freeman
  • 32,467
  • 4
  • 86
  • 101
Andrew Orsich
  • 52,935
  • 16
  • 139
  • 134
  • Very interesting. But how can I figure out what document was associated with key 'a' (owns by 'a'). Is there something like `$parent({pairs.k:a})` ? – Dewfy Jan 25 '12 at 22:21
  • @Dewfy mongodb always return root level document (even if you search by embedded array), so you no need search for parent, it will be return by the query. Just try it out and you will see. – Andrew Orsich Jan 25 '12 at 22:43
  • +1 Nice redesign. The sparse index may be faster, if the keys are indeed sparse, but like you say there are drawbacks. – Eve Freeman Jan 26 '12 at 01:11
  • 2
    Soooo the solution is to redesign your data layout to look like a relational database? lol? Why does Mongo not support indexing the key-names so we can make `$exists` queries indexable? – BlueRaja - Danny Pflughoeft Oct 11 '14 at 02:37
2

Simply redesign your schema such that it's an indexable query. Your use case is infact analogous to the first example application given in MongoDB The Definitive Guide.

If you want/need the convenience of result.a just store the keys somewhere indexable.

instead of the existing:

db.ent.insert({a:5775, b:'b1'})

do

db.ent.insert({a:5775, b:'b1', index: ['a', 'b']})

That's then an indexable query:

db.end.find({index: "a"}).explain()
{
    "cursor" : "BtreeCursor index_1",
    "nscanned" : 1,
    "nscannedObjects" : 1,
    "n" : 1,
    "millis" : 0,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : true,
    "indexOnly" : false,
    "indexBounds" : {
        "index" : [
            [
                "a",
                "a"
            ]
        ]
    }
}

or if you're ever likely to query also by value:

db.ent.insert({
    a:5775, 
    b:'b1', 
    index: [
        {name: 'a', value: 5775}, 
        {name: 'b', value: 'b1'}
    ]
})

That's also an indexable query:

db.end.find({"index.name": "a"}).explain()
{
    "cursor" : "BtreeCursor index.name_",
    "nscanned" : 1,
    "nscannedObjects" : 1,
    "n" : 1,
    "millis" : 0,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : true,
    "indexOnly" : false,
    "indexBounds" : {
        "index.name" : [
            [
                "a",
                "a"
            ]
        ]
    }
}
AD7six
  • 63,116
  • 12
  • 91
  • 123
1

I think a sparse index is the answer to this, although you'll need an index for each field. http://www.mongodb.org/display/DOCS/Indexes#Indexes-SparseIndexes

Sparse indexes should help with $exists:true queries.

Even still, if your field is not really sparse (meaning it's mostly set), it's not going to help you that much.

Update I guess I'm wrong. Looks like there's an open issue ( https://jira.mongodb.org/browse/SERVER-4187 ) still that $exists doesn't use sparse indexes. However, you can do something like this with find and sort, which looks like it properly uses the sparse index:

db.ent.find({}).sort({a:1});

Here's a full demonstration of the difference, using your example values:

> db.ent.insert({'a':5775, 'b':'b1'})
> db.ent.insert({'c':'its a c', 'b':'b2'})
> db.ent.insert({'a':7557, 'c':'its a c'})
> db.ent.ensureIndex({a:1},{sparse:true});

Note that find({}).sort({a:1}) uses the index (BtreeCursor):

> db.ent.find({}).sort({a:1}).explain();
{
"cursor" : "BtreeCursor a_1",
"nscanned" : 2,
"nscannedObjects" : 2,
"n" : 2,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
    "a" : [
        [
            {
                "$minElement" : 1
            },
            {
                "$maxElement" : 1
            }
        ]
    ]
}
}

And find({a:{$exists:true}}) does a full scan:

> db.ent.find({a:{$exists:true}}).explain();
{
"cursor" : "BasicCursor",
"nscanned" : 3,
"nscannedObjects" : 3,
"n" : 2,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {

}
}

Looks like you can also use .hint({a:1}) to force it to use the index.

> db.ent.find().hint({a:1}).explain();
{
"cursor" : "BtreeCursor a_1",
"nscanned" : 2,
"nscannedObjects" : 2,
"n" : 2,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
    "a" : [
        [
            {
                "$minElement" : 1
            },
            {
                "$maxElement" : 1
            }
        ]
    ]
}
}
Eve Freeman
  • 32,467
  • 4
  • 86
  • 101
0

How about setting the non-exists field to null? Then you can query them with {field: {$ne: null}}.

db.ent.insert({'a':5775, 'b':'b1', 'c': null})
db.ent.insert({'a': null, 'b':'b2', 'c':'its a c'})
db.ent.insert({'a':7557, 'b': null, 'c':'its a c'})

db.ent.ensureIndex({"a" : 1})
db.ent.ensureIndex({"b" : 1})
db.ent.ensureIndex({"c" : 1})

db.ent.find({'a':{$ne: null}}).explain()

Here's the output:

{
    "cursor" : "BtreeCursor a_1 multi",
    "isMultiKey" : false,
    "n" : 4,
    "nscannedObjects" : 4,
    "nscanned" : 5,
    "nscannedObjectsAllPlans" : 4,
    "nscannedAllPlans" : 5,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 0,
    "indexBounds" : {
        "a" : [
            [
                {
                    "$minElement" : 1
                },
                null
            ],
            [
                null,
                {
                    "$maxElement" : 1
                }
            ]
        ]
    },
    "server" : "my-laptop"
}
Trantor Liu
  • 8,770
  • 8
  • 44
  • 64
  • 1
    "$ne" queries cannot use indexes. http://docs.mongodb.org/manual/faq/indexes/#using-ne-and-nin-in-a-query-is-slow-why – Megawolt Sep 05 '13 at 11:19