Querying MongoDB for Elements Contained in an Array in Order of Number of Matching Elements

Question

I have a MongoDB with a collection of documents looking as such:

{


_id: ObjectId("5222769532fed3037d000049"),


cat_list: { "52226a8932fed36c3000007e": 1   },   
feature_list: {
    "52227433abb03fa34b0000fa": 0.2,
    "52236117099c8924c500004a": 0.2,
    "52236236c71890c199000054": 0.2,
    "522374d2842e497d2c00000c": 0.2,
     "52237e3b842e493760000021": 0.2   },   
title: "blah"

}

Now I basically wan to find documents that match a set of keys provided to the keys inside feature_list. For this, I suppose I will use an $in [,] of some form, though I'm not sure how to do that with a key:value list. (Is it the same?)

Secondly, I want to order the results by descending frequency (if I can do this in mongodb without having to do it in my server-side code). So, if I was querying for "52227433abb03fa34b0000fa" OR "52236117099c8924c500004a" OR "52236236c71890c199000054", my result set would contain all the documents having a feature_list containing one or more of those in the OR list and the document with the most amount of matches would be first, followed by the next in frequency and so forth.

Just to clarify: you want to find by field names, not by values? Do you have to return complete matched documents or just ids for example? — yǝsʞǝla, Jan 19 '14 at 01:33

score 2 · Accepted Answer · edited May 23 '17 at 10:31

To find all these documents with at least one of the fields with a given value you can use this query:

db.aggr.find({$or: [
{ "feature_list.52227433abb03fa34b0000fa" : { $exists: true } },
{ "feature_list.52236117099c8924c500004a" : { $exists: true } },
{ "feature_list.52236236c71890c199000054" : { $exists: true } } 
]});

which finds your document:

{
    "_id" : ObjectId("5222769532fed3037d000049"),
    "cat_list" : {
        "52226a8932fed36c3000007e" : 1
    },
    "feature_list" : {
        "52227433abb03fa34b0000fa" : 0.2,
        "52236117099c8924c500004a" : 0.2,
        "52236236c71890c199000054" : 0.2,
        "522374d2842e497d2c00000c" : 0.2,
        "52237e3b842e493760000021" : 0.2
    },
    "title" : "blah"
}

But you can't really sort this the way you want because there is nothing sortable. I was thinking of sorting by all these searched fields and it would work if you search for only one field because missing field would have the lowest value. However, in this case with multiple fields the order of the fields in the sort matters and this trick does not work.

What also does not work is trying to sort by the whole subdocument, i.e .sort({"feature_list" : -1}) because it does not give any meaningful sorting.

So at this point the only way to get these results is to use aggregation framework, multiple queries (3 for example), or do it in your code.

I'm pretty sure I can write an aggregation framework query to do that, but first of all you'll be limited by the number of results you can fit in a single document size (32Mb). If you return ids only then it's a bit better. Second is that it might be actually not that efficient afterall. Lastly you might want to consider redesigning the schema because it's kind-of unconventional. It makes it hard and inefficient to query like that.

EDIT

A schema could be redesigned like this:

{ ...,
  feature_list: [
    {fid: "52227433abb03fa34b0000fa", weight: 0.2},
    {fid: "52236117099c8924c500004a", weight: 0.2}
  ]
}

This still does not help with sorting using regular query, however finding data will be much faster because Mongo does not use index for $exists, but will use index (if it exists) in this new schema where fields become data. There is no way to sort in Mongo by number of matched elements in an array.

You can get data using aggregation framework instead of a regular query in a very similar way to: In MongoDB search in an array and sort by number of matches The problem is that you can only get as many results as can fit in one document with aggregation framework. To make most out of it you can project only _id of the documents in the final steps.

Alternative to using aggregation framework would be to sort your data in code once you've found results. In either case schema redesign seems beneficial due to indexing.

What suggestions would have you have for a schema revision? I'm trying to model my data so that I can have a document that has associations to "feature" documents with an associative weight, like in a neural network. — Lehel, Jan 22 '14 at 23:49
Do you think that MapReduce would be a good way to go about sorting here? — Lehel, Jan 29 '14 at 21:19
It's hard to say. Maybe you can do it in the code, or with map reduce. Perhaps some tests would be required. Map reduce could be less involved but would limit result set size to 16Mb. — yǝsʞǝla, Jan 29 '14 at 22:23

Querying MongoDB for Elements Contained in an Array in Order of Number of Matching Elements

1 Answers1