Let's say I have 1000 documents, each of which has:

user_id
text

Now, I would like to pull all those documents, but first pull the documents from a few specific users (given an array of user IDs) and then all the rest.

I was thinking of using map-reduce to create a new inline weight attribute when the user_id exists in the array of specific users (using scope to pass the array), and then sorting on that new attribute. But from what I could understand, you cannot sort after map-reduce.

Does anyone have a good suggestion for how to pull this off? Any suggestion will be welcome.

Thanks!

Neil Lunn
bymannan

1 Answer

Well, there isn't a lot of detail here, but I can give a sample case for consideration. Consider the following set of documents:

{ "user" : "fred", "color" : "black" }
{ "user" : "bill", "color" : "blue" }
{ "user" : "ted", "color" : "red" }
{ "user" : "ted", "color" : "black" }
{ "user" : "fred", "color" : "blue" }
{ "user" : "bill", "color" : "red" }
{ "user" : "bill", "color" : "orange" }
{ "user" : "fred", "color" : "orange" }
{ "user" : "ted", "color" : "orange" }
{ "user" : "ally", "color" : "orange" }
{ "user" : "alice", "color" : "orange" }
{ "user" : "alice", "color" : "red" }
{ "user" : "bill", "color" : "purple" }

Suppose you want to bubble the items for the users "bill" and "ted" to the top of your results, followed by everything else, with each group sorted by user and then by color. What you can do is run the documents through a $project stage in the aggregation pipeline and then a $sort, as follows:

db.bubble.aggregate([

    // Project selects the fields to show, and we add a weight value
    {$project: {
        _id: 0,
        "user": 1,
        "color": 1,
        "weight": {$cond:[
            {$or: [
                {$eq: ["$user","bill"]},
                {$eq: ["$user","ted"]}
            ]},
            1,
            0
        ]}
    }},

    // Then sort the results with the `weight` first, then `user` and `color`
    {$sort: { weight: -1, user: 1, color: 1 }}

])

What this does is conditionally assign a value to weight based on whether the user matches one of the required values: matching documents get 1, and documents that do not match are simply given 0.
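Because the pipeline is just a data structure, the hardcoded `$or` list does not have to be written by hand. Below is a minimal sketch, assuming you have the preferred users in a plain array, of building the same `$project`/`$sort` stages from it. `buildBubblePipeline` and `preferredUsers` are hypothetical names, not part of any MongoDB API; the returned array is what you would pass to `db.bubble.aggregate(...)`. It sticks to `var` and `function` so it should also paste into an older mongo shell.

```javascript
// Hypothetical helper: builds the $project/$sort pipeline from an array of
// user names instead of hardcoding one $eq per user inside $or.
function buildBubblePipeline(preferredUsers) {
  // One {$eq: ["$user", name]} clause per preferred user.
  var matches = preferredUsers.map(function (name) {
    return { $eq: ["$user", name] };
  });

  return [
    {
      $project: {
        _id: 0,
        user: 1,
        color: 1,
        // 1 when the user is one of the preferred users, 0 otherwise.
        weight: { $cond: [{ $or: matches }, 1, 0] }
      }
    },
    // Weighted documents first, then by user and color.
    { $sort: { weight: -1, user: 1, color: 1 } }
  ];
}

var pipeline = buildBubblePipeline(["bill", "ted"]);
console.log(JSON.stringify(pipeline));
// In the mongo shell you would then run: db.bubble.aggregate(pipeline)
```

The output of `buildBubblePipeline(["bill", "ted"])` has the same shape as the hand-written pipeline above.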

When this modified document moves on to the $sort stage, the new weight key can be used to order the results so that the "weighted" documents come first and everything else follows.
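To make the resulting order concrete, here is a plain-JavaScript sketch that mimics the $project and $sort stages in memory on the sample documents. It does not touch MongoDB and is only an illustration of the ordering logic, not how the server implements it. It assumes plain ASCII strings, for which JavaScript's `<` comparison gives the same order as MongoDB's default binary string comparison.

```javascript
// The sample documents from above.
var docs = [
  { user: "fred", color: "black" }, { user: "bill", color: "blue" },
  { user: "ted", color: "red" }, { user: "ted", color: "black" },
  { user: "fred", color: "blue" }, { user: "bill", color: "red" },
  { user: "bill", color: "orange" }, { user: "fred", color: "orange" },
  { user: "ted", color: "orange" }, { user: "ally", color: "orange" },
  { user: "alice", color: "orange" }, { user: "alice", color: "red" },
  { user: "bill", color: "purple" }
];
var preferred = ["bill", "ted"];

// Mimics the $project stage: keep user/color and add a 0/1 weight.
var projected = docs.map(function (d) {
  return {
    user: d.user,
    color: d.color,
    weight: preferred.indexOf(d.user) !== -1 ? 1 : 0
  };
});

// Ascending comparison of two strings by code unit.
function cmp(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Mimics {$sort: {weight: -1, user: 1, color: 1}}.
projected.sort(function (a, b) {
  return (b.weight - a.weight) || cmp(a.user, b.user) || cmp(a.color, b.color);
});

projected.forEach(function (d) {
  console.log(d.weight, d.user, d.color);
});
```

All of "bill"'s and then "ted"'s documents come first, followed by "alice", "ally" and "fred", each group ordered by color.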

There are quite a few things you can do to $project a weight in this way. See the operator reference for more information:

http://docs.mongodb.org/manual/reference/operator/aggregation/

Neil Lunn
  • This looks like exactly what I needed! Thanks. Two follow-up questions, please: 1. Can I pass an array as a parameter so that the $or uses that array? 2. Can I return the whole document rather than selectively choosing attributes? – bymannan Feb 28 '14 at 10:16
  • @bymannan Yes, the argument of $or is an array. If you are implementing in a language other than JavaScript, you can just pass the whole argument chain to a JSON parser. It's all valid JSON, and I write it that way so it is easy to translate between language implementations. And yes, returning the whole document is easy [see here](http://stackoverflow.com/questions/21721479/how-to-get-back-the-original-document-back-after-aggregation) – Neil Lunn Feb 28 '14 at 10:40
  • Your answer worked great! Thank you! As for the second question and your link, I might be missing something, but you are still explicitly listing the documents' attributes one by one in the query. I am looking for a way to output all the attributes without projecting them one by one, like "this" in map-reduce. Should I just pass a JSON with all the attributes as well? – bymannan Mar 02 '14 at 11:17
  • @bymannan If I understand you correctly, you are saying you don't want to **code** these conditions. My answer to that is that in **real world** programming, you never **code** such a statement, even as an *entire* aggregation pipeline. It's all just a **data structure**, and JSON is just a way to represent it in serialized form. Instead you **build** your structure to be exactly what you want, then pass it in to your method. So you **would** *dynamically* build those conditions as well. Hope that clears this up. – Neil Lunn Mar 02 '14 at 11:33
  • Thanks, I understand. I just realized that aggregate has a 16MB limit for results and that it cannot be combined with standard mongo operations until 2.6. Can you think of a way to achieve the same result without aggregate? – bymannan Mar 02 '14 at 12:55
  • @bymannan If you really are hitting the BSON limit, I'd be happy to chat. We can work out a time. **Devil's advocate**: if you are **developing**, 2.6RC1 is out now and easy to get from the repos. Oh, I just realized you need more rep for chat. If you are hitting another problem with this, ask another question. Then I can up-vote that as well, and you might get another answer. – Neil Lunn Mar 02 '14 at 13:55
  • Thanks for the help. I think I am OK with the limit; I've switched to selecting only the _ids. I've hit another wall with this: I am trying to set different weights for different users, but the weights are not present in the document. Taking your example, let's say you are relying on a lookup table like this: {"fred", 1},{"ted", 2},{"bill",3}...{"alice",10} Is this something the Aggregation Pipeline can do? – bymannan Mar 04 '14 at 16:36
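One possible approach to this last question, not taken from the thread, is to build a nested `$cond` expression from the lookup table, so each listed user gets its own weight and everyone else falls through to a default. The sketch assumes the lookup table is a plain object; `userWeights` and `buildWeightExpression` are hypothetical names, and the table only contains the four entries spelled out in the comment (the ones elided with "..." are left out).

```javascript
// Hypothetical lookup table: user -> weight (higher sorts first).
// Only the entries shown explicitly in the comment are included.
var userWeights = { fred: 1, ted: 2, bill: 3, alice: 10 };

// Builds a nested $cond expression of the form
// {$cond: [{$eq: ["$user", "fred"]}, 1, {$cond: [..., ..., defaultWeight]}]}
// Users missing from the table fall through to defaultWeight.
function buildWeightExpression(weights, defaultWeight) {
  return Object.keys(weights).reduceRight(function (elseBranch, user) {
    return { $cond: [{ $eq: ["$user", user] }, weights[user], elseBranch] };
  }, defaultWeight);
}

var weightedPipeline = [
  { $project: {
      _id: 0,
      user: 1,
      color: 1,
      weight: buildWeightExpression(userWeights, 0)
  }},
  // Highest weight first, then by user and color.
  { $sort: { weight: -1, user: 1, color: 1 } }
];

console.log(JSON.stringify(weightedPipeline[0].$project.weight));
// In the mongo shell: db.bubble.aggregate(weightedPipeline)
```

Users not in the table (here "ally") get `defaultWeight`, so they sort after every listed user as long as the default is lower than all of the table's weights.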