0

OK, so case scenario is I have a collection of Product Categories, with each document similar to:

Category {
    _id => ...,
    name => ...,
    products => array(
        0 => new MongoID(PID...1),
        1 => new MongoID(PID...2),
        2 => new MongoID(PID...3),
        ....
    )
}

And a collection of Products:

Product {
    _id => ...,
    name => ...,
    active => true,
    status => 'published',
    categories => array(
        0 => new MongoID(CATID...1),
        1 => new MongoID(CATID...2),
        2 => new MongoID(CATID...3)
    )
}

I am refactoring my code because at the moment because when I do any limiting / offsetting, I have to return all products in a category as objects, then unset them if they aren't active (there are a couple more content matches, but I'm leaving them out for brevity). Obviously, we're far from optimised at this stage.

I have built a conditional query that looks like this:

array(
    '$and' => array(
        array(
            '$or' => array(
                    0 => array(
                            '_id' => new MongoID(PID...1)
                    ),
                    1 => array(
                            '_id' => new MongoID(PID...2)
                    ),
                    2 => array(
                            '_id' => new MongoID(PID...3)
                    ),
                    ...

            )
        ),
        array(
            '$and' => array(
                array(
                    'active' => true
                ),
                array(
                    'status' => 'published'
                )
            )
        )
    )
)

What I need to do is respect the order of the _id's in the first part of the query and offset / limit based on which documents match the second part in the sequence of the _id's.

The query ultimately ends up in a find() command but setting a sort order obviously won't work, because we're not sorting.

Short of splitting some fields from the Product and adding to Category->products[] I'm struggling to find a way to do this. Is it simply a case of handling it with MapReduce /aggregation now or is there a simpler alternative?

Thanks!

Andrew Waters
  • 73
  • 1
  • 7

1 Answers1

2

Where you need to maintain the "order" in this way, then your best option is to assign a "weight" to the items you are selecting. That gives you something to sort.

You can do this using .aggregate(), and you could arguably do much the same thing with mapReuce, but the "aggregate" way should run faster. Also you syntax here seems a little off. Rather than use $or over the same field you probably want $in.

In general JavaScript/JSON form:

var idArray = [ 5, 2, 8 ];

db.collection.aggregate([

    // Match the selected documents by "_id"
    { "$match": {
        "_id": { "$in": [ 5, 2, 8 ] },
        "active": true,
        "status": "published"
    },

    // Project a "weight" to each document
    { "$project": {
        "name": 1,
        "active": 1,
        "status": 1,
        "weight": { "$cond": [
            { "$eq": [ "_id", 5  ] },
            1,
            { "$cond": [
                { "$eq": [ "_id", 2 ] },
                2,
                3
            ]}
        ]}
    }},

    // Sort the results
    { "$sort": { "weight": 1 } }

])

So I did "expand" out the array there for purposes of reading this but your actual code you just want to reference the array for the $in clause.

The nested use of $cond evaluated the logical condition to match the "value" of the _id field and assigns the "weight" as in increasing number. That maintains the order of the values in your input array.

Actually you would do this in code to "generate" the pipeline parts, especially the nested conditions with something along the lines of what is seen here. That example is using "specific" weights but much of the principles are the same.

But that will allow you to maintain the order of your input array


PHP Code

For example, something like this to generate the required pipeline:

$list = array( 5, 2, 8 );

$stack = array();

for( $i = count($list)-1; $i > 0; $i-- ) {

  $rec = array(
    '$cond' => array(
      array( '$eq' =>
        array( '$_id', $list[$i-1] )
      ),
      $i
    )
  );

  if ( count($stack) == 0 ) {
    $rec['$cond'][] = $i+1;
  } else {
    $last = array_pop($stack);
    $rec['$cond'][] = $last;
  }

  $stack[] = $rec;

}

$pipeline = array(
  array(
    '$match' => array(
      '_id' => array( '$in' => $list ),
      'active' => true,
      'status' => "published"
    )
  ),
  array(
    '$project' => array(
      'name'    => 1,
      'active'  => 1,
      'status'  => 1,
      'weight'  => $stack[0]
    )
  ),
  array(
    '$sort' => array( 'weight' => 1 )
  )
);

echo json_encode( $pipeline, JSON_PRETTY_PRINT ) . "\n";

Of course you do not "encode" to JSON in reality, this is just to show the structure that is formed.


Map Reduce

And just to show you can do the same sort of thing with mapReduce as well. First set the mapper

var mapper = function () {
    var order = inputs.indexOf(this._id);
    emit( order, { doc: this } );
}

And maybe a finalize function to clean things up, "a little":

var finalize = function (key, value) {
    return value.doc;
}

The run the mapReduce. A reducer is not required:

db.test.mapReduce(
    mapper,
    function(){},
    { 
        "out": { "inline": 1 },
        "query": { "_id": { "$in": idArray } },
        "scope": { "inputs": idArray } ,
        "finalize": finalize
    }
)

Which "looks" cleaner, but probably won't run as fast and has a very "mapReduce" type of output for the result.

Community
  • 1
  • 1
Neil Lunn
  • 148,042
  • 36
  • 346
  • 317
  • Neil, thanks for this - the aggregation works great, with a seek speed matching what I was expecting to see. My only issue now is adding $limit and $skip to it - $limit works fine, but when I add a value for $skip I get no results back... – Andrew Waters Mar 26 '14 at 11:56
  • Actually, scrap that - it appears the order of $limit and $skip impacts the result set from aggregate() - switching them around has allowed me to paginate properly :) – Andrew Waters Mar 26 '14 at 11:59
  • @AndrewWaters That does not make a lot of sense in the context of the question you are asking. If anything you need to **limit** the `_id` values you are passing to the `$in`. So **slice** that array in code. – Neil Lunn Mar 26 '14 at 12:01
  • I would prefer to do that, but the problem is the slice itself would depend on the output of the $match – Andrew Waters Mar 26 '14 at 12:24
  • @AndrewWaters if this point still has you "stuck", then perhaps ask **another** question that is specific to that point. But **this** should help you with your "maintaining order" principle. – Neil Lunn Mar 26 '14 at 12:33
  • No, it's resolved - I was adding it as a footnote - thanks for your help Neil, I appreciate the comprehensive response – Andrew Waters Mar 26 '14 at 13:11