Filtering and Matching Arrays within Arrays

Question

I am looking for querying JSON file which has nested array structure. Each design element has multiple SLID and status. I want to write mongodb query to get designs with highest SLID and status as "OLD". Here is the sample JSON:

{
  "_id" : ObjectId("55cddc30f1a3c59ca1e88f30"),
  "designs" : [
    {
      "Deid" : 1,
      "details" : [
        {
          "SLID" : 1,
          "status" : "OLD"
        },
        {
          "SLID" : 2,
          "status" : "NEW"
        }
      ]
    },
    {
      "Deid" : 2,
      "details" : [
        {
          "SLID" : 1,
          "status" : "NEW"
        },
        {
          "SLID" : 2,
          "status" : "NEW"
        },
        {
          "SLID" : 3,
          "status" : "OLD"
        }
      ]
    }
  ]
}

In this sample the expected query should return the following as SLID is highest with status "OLD".

{
    "_id" : ObjectId("55cddc30f1a3c59ca1e88f30"),
    "designs" : [
        {
            "Deid" : 2,
            "details" : [
                {
                    "SLID" : 3,
                    "status" : "OLD"
                }
            ]
        }
    ]
}

I have tried following query but it kept returning other details array element (which has status "NEW") along with above element.

db.Collection.find({"designs": {$all: [{$elemMatch: {"details.status": "OLD"}}]}},
 {"designs.details":{$slice:-1}})

Edit: To summarize the problem: Requirement is to get all design from document set with highest SLID (always the last item in details array) if it has status as "OLD".

Still cannot fix the problem. I tried this db.Collection.find({"designs.details.status": "OLD"}, {_id: 0, 'designs.details.$': 1}) It is showing design with NEW. — AKJ, Aug 14 '15 at 17:26

score 0 · Answer 1 · edited May 23 '17 at 10:32

Present Problem

What you should have been picking up from the previously linked question is that the positional $ operator itself is only capable of matching the first matched element within an array. When you have nested arrays like you do, then this means "always" the "outer" array can only be reported and never the actual matched position within the inner array nor any more than a single match.

Other examples show usage of the aggregation framework for MongoDB in order to "filter" elements from the array by generally processing with $unwind and then using conditions to match the array elements that you require. This is generally what you need to do in this case to get matches from your "inner" array. While there have been improvements since the first answers, your "last match" or effectively a "slice" condition, excludes other present possibilities. Therefore:

db.junk.aggregate([
  { "$match": {
    "designs.details.status": "OLD"
  }},
  { "$unwind": "$designs" },
  { "$unwind": "$designs.details" },
  { "$group": {
    "_id": {
      "_id": "$_id",
      "Deid": "$designs.Deid"
    },
    "details": { "$last": "$designs.details"}
  }},
  { "$match": {
    "details.status": "OLD"
  }},
  { "$group": {
    "_id": "$_id",
    "details": { "$push": "$details"}
  }},
  { "$group": {
    "_id": "$_id._id",
    "designs": { 
      "$push": {
        "Deid": "$_id.Deid",
        "details": "$details"
      }
    }
  }}
])

Which would return on your document or any others like it a result like:

{
    "_id" : ObjectId("55cddc30f1a3c59ca1e88f30"),
    "designs" : [
        {
            "Deid" : 2,
            "details" : [
                {
                    "SLID" : 3,
                    "status" : "OLD"
                }
            ]
        }
    ]
}

The key there being to $unwind both arrays and then $group back on the relevant unique elements in order to "slice" the $last element from each "inner" array.

The next of your conditions requires a $match in order to see that the "status" field of that element is the value that you want. Then of course since the documents have been essentially "de-normalized" by the $unwind operations and even with the subsequent $group, the following $group statements re-construct the document into it's original form.

Aggregation pipelines can either be quite simple or quite difficult depending on what you want to do, and reconstruction of documents with filtering like this means you need to take care in the steps, particularly if there are other fields involved. As you should also appreciate here, this process of $unwind to de-normalize and $group operations is not very efficient, and can cause significant overhead depending on the number of possible documents that can be met by the initial $match query.

Better Solution

While currently only available in the present development branch, there are some new operators available to the aggregation pipeline that make this much more efficient, and effectively "on par" with the performance of a general query. These are notably the $filter and $slice operators, which can be employed in this case as follows:

db.junk.aggregate([
  { "$match": {
    "designs.details.status": "OLD"
  }},
  { "$redact": {
    "$cond": [
      { "$gt": [
        { "$size":{
          "$filter": {
            "input": "$designs",
            "as": "designs",
            "cond": {
              "$anyElementTrue":[
                { "$map": {
                  "input": { 
                    "$slice": [
                      "$$designs.details",
                      -1
                    ]
                  },
                  "as": "details",
                  "in": {
                    "$eq": [ "$$details.status", "OLD" ]
                  }
                }}
              ]
            }
          }
        }},
        0
      ]},
      "$$KEEP",
      "$$PRUNE"
    ]
  }},
  { "$project": {
    "designs": {
      "$map": {
        "input": {
          "$filter": {
            "input": "$designs",
            "as": "designs",
            "cond": {
              "$anyElementTrue":[
                { "$map": {
                  "input": { 
                    "$slice": [
                      "$$designs.details",
                      -1
                    ]
                  },
                  "as": "details",
                  "in": {
                    "$eq": [ "$$details.status", "OLD" ]
                  }
                }}
              ]
            }
          }
        },
        "as": "designs",
        "in": {
          "Deid": "$$designs.Deid",
          "details": { "$slice": [ "$$designs.details", -1] }
        }
      }
    }
  }}
])

This effectively makes the operations just a $match and $project stage only, which is basically what is done with a general .find() operation. The only real addition here is a $redact stage, which allows the documents to be additionally filtered from the initial query condition by futher logical conditions that can inspect the document.

In this case, we can see if the document not only contains an "OLD" status, but also that this is the last element of at least one of the inner arrays matches that status it it's own last entry, otherwise it is "pruned" from the results for not meeting that condition.

In both the $redact and $project, the $slice operator is used to get the last entry from the "details" array within the "designs" array. In the initial case it is applied with $filter to remove any elements where the condition did not match from the "outer" or "designs" array, and then later in the $project to just show the last element from the "designs" array in final presentation. That last "reshape" is done by $map to replace the whole arrays with the last element slice only.

Whilst the logic there seems much more long winded than the initial statement, the performance gain is potentially "huge" due to being able to treat each document as a "unit" without the need to denormalize or otherwise de-construct until the final projection is made.

Best Solution for Now

In summary, the current processes you can use to achieve the result are simply not efficient for solving the problem. It would in fact be more efficient to simply match the documents that meet the basic condition ( contain a "status" that is "OLD" ) in conjuntction with a $where condition to test the last element of each array. However the actual "filtering" of the arrays in output is best left to client code:

db.junk.find({ 
  "designs.details.status": "OLD",
  "$where": function() {
    return this.designs.some(function(design){
      return design.details.slice(-1)[0].status == "OLD";
    });
  }
}).forEach(function(doc){
  doc.designs = doc.designs.filter(function(design) {
    return design.details.slice(-1)[0].status == "OLD";
  }).map(function(design) {
    design.details = design.details.slice(-1);
    return design;
  });
  printjson(doc);
});

So the query condition at least only returns the documents that match all conditions, and then the client side code filters out the content from arrays by testing the last elements and then just slicing out that final element as the content to display.

Right now, that is probably the most efficient way to do this as it mirrors the future aggregation operation capabilties.

The problems are really in the structure of your data. While it may suit your display purposes of your application, the usage of nested arrays makes this notoriously difficult to query as well as "impossible" to atomically update due to the limitations of the positional operator mentioned before.

While you can always $push new elements to the inner array by matching it's existence within or just the presence of the outer array element, what you cannot do is alter that "status" of an inner element in an atomic operation. In order to modify in such a way, you need to retrieve the entire document, then modifiy the contents in code, and save back the result.

The problems with that process mean you are likely to "collide" on updates and possibly overwrite the changes made by another concurrent request to the one you are currently processing.

For these reasons you really should reconsider all of your design goals for the application and the suitability to such a structure. Keeping a more denormalized form may cost you in some other areas, but it sure would make things much more simple to both query and update with the kind of inspection level this seems to require.

The end conclusion here should be that you reconsider the design. Though getting your results is both possible now and in the future, the other operational blockers should be enough to warrant a change.

Filtering and Matching Arrays within Arrays

1 Answers1

Present Problem

Better Solution

Best Solution for Now