0

I am working on querying a Nested Json in the mongo db,the sample data structure is shown below:

{
    "_id" : ObjectId("5bf159cc6bf6ab0ac374f80c"),
    "name" : "Jack",
    "age" : "30",
    "info" : {
        "0" : {
            "status" : "true",
            "name" : "luffy"
        },
        "1" : {
            "status" : "true",
            "name" : "sanji"
        },
        "2" : {
            "status" : "false",
            "name" : "zoro"
        }
    }
}

/* 2 */
{
    "_id" : ObjectId("5bf15f286bf6ab0ac374f8ed"),
    "name" : "Mack",
    "age" : "33",
    "info" : {
        "0" : {
            "status" : "true",
            "name" : "naruto"
        },
        "1" : {
            "status" : "true",
            "name" : "sakura"
        },
        "2" : {
            "status" : "false",
            "name" : "sasuke"

Now what I wanted to do is query and fetch those results where status = 'true'.After some googling I came to know how to query a nested document and came up with a sample.

query:db.getCollection('test').find({"info.0.status":"true"})

But as you know from the above query, the query will only fetch the appropriate results from the '0th' array,How do I get the query to iterate through the arrays and return documents with "status":"true" .Also I am new to Mongodb, kindly ignore any mistakes.

Note: One of the users told me I Should remodel my data structure as shown below and then use $filter operator:

[
  {
    "_id": ObjectId("5bf159cc6bf6ab0ac374f80c"),
    "name": "Jack",
    "age": "30",
    "info": [
      {
        "status": "true",
        "name": "luffy"
      },
      {
        "status": "true",
        "name": "sanji"
      },
      {
        "status": "false",
        "name": "zoro"
      }
    ]
  },
  {
    "_id": ObjectId("5bf15f286bf6ab0ac374f8ed"),
    "name": "Mack",
    "age": "33",
    "info": [
      {
        "status": "true",
        "name": "naruto"
      },
      {
        "status": "true",
        "name": "sakura"
      },
      {
        "status": "false",
        "name": "sasuke"
      }
    ]
  }
]

However I am not getting how to remodel my structure in the way that the user has shown.Is there any other tactic I could use?

Neil Lunn
  • 148,042
  • 36
  • 346
  • 317
Austin Joy
  • 59
  • 1
  • 12
  • 1
    *"I am not getting how to remodel my structure...."* - What exactly don't you "get"? You need to be specific about what you do not understand and what the actual question is. *"Is there any other tactic..."* - No, not really. Not without sacrificing performance, which is exactly why it's recommended you restructure. So what is the question? Or do we just point you to the existing answers that duplicate what you are presently showing? – Neil Lunn Nov 19 '18 at 01:41
  • I just started with mongodb a few days back, I don't understand how do I remodel my structure(with a direct query if possible?).If there is no other feasible technique,could you help me with the remodelling of the data structure. – Austin Joy Nov 19 '18 at 01:44

1 Answers1

1

The bottom line to the general question is DON'T attempt to "query" in this form at all, but rather take the advice and rewrite the data. But there are the different approaches for the record.

Rewrite the Collection

As you were told already, it's best to remodel the collection so these are real "arrays" instead of "objects" with named keys.

The general cases are actually either "iterate" the collection items and rewrite them:

var updates = [];
db.getCollection('test').find().forEach(doc => {
  var info = Object.keys(doc.info).map(k => 
    Object.assign({}, doc.info[k], { status: doc.info[k].status === "true" }) );

  updates.push({
    "updateOne": {
      "filter": { "_id": doc._id },
      "update": { "$set": { "doc.info": info } }
    }
  });

  if (updates.length >= 1000) {
    db.getCollection('test').bulkWrite(updates);
    updates = [];
  }

});

if (updates.length >= 0) {
  db.getCollection('test').bulkWrite(updates);
  updates = [];
}

Or to write a completely new collection:

db.getCollection('test').aggregate([
  { "$addFields": {
    "info": {
      "$map": {
        "input": { "$objectToArray": "$info" },
        "in": {
          "$mergeObjects": [
            "$$this.v",
            { "status": { "$toBool": "$$this.v.status" }
          ]
        }
      }
    }
  }},
  { "$out": "newtest" }
])

The basic reliance being if you can tolerate a "new collection" and actually have features like $objectToArray available to your MongoDB version.

Even when "updating" it would generally be recommended (especially for production) that you instead take an approach from How to Update Multiple Array Elements in mongodb, since using $set to replace the entire property is "brute force" and not safe for production. Fine though on just your own system for testing.

After either of those forms are complete then you can basically $filter for only the matches which are true, as in:

collection.aggregate([
  // Finds valid "documents" 
  { "$match": { "info.status": true } },
  // "filters" the array content
  { "$addFields": {
    "info": {
      "$filter": { "input": "$info", "cond": "$$this.status" }
    }
  }}
])

Query In Place

Of course you "can" actually query with the present document structure, but it's simply not recommended:

collection.aggregate([
  // Match on transformed object
  { "$match": {
    "$expr": {
      "$gt": [
        { "$size": {
          "$filter": {
            "input": {
              "$map": {
                "input": { "$objectToArray": "$info" },
                "in": "$$this.v"
              }
            },
            "cond": { "$toBool": "$$this.status" }
          }
        }},
        0
      ]
    }
  }},
  // Transform remaining objects
  { "$addFields": {
    "info": {
      "$filter": {
        "input": {
          "$map": {
            "input": { "$objectToArray": "$info" },
            "in": "$$this.v"
          }
        },
        "cond": { "$toBool": "$$this.status" }
      }
    }
  }}
])

Or even with a JavaScript expression in $where, without of course support for actually "filtering" the result content before retrieval from the server:

collection.find({
  "$where": function() {
    return Object.keys(this.info).map( k => this.info[k])
      .some(e => e.status === "true")
  }
})

The only thing that alters documents with JavaScript on the server is mapReduce, with it's own specific format of course:

collection.mapReduce(
  function() {
    var id = this._id;
    delete this._id;
    this.info = Object.keys(this.info)
      .map(k => this.info[k])
      .filter(o => o.status === "true")
    emit(id,this);
  },
  function() {},
  {
    "out": { "inline": 1 },
    "query": {
      "$where": function() {
        return Object.keys(this.info).map( k => this.info[k])
          .some(e => e.status === "true")
      }
    }
  }
)

In either case these are really "horrible" since they rely essentially on transforming each document into the actual "array" form before the conditions to match can be applied. Rewriting the collection on the other hand actually allows that work to be done beforehand, thus removing such a computation and also allowing an "index" to be specified in order to speed up real world query results.


In short, Rewrite It, and don't "query" it how it presently is, since databases are not really optimized for "named keys" when querying across documents.

Neil Lunn
  • 148,042
  • 36
  • 346
  • 317
  • Thank you,will definitely try rewriting the collection and let you know.Also where could I get a better understanding of Mongodb functions,something in more detail than the mongodb docs? – Austin Joy Nov 19 '18 at 02:27
  • 1
    @AustinJoy Well apart from the linked documentation samples which are not actually "completely bad", you could always try reading many answers already given on this very site. Kind of why we answer questions here ya know :) – Neil Lunn Nov 19 '18 at 02:29
  • Yea,will look into,When i try to rewrite the colelction, I am getting the error "_id" is not defined, is there some silly thing I'm missing? – Austin Joy Nov 19 '18 at 02:43
  • @AustinJoy My Bad. "Typo". Should have been `doc._id` and not just `_id`. See changes. Also note the conversion between values like `"true"` as a "string" to an actual boolean `true`, which I missed in haste earlier as well. – Neil Lunn Nov 19 '18 at 02:49
  • Thank you for taking the time and helping me understand :) – Austin Joy Nov 19 '18 at 03:00