My MongoDB collection contains documents with the following structure:

{
    "_id" : ObjectId("5889ce0d2e9bfa938c49208d"),
    "filewise_word_freq" : {
            "33236365" : [
                    [
                            "cluster",
                            4
                    ],
                    [
                            "question",
                            2
                    ],
                    [
                            "differ",
                            2
                    ],[
                            "come",
                            1
                    ]
            ],
            "33204685" : [
                    [
                            "node",
                            6
                    ],
                    [
                            "space",
                            4
                    ],
                    [
                            "would",
                            3
                    ],[
                            "templat",
                            1
                    ]
            ]
    },
    "file_root" : "socialcast",
    "main_cluster_name" : "node",
    "most_common_words" : [
            [
                    "node",
                    16
            ],
            [
                    "cluster",
                    7
            ],
                [
                        "n't",
                        3
                ]
        ]
}

I want to search for the value "node" inside the arrays of arrays stored under each file name key of the filewise_word_freq dict (in my case the keys are "33236365", "33204685", and so on). If "node" appears in any of the inner arrays for a file name (here, 33204685), the query should return that file name (33204685).

I tried the approach from a related Stack Overflow answer, but it did not work for my use case. On top of that, I don't know how to return only the file name rather than the entire document.

db.frequencydist.find({"file_root":'socialcast',"main_cluster_name":"node","filewise_word_freq":{$elemMatch:{$elemMatch:{$elemMatch:{$in:["node"]}}}}}).pretty()

It returned nothing. Kindly help me.

Nitesh kumar

2 Answers

You can try something like this. The query matches "node" inside the arrays under filewise_word_freq.33204685, and the projection returns only that field.

db.collection.find({
    "file_root": 'socialcast',
    "main_cluster_name": "node",
    "filewise_word_freq.33204685": {
        $elemMatch: {
            $elemMatch: {
                $in: ["node"]
            }
        }
    }
}, {
    "filewise_word_freq.33204685": 1
}).pretty();
s7vr
  • Yes, I can try what you suggested @veeram, but unfortunately with the current model I can't predict the file name (i.e., 33204685, etc.). It might be anything, and there are thousands of file names like that. I have shown only 2 here to save space. So all I can rely on is "filewise_word_freq"; everything inside it is dynamic. – Nitesh kumar Jan 31 '17 at 05:29

The data model you have chosen makes this extremely difficult to query, or even to aggregate. I would suggest revising your document model. However, I think you can use $where:

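db.collection.find({
    "file_root": 'socialcast',
    "main_cluster_name": "node",
    // $where runs this JavaScript on the server; `this` is the candidate document
    $where: function () {
        for (var i in this.filewise_word_freq) {
            for (var j in this.filewise_word_freq[i]) {
                // each entry is a [word, count] pair
                if (this.filewise_word_freq[i][j].indexOf("node") >= 0) { return true; }
            }
        }
        return false;
    }
})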

Yes, this returns the whole document, so you will need to filter out the file names in your application.
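The application-side filtering could look roughly like the sketch below. It is plain JavaScript and assumes the document has already been fetched; filesContainingWord is a hypothetical helper name, not part of MongoDB or its drivers, and the sample document is trimmed from the question.

```javascript
// Hypothetical helper (not a MongoDB API): given one document in the
// original shape, list the file names whose [word, count] pairs include `word`.
function filesContainingWord(doc, word) {
  const freq = doc.filewise_word_freq || {};
  return Object.keys(freq).filter((fileName) =>
    freq[fileName].some((pair) => pair[0] === word)
  );
}

// Sample document trimmed from the question.
const doc = {
  file_root: "socialcast",
  main_cluster_name: "node",
  filewise_word_freq: {
    "33236365": [["cluster", 4], ["question", 2], ["differ", 2], ["come", 1]],
    "33204685": [["node", 6], ["space", 4], ["would", 3], ["templat", 1]],
  },
};

console.log(filesContainingWord(doc, "node"));
```

Since the word may appear in several files, the helper returns an array rather than a single name.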

You might also want to look at the map-reduce functionality, though that's not recommended.

Another way is to use stored functions, which run on the MongoDB server and are saved in a special collection (system.js).

Still, going back to the data model: do revise it if that's a possibility, maybe to something like this:

{
    "_id" : ObjectId("5889ce0d2e9bfa938c49208d"),
    "filewise_word_freq" : [
              {
                    "fileName":"33236365",
                    "word_counts" : {
                       "cluster":4,
                       "question":2,
                       "differ":2,
                       "come":1
                    }
            },
            {
                    "fileName":"33204685",
                    "word_counts" : {
                       "node":6,
                       "space":4,
                       "would":3,
                       "template":1
                    }
            }
           ],
    "file_root" : "socialcast",
    "main_cluster_name" : "node",
    "most_common_words" : [
            {
                    "node":16
            },
            {
                    "cluster":7
            },
                {
                        "n't":3
                }
        ]
}

It would be a lot easier to run aggregation on these.

For this model, the aggregation would be something like

db.collection.aggregate([
 {$unwind : "$filewise_word_freq"},
 {$match : {'filewise_word_freq.word_counts.node' : {$gte : 0}}},
 {$group :{_id: 1, fileNames : {$addToSet : "$filewise_word_freq.fileName"}}},
 {$project :{ _id:0, fileNames:1}}
 ])

This gives you a single document with a single field, fileNames, listing all the matching file names:

{
  fileNames : ["33204685"]
}
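If you do move to the revised model, the reshaping itself is mechanical. Here is a minimal sketch in plain JavaScript, assuming each document is converted in application code before being written back; toRevisedModel is a hypothetical name, and the sketch assumes no word contains a "." or starts with "$", since those are awkward as field names.

```javascript
// Hypothetical one-off converter: reshapes a document from the original
// model (object keyed by file name, holding [word, count] pairs) into the
// array-of-subdocuments model suggested above. The input is not modified.
function toRevisedModel(doc) {
  const pairsToObject = (pairs) =>
    pairs.reduce((acc, [word, count]) => {
      acc[word] = count;
      return acc;
    }, {});

  return Object.assign({}, doc, {
    filewise_word_freq: Object.keys(doc.filewise_word_freq).map((fileName) => ({
      fileName: fileName,
      word_counts: pairsToObject(doc.filewise_word_freq[fileName]),
    })),
    most_common_words: doc.most_common_words.map(([word, count]) => ({
      [word]: count,
    })),
  });
}

// Sample input trimmed from the question.
const original = {
  file_root: "socialcast",
  main_cluster_name: "node",
  filewise_word_freq: {
    "33236365": [["cluster", 4], ["question", 2]],
    "33204685": [["node", 6], ["space", 4]],
  },
  most_common_words: [["node", 16], ["cluster", 7]],
};

const revised = toRevisedModel(original);
console.log(JSON.stringify(revised, null, 2));
```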
Rahul Kumar
  • I think if I change the model now, a lot of rework will be needed. However, thank you for your valuable suggestion. I will try to change the model once I get some bandwidth. Say I change the model as you describe above: how do I query to get only the file names if the word is present in word_counts? Note: the word might be present in multiple files. – Nitesh kumar Jan 31 '17 at 05:34
  • @Niteshkumar I added the aggregation for the example model. The thing is, MongoDB doesn't work well with nested arrays. So whatever model you design, keep it as some combination of arrays and objects, or objects and objects. Then it will be easier to run aggregate, query and update operations. – Rahul Kumar Jan 31 '17 at 06:36
  • Thanks a lot for your help. I will definitely take your suggestions and change the model as soon as possible. I think I also need to do some more R&D on MongoDB usage and functionality before I attempt to change the existing model. – Nitesh kumar Jan 31 '17 at 06:55