0

I would like to query the sub-key of an array, extract keywords, and count them.

The words that I'd like to extract from are in "Title." When I applied the query filter, it shows that the field I need to query is Reviews.0.Title. But I have at least 200 elements inside the Reviews array.

How do I do this?

{ 
    "_id" : ObjectId("561c3ccc4c97f053753f1a78"), 
    "Reviews" : 
    [
        {
        "Ratings" : {
                 "Service" : "4", 
                 "Overall" : "5"
                    }, 
        "Location" : "MIS", 
        "Title" : "“Excellent and great”", 
        "Author" : "JDoe", 
        "ReviewID" : "1", 
        "Date" : "March 30, 2015"
        }, 
   {
    "Ratings" : {
                  "Service" : "4", 
                  "Overall" : "5"  
                 }, 
    "Location" : "WIS", 
    "Title" : "“Excellent and fantastic!”", 
    "Author" : "John Doe", 
    "ReviewID" : "2",  
    "Date" : "March 27, 2016"
   }
    ],

    "Info" : 
    {
    "Name" : "AA",
    "ID" : "0001"
     }
}

{ 
    "_id" : ObjectId("561c3ccc4c97f0ytu7289074"), 
    "Reviews" : 
    [
        {
         "Ratings" : {
                 "Service" : "4", 
                 "Overall" : "5"
                    }, 
         "Location" : "VEG", 
         "Title" : "“Not too bad”", 
         "Author" : "JDoe", 
         "ReviewID" : "3", 
         "Date" : "March 30, 2015"
        }, 
       {
        "Ratings" : {
                  "Service" : "4", 
                  "Overall" : "5"  
                 }, 
        "Location" : "NEV", 
        "Title" : "“Outstanding service”", 
        "Author" : "John Doe", 
        "ReviewID" : "4",  
        "Date" : "March 27, 2016"
       }
    ],

    "Info" : 
    {
     "Name" : "BB",
     "ID" : "0002"
    }

}

I would like to get the following output:

{ "_id" : "Excellent", "value" : 1 }
{ "_id" : "Great", "value" : 1 }
{ "_id" : "Location", "value" : 2 }

edited output with name:

{ "Name" : AA: "Excellent", "value" : 1 "Great", "value" : 1 }
{ "Name" : BB: "Great", "value" : 1 }
S. Ray
  • 13
  • 5
  • Can you [edit] your question to include the current query filter with your final expected output from the above document, for example? – chridam Oct 19 '16 at 07:29
  • @chridam i've edited the question to include the final output. i tried using map-reduce, but can't get the output. – S. Ray Oct 19 '16 at 07:37

1 Answers1

0

I managed to find a mapReduce solution that provides a functionality similar to what you are looking for here

I adjusted that solution and generated the following mapReduce query:

db.reviews.mapReduce(
// Map function
function () {
    if(this.Reviews) {
        this.Reviews.forEach(function(review) {
            if(review["Title"]) {
                // Remove all punctuation from the review title
                var titleNoPunctuation = review["Title"].replace(/[^\w\s?.]/g, "");

                // Transform all the review titles to lower case and split them into word tokens
                var titleTokens = titleNoPunctuation.toLowerCase().split(" ");

                titleTokens.forEach(function(word) {
                    emit(word, 1);
                });

            }
        });
    }
},

// Reduce function
function (key, values) {
    var counter = 0;

    values.forEach(function(value) {
        counter += value;
    });

    return counter;
},

// Output collection 
{ out: "word_count"});

The above query stores its results in a different collection, as specified in the out attribute, in this example word_count.

Thus, in order to fetch the result set returned by the mapReduce job, you need to run the following query on the out collection (word_count):

> db.word_count.find();
{ "_id" : "excellent", "value" : 1 }
{ "_id" : "great", "value" : 1 }
{ "_id" : "location", "value" : 2 }
Community
  • 1
  • 1
vladzam
  • 5,462
  • 6
  • 30
  • 36
  • thanks Vlad! that works really well. I am using MongoChef, and copied the functions for map and reduce separately. – S. Ray Oct 19 '16 at 17:04
  • can i run the entire script using intellishell? i tried, but the output is different. – S. Ray Oct 19 '16 at 17:06
  • I ran the query directly from the mongo shell. What is the output in your `word_count` collection? And what version of MongoDB are you running? – vladzam Oct 19 '16 at 18:37
  • i am using the mongochef 3.3 core and mongod version: 3.2.10 (MMAPv1) – S. Ray Oct 20 '16 at 16:53
  • the output i get is this in JSON: – S. Ray Oct 20 '16 at 16:55
  • The output is this: { "result" : "word_count", "timeMillis" : NumberInt(495), "counts" : { "input" : NumberInt(10), "emit" : NumberInt(4065), "reduce" : NumberInt(373), "output" : NumberInt(975) }, "ok" : NumberInt(1) } – S. Ray Oct 20 '16 at 16:55
  • Yes, that means that the actual results have been placed in a different collection, `word_count`. To see the actual results, you now need to run `db.word_count.find()` and you will have the actual results you are looking for. – vladzam Oct 20 '16 at 18:19
  • thanks Vlad, now i understand. How do i change the mapreduce to get the name of the key that has the most number of these words? i updated the question to include the collection (it has a name key). – S. Ray Oct 21 '16 at 03:31
  • @S.Ray That is the standard format that `mapReduce` jobs use when producing a result set. If you want to change the standard output format, you can modify the structure of the `mapReduce` result set. Here you can read more about this process: http://stackoverflow.com/questions/8416262/how-to-change-the-structure-of-mongodbs-map-reduce-results – vladzam Oct 21 '16 at 08:06
  • Z Thanks, I'll try and change the output. Thanks a lot for your help! – S. Ray Oct 22 '16 at 19:26