i have 2 collections : words and phrases Each word document has an array of phrases id's. And each phrase can be active or inactive.
For example :
words :
{"word" => "hello", phrases => [1,2]}
{"word" => "table", phrases => [2]}
phrases :
{"id" => 1, "phrase" => "hello world!", "active" => 1}
{"id" => 2, "phrase" => "hello, i have already bought new table", "active" => 0}
I need to get count of active phrases for each word.
In php i do it like this:
1. get all words
2. for each word get count of active phrases with condition ['active' => 1]
Question: How can i get words with active phrases count in one request? I tried to use MapReduce, but i need to make a request for each word to get count of active phrases.
UPD:
In my test collection there are 92 000 phrases and 23 000 words.
I have already tested both variant: with php loop for each word in which i get phrases count and aggreagation function in mongo.
But i changed aggregation pipeline in commets below because of phrases_data. It is array, so i can't use $match on it. I use $unwind after $lookup.
[ '$unwind' => '$5'],
[
'$lookup' => [
'from' => 'phrases_926ee3bc9fa72b029e028ec90e282072ea0721d1',
'localField' => '5',
'foreignField' => '0',
'as' => 'phrases_data'
]
],
[ '$unwind' => '$phrases_data'],
[ '$match' => [ 'phrases_data.3' => 77] ], //phrases_data.3 => 77 it is similar to phrases_data.active => 1
[ '$group' =>
[
'_id' => ['word' => '$1', 'id' => '$0'],
'active_count' => [ '$sum' => 1]
]
],
[ '$match' => [ 'active_count' => ['$gt' => 0]] ],
[ '$sort' =>
[
'active_count' => -1
]
]
The problem is that $group command take 80% of process time. And it is much slower than php loop. Here is my results for test collection:
1. Php loop (get words-> get phrases count for each word): 10 seconds
2. Aggregation function : 20 seconds