I want to build a MapReduce-style algorithm that creates an inverted index for text documents. In the map step, I do something like this:
import re
letters = ['a']
regx = re.compile("^("+"|".join(letters)+')')
selectedWords = directIndex.aggregate([
{ "$match": { "words.word": regx } },
{ "$unwind": "$words" },
{ "$match": { "words.word": regx } },
{ "$group": { "_id": { "word":"$words.word", "count":"$words.count", 'document' : '$document' } } }])
Here I select all words that start with a given letter, together with their related information. After this, I write the result to another collection:
# aggregate() returns a cursor, so it is materialized into a list before storing
myinvcol.insert_one({'letter':str(''.join(letters)),'words':list(selectedWords) })
In the next step, I read each inserted document and perform the reduce operation, building a dict of the form {'wordName': {'documents': {document1: count1, document2: count2, ...}}, 'wordName2': {'documents': {...}}}, and then run some additional operations on this dict.
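To make the reduce step concrete, here is a minimal sketch of it in plain Python. It assumes each row has the shape produced by the `$group` stage above, and that the `documents` entry is a mapping from document to count; the function name `reduce_words` and the sample rows are made up for illustration.

```python
from collections import defaultdict


def reduce_words(selected_words):
    """Fold map-step rows into {word: {"documents": {document: count}}}."""
    index = defaultdict(lambda: {"documents": {}})
    for row in selected_words:
        # Each row is assumed to look like
        # {"_id": {"word": ..., "count": ..., "document": ...}}
        key = row["_id"]
        index[key["word"]]["documents"][key["document"]] = key["count"]
    return dict(index)


# Hypothetical sample rows, standing in for the stored 'words' array
rows = [
    {"_id": {"word": "apple", "count": 3, "document": "doc1"}},
    {"_id": {"word": "apple", "count": 1, "document": "doc2"}},
    {"_id": {"word": "avocado", "count": 2, "document": "doc1"}},
]
inverse_index = reduce_words(rows)
```

With these sample rows, `inverse_index["apple"]["documents"]` maps `doc1` to 3 and `doc2` to 1.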
Now, the fun part: is it possible to run the first step (the map part, i.e. the aggregation) entirely on the MongoDB server? In other words, I know that there is the '$out' stage:
letters = ['a']
regx = re.compile("^("+"|".join(letters)+')')
selectedWords = directIndex.aggregate([
{ "$match": { "words.word": regx } },
{ "$unwind": "$words" },
{ "$match": { "words.word": regx } },
{ "$group": { "_id": { "word":"$words.word", "count":"$words.count", 'document' : '$document' } } },
{ "$out" : 'InverseIndex'}])
It lets me write the result of the aggregation to another collection, but it doesn't do what I want: instead of inserting one document:
{'letter':str(''.join(letters)),'words':selectedWords },
I get many insertions of
{ "_id": { "word":"$words.word", "count":"$words.count", 'document' : '$document' } }.
So, is there a way to create a single document in the aggregation that merges all of its results into one array before the $out stage?
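What I have in mind is roughly the following untested sketch: a second `$group` with a constant `_id` that pushes every row into one array, followed by a `$project` that adds the letter. The helper name `build_pipeline` is hypothetical, and the sketch assumes the merged array stays under MongoDB's 16 MB document size limit. Note also that `$out` replaces the target collection on every run, so running this once per letter would keep only the last letter.

```python
import re


def build_pipeline(letters, out_collection="InverseIndex"):
    """Build a pipeline that emits ONE document holding all matching words."""
    # Same prefix regex as above, passed as a string via $regex
    pattern = "^(" + "|".join(re.escape(letter) for letter in letters) + ")"
    match = {"$match": {"words.word": {"$regex": pattern}}}
    return [
        match,
        {"$unwind": "$words"},
        match,
        {"$group": {"_id": {"word": "$words.word",
                            "count": "$words.count",
                            "document": "$document"}}},
        # Constant _id: every row lands in the same group,
        # and $push collects the rows into a single array
        {"$group": {"_id": None, "words": {"$push": "$_id"}}},
        # Drop the constant _id and attach the letter as a literal value
        {"$project": {"_id": 0,
                      "letter": {"$literal": "".join(letters)},
                      "words": 1}},
        {"$out": out_collection},
    ]


pipeline = build_pipeline(["a"])
# directIndex.aggregate(pipeline)  # needs a live MongoDB connection
```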