This is a continuation of this question.
I'm using the following code to find all documents in the collection C_a whose text contains the word StackOverflow and store them in another collection called C_b:
from pymongo import MongoClient

client = MongoClient('127.0.0.1')  # MongoDB running locally
dbRead = client['C_a']  # database named C_a, which holds the collection C_a
# Pipeline: full-text match on the keyword, then write the matches to C_b
# (all stage names and operators must be quoted strings in PyMongo)
pipeline = [{"$match": {"$text": {"$search": "StackOverflow"}}}, {"$out": "C_b"}]
dbRead.C_a.aggregate(pipeline)  # run the aggregation
print(dbRead.C_b.count_documents({}))  # verify the document count of the new collection
This works great; however, if I run the same snippet for multiple keywords, the results get overwritten. For example, I want the collection C_b to contain all documents that contain any of the keywords StackOverflow, StackExchange, and Programming. To do so, I simply loop over these keywords and run the snippet once per keyword. Unfortunately, each iteration overwrites the output of the previous one.
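For reference, the loop looks roughly like the sketch below. The helper names build_pipeline and run_for_keywords are only for illustration and are not part of my actual script; the collection is passed in as an argument, so the sketch assumes an object with an aggregate() method such as a PyMongo collection.

```python
def build_pipeline(keyword, target="C_b"):
    """Build the same $match + $out pipeline as above for a single keyword."""
    return [
        {"$match": {"$text": {"$search": keyword}}},
        {"$out": target},  # $out replaces the target collection on every run
    ]


def run_for_keywords(collection, keywords, target="C_b"):
    """Run the pipeline once per keyword against the given collection."""
    for keyword in keywords:
        collection.aggregate(build_pipeline(keyword, target))


# Intended usage against a local MongoDB (assumes a text index on C_a):
#   from pymongo import MongoClient
#   db = MongoClient("127.0.0.1")["C_a"]
#   run_for_keywords(db.C_a, ["StackOverflow", "StackExchange", "Programming"])
#   print(db.C_b.count_documents({}))
```

After the loop finishes, C_b only holds the matches for the last keyword (Programming), since every iteration ends in the same $out stage.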
Question: How do I update the output collection instead of overwriting it?
Plus: Is there a clever way to avoid duplicates, or do I have to check for duplicates afterwards?