I got the following errors when I tried to aggregate the collection by user_id, or to run distinct on user_id:
failed: exception: aggregation result exceeds maximum document size
failed: exception: distinct too big, 16mb cap
How can I complete these tasks on a very large collection?
Data format
{
    user_id: "Jack",
    SYMPTOM_1: "flu",
    SYMPTOM_2: "cough",
    SYMPTOM_3: "cancer",
    datetime: "20140101"
}
Aggregation query
This query is meant to group the medical records by user and append the symptoms from every record to that user.
db.medical_records.aggregate([
    // Sort by date so that $first picks the earliest record per user
    {
        "$sort": { "datetime": 1 }
    },
    // One output document per user, with every symptom pushed into an array
    {
        "$group": {
            "_id": "$user_id",
            "symptom1": {
                "$push": { "symptom": "$SYMPTOM_1", "date": "$datetime" }
            },
            "symptom2": {
                "$push": { "symptom": "$SYMPTOM_2", "date": "$datetime" }
            },
            "symptom3": {
                "$push": { "symptom": "$SYMPTOM_3", "date": "$datetime" }
            },
            "first_date": { "$first": "$datetime" },
            "user_id": { "$first": "$user_id" },
            "count": { "$sum": 1 }
        }
    },
    // "datetime" no longer exists after $group, so project "first_date" instead
    {
        "$project": {
            "user_id": "$user_id",
            "first_date": "$first_date",
            "symptom1": "$symptom1",
            "symptom2": "$symptom2",
            "symptom3": "$symptom3",
            "count": "$count",
            "_id": 1
        }
    }
], { allowDiskUse: true })  // in the shell, options are passed as a document
Expected output
{u'user_id': u'de96dsdase303c6c6439891c57901183c0e4c',
u'symptom1': [{u'symptom': u'1479 ', u'date': u'20040910'}],
u'symptom2': [{u'symptom': u' ', u'date': u'20040910'}],
u'symptom3': [{u'symptom': u' ', u'date': u'20040910'}],
u'count': 1,
u'first_date': u'20040910'}
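
To make the intended result concrete, here is a rough sketch of the same grouping in plain Python, run on a tiny in-memory sample instead of the real collection. It only illustrates the shape of the output I am after and does nothing about the 16 MB limit. The function name `group_records` and the second sample record are made up for illustration.

```python
def group_records(records):
    """Group records by user_id, mimicking the $sort and $group stages."""
    grouped = {}
    # Like "$sort": YYYYMMDD strings sort chronologically as plain strings.
    for rec in sorted(records, key=lambda r: r["datetime"]):
        if rec["user_id"] not in grouped:
            grouped[rec["user_id"]] = {
                "user_id": rec["user_id"],
                "symptom1": [],
                "symptom2": [],
                "symptom3": [],
                "count": 0,
                # Records arrive sorted, so the first one seen is the earliest.
                "first_date": rec["datetime"],
            }
        user = grouped[rec["user_id"]]
        # Like "$push": append one {symptom, date} entry per symptom field.
        for i in (1, 2, 3):
            user["symptom%d" % i].append(
                {"symptom": rec["SYMPTOM_%d" % i], "date": rec["datetime"]}
            )
        user["count"] += 1  # like "$sum": 1
    return list(grouped.values())


# Hypothetical sample data: the first record is the one from "Data format",
# the second is invented to show two records collapsing into one user.
sample = [
    {"user_id": "Jack", "SYMPTOM_1": "flu", "SYMPTOM_2": "cough",
     "SYMPTOM_3": "cancer", "datetime": "20140101"},
    {"user_id": "Jack", "SYMPTOM_1": "fever", "SYMPTOM_2": "",
     "SYMPTOM_3": "", "datetime": "20130615"},
]

result = group_records(sample)
print(result)
```

Every record for a user ends up inside a single output document, so a user with a very large number of records produces a very large document. That may be related to the size errors above.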