{
"_id" : ObjectId("5c9a31db04a2862329550467"),
"colors" :
[
{
"color" : "(255, 254, 255)",
"value" : 2,
"returnValue" : 2
},
...,
],
"artist" : "Tosa Mitsuoki",
"date" : NaN,
"genre" : "mythological painting",
"style" : "Yamato-e",
"title" : "Night March of a Hundred Demons (left half)",
"fileName" : "colorsearch/img/29855.jpg"
}
db.art_data_full.aggregate( {
"$match" : {
"colors.color" : "(0, 0, 0)"
}
}, {
"$project": {
"_id" : 1,
"fileName": 1,
"artist" : 1,
"date": 1,
"genre": 1,
"style": 1,
"title" : 1,
"colors.returnValue" : 1
}
}, {
"$sort" : {
"colors.value" : -1
}
});
I am having an issue with returning the colors.value field in $project.
I am currently using MongoDB and pymongo with a Django front end.
The "colors" field is very large because it stores the RGB value of every pixel in an image, obtained using PIL.
colors.color is an RGB value string.
colors.value is the count of how many times that color occurs in a picture.
colors.returnValue is a copy of value.
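To make the shape of the "colors" array concrete, here is a minimal sketch of how such entries could be produced from a list of pixel tuples. The helper name build_colors and the sample pixels are hypothetical; this is an assumption about the structure described above, not the exact code used to load the collection.

```python
from collections import Counter


def build_colors(pixels):
    """Count each RGB tuple and return entries shaped like the "colors" array."""
    counts = Counter(pixels)
    return [
        # str((r, g, b)) yields strings like "(255, 254, 255)"
        {"color": str(rgb), "value": n, "returnValue": n}
        for rgb, n in counts.most_common()
    ]


# Hypothetical sample; with PIL the pixel tuples would presumably come from
# something like Image.open(path).convert("RGB").getdata().
sample_pixels = [(0, 0, 0), (255, 254, 255), (0, 0, 0)]
colors = build_colors(sample_pixels)
```

Each distinct color becomes one array element, which is why the array grows so large for detailed images.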
I have tried different ways of structuring the data and am still obtaining the error:
"Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting. Aborting operation. Pass allowDiskUse:true to opt in."
Originally, there were only "color" and "value". I got the error and assumed it happened because colors.value was both the sort field and a returned field.
My idea was to duplicate "value" into "returnValue", return returnValue through $project, and sort on value, but this still causes the same error.
If I leave colors.returnValue out of $project, the query performs great, but I would really like to return the count for display on the front end.
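One variation that might be worth trying is sketched below: instead of projecting a field from every element of "colors", keep only the element whose color matches, using $filter inside $project, so that each document reaching $sort carries a single small array entry. The function name build_pipeline is hypothetical, and whether this stays under the sort memory limit on this collection is an assumption I have not verified.

```python
def build_pipeline(target_color):
    """Build an aggregation pipeline that keeps only the matching color entry."""
    return [
        # Select documents that contain the color at all.
        {"$match": {"colors.color": target_color}},
        {
            "$project": {
                "fileName": 1,
                "artist": 1,
                "date": 1,
                "genre": 1,
                "style": 1,
                "title": 1,
                # Reduce the large array to the element(s) for this color only.
                "colors": {
                    "$filter": {
                        "input": "$colors",
                        "as": "c",
                        "cond": {"$eq": ["$$c.color", target_color]},
                    }
                },
            }
        },
        # Sort on the count of the remaining entry, highest first.
        {"$sort": {"colors.value": -1}},
    ]


pipeline = build_pipeline("(0, 0, 0)")
```

With pymongo this list would be passed as the first argument to the collection's aggregate method. Because "value" survives the projection, the separate "returnValue" copy would not be needed in this form.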
Currently, I am using one compound index in addition to the default _id index. I have tried many combinations, and this one seems to work quite well.
[
{
"v" : 2,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "art-data.art_data_full"
},
{
"v" : 2,
"key" : {
"colors.color" : -1,
"colors.value" : -1
},
"name" : "colors.color_-1_colors.value_-1",
"ns" : "art-data.art_data_full"
}
]
I am not sure if allowDiskUse is going to be useful for resolving this issue.
I have tried it before in other iterations, and it either still gives the error, crashes the MongoDB server, or makes the query take significantly longer.
I could be using it in the wrong way, so I am in need of some assistance.
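For reference, this is roughly how allowDiskUse is passed from pymongo, as far as I understand it: as a keyword option next to the pipeline, not as a stage inside it. The helper name run_with_disk_use is hypothetical, and the connection lines in the comment assume the database and collection names from the index listing above.

```python
def run_with_disk_use(collection, pipeline):
    """Run an aggregation and let the server spill large sorts to disk."""
    # allowDiskUse is an option of aggregate(), passed alongside the pipeline list.
    return list(collection.aggregate(pipeline, allowDiskUse=True))


# Hypothetical usage against a real server (not executed here):
#   from pymongo import MongoClient
#   collection = MongoClient()["art-data"]["art_data_full"]
#   results = run_with_disk_use(collection, example_pipeline)
example_pipeline = [
    {"$match": {"colors.color": "(0, 0, 0)"}},
    {"$sort": {"colors.value": -1}},
]
```

In the mongo shell, I believe the equivalent requires the stages to be wrapped in an array, with { allowDiskUse: true } as a second argument; when stages are passed as separate arguments, as in the query above, an options document would be read as another stage.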
I feel like I am close but am missing something. I have explored several ways of restructuring the data, but I need some help.