{
    "_id" : ObjectId("5c9a31db04a2862329550467"),
    "colors" : [
        {
            "color" : "(255, 254, 255)",
            "value" : 2,
            "returnValue" : 2
        },
        ...
    ],
    "artist" : "Tosa Mitsuoki",
    "date" : NaN,
    "genre" : "mythological painting",
    "style" : "Yamato-e",
    "title" : "Night March of a Hundred Demons (left half)",
    "fileName" : "colorsearch/img/29855.jpg"
}
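db.art_data_full.aggregate([
    // Only documents that contain pure black
    { "$match": { "colors.color": "(0, 0, 0)" } },
    // Metadata plus the count copy of every color entry
    { "$project": {
        "_id": 1,
        "fileName": 1,
        "artist": 1,
        "date": 1,
        "genre": 1,
        "style": 1,
        "title": 1,
        "colors.returnValue": 1
    } },
    // Sort on colors.value (not among the fields kept by $project above)
    { "$sort": { "colors.value": -1 } }
]);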

I am having an issue with returning the color count (colors.value) in $project.

I am currently using MongoDB and PyMongo with a Django front end.

The "colors" field is very large because it stores the RGB values for every pixel in an image, obtained using PIL.

  • colors.color is an RGB value string
  • colors.value is the number of times that color occurs in the picture
  • colors.returnValue is a copy of value
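To make the layout concrete, here is a minimal sketch of how such a `colors` array could be built, written in the shell's JavaScript rather than the app's Python. It assumes the pixels are already available as `[r, g, b]` triples (in the real app they come from PIL), and `buildColors` is a hypothetical helper. Because identical pixels are merged into one entry, the array has one element per distinct color, so its length grows with the number of distinct colors in the image.

```javascript
// Hypothetical helper: turns a flat list of [r, g, b] pixels into the
// "colors" array layout shown above (one entry per distinct color).
function buildColors(pixels) {
  const counts = new Map();
  for (const [r, g, b] of pixels) {
    // Same string format as Python's str((r, g, b)), e.g. "(255, 254, 255)"
    const key = `(${r}, ${g}, ${b})`;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  const colors = [];
  // Map keeps insertion order, so entries appear in first-seen order.
  for (const [color, value] of counts) {
    // returnValue duplicates value, mirroring the schema in the question.
    colors.push({ color, value, returnValue: value });
  }
  return colors;
}

// Three pixels, two of them identical, give two entries.
const colors = buildColors([[255, 254, 255], [0, 0, 0], [255, 254, 255]]);
```

Here `colors` holds `{ color: "(255, 254, 255)", value: 2, returnValue: 2 }` followed by `{ color: "(0, 0, 0)", value: 1, returnValue: 1 }`.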

I have tried different ways of structuring the data and still get the error:

"Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting. Aborting operation. Pass allowDiskUse:true to opt in."
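104857600 bytes is exactly 100 MiB, the memory budget the error reports for the in-memory `$sort`. A rough way to see why the size of each document reaching the sort matters is to divide that budget by an average document size. The 200 kB and 2 kB figures below are hypothetical, standing in for a document that still carries a projected `colors` array versus one with only the metadata fields, and the sort's real memory accounting is not exactly the BSON size, so treat this as an order-of-magnitude estimate only.

```javascript
// 104857600 bytes from the error message is exactly 100 MiB.
const SORT_LIMIT_BYTES = 100 * 1024 * 1024;

// Rough upper bound on how many documents of a given (hypothetical) average
// size an in-memory $sort can hold before hitting the limit.
function docsThatFit(avgDocBytes) {
  return Math.floor(SORT_LIMIT_BYTES / avgDocBytes);
}

console.log(SORT_LIMIT_BYTES === 104857600); // true
console.log(docsThatFit(200 * 1000));        // 524   (docs with a big colors array)
console.log(docsThatFit(2 * 1000));          // 52428 (metadata-only docs)
```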

Originally there were just "color" and "value", and I got the error; I thought it was because colors.value is the sort field and was causing the issue.
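Separately from memory, sorting on `colors.value` may not rank documents the way the question intends. As I understand MongoDB's comparison rules, a descending sort on an array field compares the largest element of the array, so `{ "colors.value": -1 }` would order images by their most frequent color overall rather than by the count of the color used in `$match`. The sketch below models that rule in plain JavaScript on two hypothetical documents; `descSortKey` and `countOf` are illustrative helpers, not MongoDB functions.

```javascript
// Models the descending sort key for an array field: the largest element
// (assumption based on MongoDB's documented comparison/sort order rules).
function descSortKey(doc) {
  return doc.colors.reduce((max, c) => Math.max(max, c.value), -Infinity);
}

// Count of one specific color in a document (0 if the color is absent).
function countOf(doc, color) {
  const entry = doc.colors.find((c) => c.color === color);
  return entry ? entry.value : 0;
}

// Two hypothetical documents that both contain black.
const docs = [
  { title: "A", colors: [{ color: "(0, 0, 0)", value: 5 }, { color: "(255, 255, 255)", value: 900 }] },
  { title: "B", colors: [{ color: "(0, 0, 0)", value: 400 }, { color: "(10, 10, 10)", value: 50 }] },
];

// Order a { "colors.value": -1 } sort would give vs. order by black's count.
const byArraySort = [...docs].sort((a, b) => descSortKey(b) - descSortKey(a));
const byBlackCount = [...docs].sort((a, b) => countOf(b, "(0, 0, 0)") - countOf(a, "(0, 0, 0)"));

console.log(byArraySort.map((d) => d.title));  // [ 'A', 'B' ]
console.log(byBlackCount.map((d) => d.title)); // [ 'B', 'A' ]
```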

My idea was to duplicate "value" into "returnValue", use $project to return returnValue, and sort on value, but this still causes the same error.

If I leave colors.returnValue out, the query performs great, but I would really like to return the count for display on the front end.
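One way to keep the count without carrying the whole array into the sort is to shrink `colors` to the matched entry with `$filter` before sorting. This is an untested sketch under stated assumptions: `buildColorPipeline` is a hypothetical helper, and the optional `limit` argument is my addition (a `$limit` directly after `$sort` lets MongoDB keep only the top results in memory while sorting). With this shape the `returnValue` copy should no longer be needed, because the one remaining element carries `value` itself.

```javascript
// Hypothetical pipeline builder: match documents containing `color`, keep only
// that color's entry from the large "colors" array, then sort on its count.
function buildColorPipeline(color, limit) {
  const pipeline = [
    // First-stage $match should be able to use the colors.color index.
    { $match: { "colors.color": color } },
    {
      $project: {
        fileName: 1, artist: 1, date: 1, genre: 1, style: 1, title: 1,
        // Shrink the array to the matching element(s) before sorting.
        colors: {
          $filter: {
            input: "$colors",
            as: "c",
            cond: { $eq: ["$$c.color", color] },
          },
        },
      },
    },
    // "colors" now holds only the matched entry, so this sorts by its count.
    { $sort: { "colors.value": -1 } },
  ];
  // Optional top-N limit placed directly after $sort.
  if (limit) pipeline.push({ $limit: limit });
  return pipeline;
}

// In the mongo shell (not executed here):
//   db.art_data_full.aggregate(buildColorPipeline("(0, 0, 0)", 50));
```

Each result would then have `colors` as a one-element array, so the front end could read the count from `colors[0].value`.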

Currently I am using one compound index; I have tried many combinations and this one seems to work quite well:

[
        {
                "v" : 2,
                "key" : {
                        "_id" : 1
                },
                "name" : "_id_",
                "ns" : "art-data.art_data_full"
        },
        {
                "v" : 2,
                "key" : {
                        "colors.color" : -1,
                        "colors.value" : -1
                },
                "name" : "colors.color_-1_colors.value_-1",
                "ns" : "art-data.art_data_full"
        }
]

I am not sure whether allowDiskUse will help resolve this issue.

I have tried it in earlier iterations, and it either still gives the error, crashes the MongoDB server, or makes the query take significantly longer.

I could be using it the wrong way; I just need some assistance.
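One thing worth checking: `allowDiskUse` goes in the options document, which is the second argument to `aggregate()`. That only works when the stages are wrapped in a single array; as far as I know, when stages are passed as separate arguments, e.g. `aggregate({ $match: ... }, { $project: ... })`, the legacy shell treats every argument as a stage, so there is nowhere to put the option. A minimal sketch of the array form, reusing the question's stages unchanged (so the sort itself behaves as before, and spilling large documents to disk can still be slow):

```javascript
// The question's three stages, wrapped in a single array.
const pipeline = [
  { $match: { "colors.color": "(0, 0, 0)" } },
  {
    $project: {
      _id: 1, fileName: 1, artist: 1, date: 1, genre: 1, style: 1, title: 1,
      "colors.returnValue": 1,
    },
  },
  { $sort: { "colors.value": -1 } },
];

// Aggregation options live in the second argument, next to the pipeline.
const options = { allowDiskUse: true };

// In the mongo shell (not executed here):
//   db.art_data_full.aggregate(pipeline, options);
// PyMongo takes the same option as a keyword argument:
//   collection.aggregate(pipeline, allowDiskUse=True)
```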

I feel like I am close but missing something. I have searched multiple avenues for restructuring and more, but I need some help.

  • I think this blog will help you https://www.mkyong.com/mongodb/mongodb-sort-exceeded-memory-limit-of-104857600-bytes/ – Gerald Mar 28 '19 at 02:26
  • I have tried using allowDiskUse and the issue persists or the return time becomes unusable (too long). – Randy Jackson Mar 28 '19 at 02:41
  • can you try this https://mongoosejs.com/docs/api.html#aggregate_Aggregate-cursor. I think you have the same issue from https://github.com/Automattic/mongoose/issues/2964. – Gerald Mar 28 '19 at 03:03
  • Thanks for the reply Gerald, will look into it for pymongo. I didn't even consider that avenue yet, much appreciated. Ah, but this issue is occurring within mongo shell using an aggregate query so not sure. – Randy Jackson Mar 28 '19 at 05:19
