Filter an aggregate by string length

Question

I have performed the following query:

db.Indiv2.aggregate(
{$unwind: '$conso'},
{$group: {_id: {conso:'$conso.nom_commercial', region:"$region"}, sum: 
{$sum: 1}}},
{$sort : {sum : -1}},
{$group: {
_id: "$_id.region",
"conso": {
    $first: "$_id.conso"
},
"sum": {
    $first: "$sum"
},
}},
{$sort : {_id : 1}}
);

which returns the highest consumed food by region in the following format:

    {
        "_id" : {
            "conso" : "x",
            "region" : 1
        },
        "sum" : 73226.0
    },
    {
        "_id" : {
            "conso" : "x",
            "region" : 8
        },
        "sum" : 25683.0
    },
    {
        "_id" : {
            "conso" : "grandlait demi �cr�m� uht",
            "region" : 1
        },
        "sum" : 251.0
    }

However, a lot of foods do not have names. Such items are named "x" (example above). I would like to filter my query to exclude such items. I am looking for an equivalent of a filter which can either filter strings which length are below 2 characters or filter the string "x".

Can you use the [edit] link to show some sample documents from `Indiv2` collection and your expected output? — chridam, Nov 13 '17 at 17:47

score 2 · Accepted Answer · answered Nov 13 '17 at 17:50

There is a $redact operator which might be helpful here:

Try:

db.collection.aggregate([
    //previous aggregations
    {
        "$redact": {
            "$cond": [
                { "$gt": [ { "$strLenCP": "$_id.conso" }, 2] },
                "$$KEEP",
                "$$PRUNE"
            ]
        }
    }
])

score 0 · Answer 2 · answered Nov 13 '17 at 18:46

You could just capture the conso length and filter. Given the or condition described above, we end up with this:

// previous pipeline here, then:
{$addFields: {slen: {$strLenBytes: "$_id.conso"}}}
,{$match: {$or: [ {"_id.conso": {$ne: "x"}}, {slen: {$gt: 1}} ] }}

Filter an aggregate by string length

2 Answers2