0

I am having some trouble for the first time aggregating results with MongoDB. I am trying to aggregate some sales totals, but it returns duplicated, even triplicated results, with the same resulting data. Can someone figure out what I am doing wrong?

This is a sample data row:

{
        "owner": "Anitta",
        "owner_id": ObjectId("5e0e25fc3ed15e27c5f5bb2b"),
        "units": NumberInt(19802),
        "percent": 0.35,
        "value": 30.35063004976345,
        "artist_name": "Warner",
        "release_title": "Carol",
        "upc": "3615937741130",
        "track_title": "Carol",
        "isrc": "BR9Z21900056",
        "store_name": "Spotify",
        "country": "Brazil",
        "label_name": "Warner Music",
        "distributor": "Believe",
        "sale_type": "audio",
        "sales_id": ObjectId("5e5e91ac238279bd2ba7e839"),
        "created_at": ISODate("2020-03-11T20:18:11.076-03:00"),
        "report_name": "202002-believe.csv",
        "sales_date": ISODate("2019-11-01T21:00:00.000-03:00"),
        "distributor_date": ISODate("2020-02-01T00:00:00.000-02:00"),
        "distributor_value": 5.355993538193549,
    }

This is the aggregation query:

filters = { "$match": { owner: { $nin: [false, '', null] } } };
query.push(filters);

query.push({
                    $group: {
                        _id: {"owner": "$owner", "owner_id": "$owner_id"},
                        total_units: { $sum: "$units" },
                        total: { $sum: "$value" },
                    }
                })
                query.push({ $sort: { "total": -1 } })

And this is the result i am getting:

{
    data: [
        {
            _id: {
                owner: "Anitta",
                owner_id: "5e0e25fc3ed15e27c5f5bb2b"
            },
            total_units: 8127500,
            total: 10163.241069212721
        },
      
        {
            _id: {
                owner: "Anitta",
                owner_id: "5e0e25fc3ed15e27c5f5bb2b"
            },
            total_units: 5851785,
            total: 2952.923583543785
        },
    ]
}

1 Answers1

0

I have found the problem and the solution!

The problem is that I was saving the owner_id as a string and as an ObjectId. That was generating the splitted results.

For those who are passing the same problem, I recommend modifying the saved reference from string to ObjectId as it uses less space in your database. You can see this thread to understand better: MongoDb: Benefit of using ObjectID vs a string containing an Id?

So, if you are having this problem, you can do something like this:

db.distribution.aggregate(
    // FILTER TO GET ONLY THE IDs SAVED AS STRING
    {$match: {owner_id : {$type : 2}}},
    
    // GET THE NECESSARY DATA
    {$group: {_id: {
        owner_id: "$owner_id",
        owner: "$owner",
    }}}
    )
.limit(1000)
.forEach(function(d) {
    // TRANSFORM THE STRING INTO OBJECTID
    var id = ObjectId(d._id.owner_id); 
    
    // THE STRING ID
    var oldId = d._id.owner_id; 
    
    // DISPLAY SOMETHING TO MONITOR
    console.log(d._id.owner, id, oldId)
    
    // UPDATE THE ENTRIES
    var ret = db.distribution.updateMany({
        owner_id: oldId,
    },
    {
        $set: {
            owner_id: id
        }
    });
    
    // JUST TO MONITOR THE RESULTS
    console.log(ret)
})