0

I am struggling to get my head around MongoDB and aggregates and groups. I've spent about 3 days so far.

I have source data that looks like...

{
   "formName" : "my form",
   "updatedAt" : "2021-11-02T13:29:00.123Z",
},
{
   "formName" : "another form",
   "lastUpdated" : "2021-10-01T13:29:00.123123",
},

Note that there are potentially different date names, though these are the only differences.

I am attempting to achieve an output of...

{
    "_id": null,
    "text": "my form",  (NOTE: This is the formName)
    "children": [{
       "text" : 2021, (This is the year part of the updated)
       "children" : [
          {"text" : 1}, (These are the month part of the updated)
          {"text" : 2},
          {"text" : 3},
          {"text" : 4}
       ]
    },
    ]
}

So, basically a tree, which has formName, with child branch of years, with child branch of months.

I have tried all kinds of things, many don't work, such as nested $addToSet inside $groups.

I have been close, but I can't solve it.

This is the closest, but this doesn't work.

db.FormsStore.aggregate( [
  
  {$match:{myKey:"a guid to group my forms together"}},
  {$project: {formName:1, lastUpdated:1, updatedAt:1}},
  { 
    $group: { 
      _id:   { formName: "$formName" }, 
      Year: {$addToSet: {$year: {$dateFromString: { dateString: "$lastUpdated" }}}},
      Month: {$addToSet: {$month: {$dateFromString: { dateString: "$lastUpdated" }}}},
    } 
  }, 
  { 
    $group: {
      _id: { formName: "$_id.formName" }, 
     Time: {$addToSet: {year: "$Year", month: "$Month"}}
    } 
  } 
]
)

The output from that is showing...

{ 
    _id: { formName: 'One of my forms' },
    Time: [
      {
        year: [ 2021 ],
        month: [ 10, 11 ] 
      }
    ]
 }

This will all be used in C#

Your help would be greatly appreciated.

David
  • 214
  • 3
  • 15

2 Answers2

1

Query

  • adds a new field "date" with the date on Date format
  • group first the more specific (formname+year) to put the months in the array
  • group then the less specific (formname) to put the years in the array
aggregate(
[{"$set": 
    {"date": 
      {"$cond": 
        ["$updatedAt", {"$dateFromString": {"dateString": "$updatedAt"}},
          {"$dateFromString": {"dateString": "$lastUpdated"}}]},
      "updatedAt": "$$REMOVE",
      "lastUpdated": "$$REMOVE"}},
  {"$group": 
    {"_id": {"text": "$formName", "year": {"$year": "$date"}},
      "children": {"$push": {"text": {"$month": "$date"}}}}},
  {"$group": 
    {"_id": "$_id.text",
      "children": 
      {"$push": {"text": "$_id.year", "children": "$children"}}}},
  {"$set": {"text": "$_id", "_id": "$$REMOVE"}}])

Edit

The bellow sorts also by year/month, and keeps only unique year/months per formName.

  • the difference is the group by formName,year,month to take the unique (first aacumulator will take one only of those that have the same in all 3)
  • replace-root (make that first document ROOT document)
  • and then sort by those 3 fields (descending year,ascending month)
  • group
  • sort by 2 fields
  • final group

PlayMongo
*mongoplaygroung loses the order of fields, run it on your driver also to be sure

aggregate(
[{"$set": 
    {"date": 
      {"$cond": 
        ["$updatedAt", {"$dateFromString": {"dateString": "$updatedAt"}},
          {"$dateFromString": {"dateString": "$lastUpdated"}}]},
      "updatedAt": "$$REMOVE",
      "lastUpdated": "$$REMOVE"}},
  {"$set": {"year": {"$year": "$date"}, "month": {"$month": "$date"}}},
  {"$group": 
    {"_id": {"formName": "$formName", "year": "$year", "month": "$month"},
      "doc": {"$first": "$$ROOT"}}},
  {"$replaceRoot": {"newRoot": "$doc"}},
  {"$sort": {"formName": 1, "year": -1, "month": 1}},
  {"$group": 
    {"_id": {"text": "$formName", "year": "$year"},
      "children": {"$push": {"text": "$month"}}}},
  {"$sort": {"_id.text": 1, "_id.year": -1}},
  {"$group": 
    {"_id": "$_id.text",
      "children": 
      {"$push": {"text": "$_id.year", "children": "$children"}}}},
  {"$set": {"text": "$_id", "_id": "$$REMOVE"}}])

With data

[
  {
    "formName": "my form",
    "updatedAt": "2021-11-02T23:30:15.123Z"
  },
  {
    "formName": "my form",
    "updatedAt": "2021-10-02T23:30:15.123Z"
  },
  {
    "formName": "my form",
    "updatedAt": "2020-06-02T23:30:15.123Z"
  },
  {
    "formName": "my form",
    "updatedAt": "2020-07-02T23:30:15.123Z"
  },
  {
    "formName": "another form",
    "updatedAt": "2021-10-01T23:30:15.123Z"
  },
  {
    "formName": "another form",
    "updatedAt": "2021-10-01T23:30:15.123Z"
  },
  {
    "formName": "another form",
    "updatedAt": "2021-09-01T23:30:15.123Z"
  },
  {
    "formName": "another form",
    "updatedAt": "2021-08-01T23:30:15.123Z"
  },
  {
    "formName": "another form",
    "updatedAt": "2020-10-01T23:30:15.123Z"
  }
]

I got results

[{
  "children": [
    {
      "text": 2021,
      "children": [
        {
          "text": 10
        },
        {
          "text": 11
        }
      ]
    },
    {
      "text": 2020,
      "children": [
        {
          "text": 6
        },
        {
          "text": 7
        }
      ]
    }
  ],
  "text": "my form"
},
{
  "children": [
    {
      "text": 2021,
      "children": [
        {
          "text": 8
        },
        {
          "text": 9
        },
        {
          "text": 10
        }
      ]
    },
    {
      "text": 2020,
      "children": [
        {
          "text": 10
        }
      ]
    }
  ],
  "text": "another form"
}]
Takis
  • 8,314
  • 2
  • 14
  • 25
  • While I had this open, I was still working on it myself... and have almost cracked it... by piping into new $group, but I am very interested in your solution. I have just tried to run yours locally and get an error, can't convert from BSON type string to Date. I think the problem is that the link to the test code, you have set up ISODate, but my data is String (hence my $dateFromString). The data in each set will have updatedAt OR lastUpdated, not both. – David Nov 02 '21 at 23:18
  • ok i updated the answer, but in general keep dates as Dates to be faster, you can use `$dateToString` – Takis Nov 03 '21 at 00:24
  • Thank you. This almost works, but is giving me duplicate months when I work with my real data. As to the dates, they are coming in from an external source and being imported directly into MongoDB. Now, my own is also not quite right yet, but I have put it as an answer as well, so you can see where I am... – David Nov 03 '21 at 08:33
  • I have just updated $push to $addToSet and that is better. (I am a MongoDB newbie, so reading up on these new directives and saw $push and $addToSet are similar, but $addToSet keeps unique values) I am now trying to sort by year/month - desc (to make the tree in order), but it doesn't appear to want to play. I have put {$sort: {"date":-1}} just after "lastUpdated": "$$REMOVE"}}. I may not have made this bit clear in my original post. – David Nov 03 '21 at 10:57
  • i updated the answer sorts and keeps only unique year/months – Takis Nov 03 '21 at 12:08
  • Thank you for your help so far... this is now soooo close. 2 observations. 1. The query is taking quite a long time compared to all the previous queries I have run, though if I put a $match and a $project in place, it will probably help (I have about 45,000 records). and 2. I have changed the order by for year and month to -1 to make descending, but, I am getting the year appears to be in a random order. For example... text:2020, children[array is descending], text 2019, [correct array], text 2017, [array 1 item], text 2021, [array desc], text 2018, [array desc]. Thank you for continuing help. – David Nov 03 '21 at 13:43
  • i updated the answer to short year descending and month ascending, but query does alot, also the only index that can be used i think its on `formName` (dates are strings, and you also want year and month so it cant work). Maybe there is a faster way to archieve the result you want(faster query), someone else might help or if i think a better way i might re-try. But i think it does what you want now. – Takis Nov 03 '21 at 16:06
  • Perfect. (The index didn't do much though, but I have limited to a set of forms with a match, and also the projection, which does make a big difference). – David Nov 03 '21 at 16:56
0

I continued to work on it, and while this is also not quite right (yet), here is what I came up with.

db.FormsStore.aggregate([
  
  {$project: {formName:1, lastUpdated:1, updatedAt:1}},
  { 
    $group: { 
      _id: {formName: "$formName", Year: {$year: {$dateFromString: { dateString: "$updatedAt" }}}, Month: {$month: {$dateFromString: { dateString: "$updatedAt" }}}},
    } 
  },
  {
    $group: {
        _id: {formName: "$_id.formName", Year: "$_id.Year"}, 
        Months: {$addToSet: {Text: "$_id.Month"}}
    }
  },
  {
    $group: {
        _id:  "$_id.formName", Children: {$addToSet: {Text: "$_id.Year", Children: "$Months"}}, 
        
    }
  }
])

Getting all my data in the first group, then creating a set with the months in the second group, then creating a set with the years and adding the months to each year in the third group.

David
  • 214
  • 3
  • 15