
I am trying to group data using MongoDB's aggregation framework. I don't know the DB structure in advance, only that an email field will be present. That field can be stored under any name, such as gmail, mail, etc. A sample test DB is shown below. How can I group by the email field when I don't know its name?

    {
      "name": "stuv",
      "email": "xyz@gmail.in",
      "phone": "12345678990"
    },
    {
      "name": "pqr",
      "email": "abc@gmail.com",
      "phone": "9876543210"
    },
    {
      "name": "xyz",
      "email": "xyz@gmail.in",
      "phone": "2345678901"
    },
    {
      "name": "abc",
      "email": "abc@gmail.com",
      "phone": "4567890123"
    }

From the above DB, I want to group the data by the email field, whose name is unknown. To do that I convert each document to key/value pairs, unwind them, group by value while pushing the data into dups, and then match the group key against an email regex, as shown below.


    db.test.aggregate([
      {$project: {_id: 0}},
      {$project: {data: {$objectToArray: "$$ROOT"}}},
      {$unwind: "$data"},
      {
        $group: {
          _id: "$data.v",
          dups: {$push: "$data"},
          count: {$sum: 1}
        }
      },
      {$match: {_id: /^\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,3})+$/}}
    ])

With this pipeline, after $unwind, dups does not contain all the fields (name, phone, email). It contains only the grouped field, i.e. the email, as separate key and value entries, as shown in the output below. I want the group key to look like email: abc@gmail.com, with all the other fields of each document in dups, as in the expected output.

MY OUTPUT:

 "result": [
        {
            "_id": "abc@gmail.com",
            "dups": [
                {
                    "k": "email",
                    "v": "abc@gmail.com"
                },
                {
                    "k": "email",
                    "v": "abc@gmail.com"
                }
            ],
            "count": 2
        },
        {
            "_id": "xyz@gmail.in",
            "dups": [
                {
                    "k": "email",
                    "v": "xyz@gmail.in"
                },
                {
                    "k": "email",
                    "v": "xyz@gmail.in"
                }
            ],
            "count": 2
        }
    ]

EXPECTED OUTPUT

"result": [
        {
            "_id": {
                "email": "abc@gmail.com"
            },
            "dups": [
                {
                    "_id": "62af2c14e50225b659ab68a4",
                    "name": "pqr",
                    "email": "abc@gmail.com",
                    "phone": "9876543210"
                },
                {
                    "_id": "62af2c14e50225b659ab68a6",
                    "name": "abc",
                    "email": "abc@gmail.com",
                    "phone": "4567890123"
                }
            ],
            "count": 2
        },
        {
            "_id": {
                "email": "xyz@gmail.in"
            },
            "dups": [
                {
                    "_id": "62af2c14e50225b659ab68a3",
                    "name": "stuv",
                    "email": "xyz@gmail.in",
                    "phone": "12345678990"
                },
                {
                    "_id": "62af2c14e50225b659ab68a5",
                    "name": "xyz",
                    "email": "xyz@gmail.in",
                    "phone": "2345678901"
                }
            ],
            "count": 2
        }
    ]

Kindly help me in resolving this issue.

anu

1 Answer


EDIT: One option is to keep the current document as data and add to it the array of keys and values, here as mailData. Then we can extract the email value using $reduce with $regexMatch, and use it as the _id in the $group step.

db.collection.aggregate([
  {$project: {mailData: {$objectToArray: "$$ROOT"}, data: "$$ROOT"}},
  {
    $set: {
      mailData: {
        $first: {
          $reduce: {
            input: "$mailData",
            initialValue: [],
            in: {
              $concatArrays: [
                "$$value",
                {
                  $cond: [
                    {
                      $regexMatch: {
                        input: {$toString: "$$this.v"},
                        regex: /^\w+(?:[.-]\w+)*@\w+(?:[.-]\w+)*\.\w{2,3}$/
                      }
                    },
                    ["$$this.v"],
                    []
                  ]
                }
              ]
            }
          }
        }
      }
    }
  },
  {
    $group: {
      _id: "$mailData",
      dups: {$push: "$data"},
      count: {"$sum": 1}
    }
  }
])

See how it works on the playground example
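
For readers who want to check the logic without a MongoDB server, the first pipeline can be approximated in plain JavaScript. This is only a sketch under a few assumptions: `groupByFirstEmail` and `EMAIL_RE` are hypothetical names, `Object.entries` stands in for `$objectToArray`, `String(v)` only roughly mirrors `$toString`, and the sample documents deliberately use different field names (`email`, `mail`, `gmail`) to imitate the unknown schema. Collecting documents without an email-like value under a `null` key is a choice made for this sketch.

```javascript
// Same pattern as in the pipeline above.
const EMAIL_RE = /^\w+(?:[.-]\w+)*@\w+(?:[.-]\w+)*\.\w{2,3}$/;

// Hypothetical helper: group documents by their first email-like value,
// whatever the field holding it is called.
function groupByFirstEmail(docs) {
  const groups = new Map();
  for (const doc of docs) {
    // Object.entries plays the role of $objectToArray.
    const hit = Object.entries(doc).find(([, v]) => EMAIL_RE.test(String(v)));
    const key = hit ? hit[1] : null; // null: no email-like value found
    if (!groups.has(key)) {
      groups.set(key, { _id: key, dups: [], count: 0 });
    }
    const group = groups.get(key);
    group.dups.push(doc); // keep the whole document, like $push: "$data"
    group.count += 1;
  }
  return [...groups.values()];
}

const sample = [
  { name: "stuv", email: "xyz@gmail.in", phone: "12345678990" },
  { name: "pqr", mail: "abc@gmail.com", phone: "9876543210" },
  { name: "xyz", email: "xyz@gmail.in", phone: "2345678901" },
  { name: "abc", gmail: "abc@gmail.com", phone: "4567890123" },
];

const result = groupByFirstEmail(sample);
console.log(JSON.stringify(result, null, 2));
```

Each group keeps the full documents in `dups`, regardless of which field name carried the address.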

Another option is closer to your attempt: $unwind the key/value pairs, filter them with $regexMatch, and group afterwards:

db.collection.aggregate([
  {$project: {mailData: {$objectToArray: "$$ROOT"}, data: "$$ROOT"}},
  {$unwind: "$mailData"},
  {
    $match: {
      $expr: {
        $regexMatch: {
          input: {$toString: "$mailData.v"},
          regex: /^\w+(?:[.-]\w+)*@\w+(?:[.-]\w+)*\.\w{2,3}$/
        }
      }
    }
  },
  {
    $group: {
      _id: "$mailData.v",
      dups: {$push: "$data"},
      count: {"$sum": 1}
    }
  }
])

See how it works on the playground example - unwind
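
One behavioural difference between the two pipelines is worth knowing: the `$unwind` version emits one row per matching field, so a document that holds two different email-like values lands in two groups, whereas the `$reduce`/`$first` version files it once, under the first match. The plain JavaScript sketch below imitates the `$unwind` pipeline to illustrate this. It is not MongoDB code: `groupByAnyEmail`, the `backup` field and the sample documents are made up for the example.

```javascript
// Same pattern as in the pipeline above.
const EMAIL_RE = /^\w+(?:[.-]\w+)*@\w+(?:[.-]\w+)*\.\w{2,3}$/;

// Hypothetical helper imitating $unwind + $match + $group.
function groupByAnyEmail(docs) {
  const groups = new Map();
  for (const doc of docs) {
    // One "row" per field, like $unwind on the key/value array.
    for (const [, v] of Object.entries(doc)) {
      if (!EMAIL_RE.test(String(v))) continue; // like the $match stage
      if (!groups.has(v)) groups.set(v, { _id: v, dups: [], count: 0 });
      const group = groups.get(v);
      group.dups.push(doc); // whole document, like $push: "$data"
      group.count += 1;
    }
  }
  return [...groups.values()];
}

const docs = [
  { name: "pqr", mail: "abc@gmail.com", backup: "xyz@gmail.in" },
  { name: "stuv", email: "xyz@gmail.in" },
  { name: "nomail", phone: "12345" },
];

const byAnyEmail = groupByAnyEmail(docs);
console.log(byAnyEmail.map((g) => `${g._id}: ${g.count}`));
// prints [ 'abc@gmail.com: 1', 'xyz@gmail.in: 2' ]
```

Here the first document is pushed into both groups, and the document without any email-like value is dropped, as the `$match` stage would do. If that double counting is unwanted, the `$reduce` variant is probably the better fit.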

nimrod serok
  • I don't know the name under which the column is stored in the DB. The email column can be saved under any name, like mail, gmailcompany, etc. So to get the value from the root I am using $objectToArray and $unwind, and then matching against an email regex. – anu Jun 20 '22 at 07:08
  • Updated accordingly – nimrod serok Jun 20 '22 at 07:42
  • Thank you. But I don't know the DB structure; any fields can be present, not only the ones I mentioned (name, phone, email, etc.). Without knowing the field names I can't push them into dups. – anu Jun 20 '22 at 07:59
  • Try it now. Updated again – nimrod serok Jun 20 '22 at 08:06
  • Thank you so much. It is working as expected. – anu Jun 20 '22 at 08:24