0

I have a collection with elements like this:

{
    "_id": {
        "$oid": "56d61033a378eccde8a8354f"
    },
    "id": "10021-2015-ENFO",
    "certificate_number": 9278806,
    "business_name": "ATLIXCO DELI GROCERY INC.",
    "date": "Feb 20 2015",
    "result": "No Violation Issued",
    "sector": "Cigarette Retail Dealer - 127",
    "address": {
        "city": "RIDGEWOOD",
        "zip": 11385,
        "street": "MENAHAN ST",
        "number": 1712
    }
}

What I want is to add _id for each address. This is one time operation and I am doing it for researching/testing purpose.

I think that I can split the task into the following steps:

  • get all unique addresses and create separate collection with them, assigning _id for each record (as I have read will add it if not specified)
  • make a join by all address fields in order to insert the corresponding _id for each embedded document in the source collection

I select the documents like like this:

db.ci.find({}, {"address":1, "_id":0});

but I fail to make a distinct from it using the Distinct function or foreach. I try to use aggregate, too but did not make it work.

Could anyone give me some tips?

I am using Ubuntu 20.04, mongodb 4.2.7, vs code with mongodb extension.

gotqn
  • 42,737
  • 46
  • 157
  • 243
  • If I understand correctly, you want to gather all of the unique addresses from all of the documents in the collection, add an identifying field to each address, and then attach that identifier to each corresponding address in the original collection. Does this identifier have to be an ObjectId, or is a unique integer sufficient? – Joe May 25 '20 at 17:59
  • Is [this](https://stackoverflow.com/a/61877771/6635464) helpful? – ngShravil.py May 25 '20 at 18:16
  • Well, I prefer to be _id (uniqueidentifer) but number is OK, too. – gotqn May 25 '20 at 20:18

1 Answers1

1

You can get a unique integer assigned to each in a single aggregation by using $facet to keep the original documents while using $addToSet to get the list of unique addresses, and $indexOfArray to assign the value to each document:

db.collection.aggregate([
  {$facet: {
      docs: [{$match: {}}],
      addresses: [
        {$group: {
            _id: null,
            address: {$addToSet: "$address"}
        }}
      ]
  }},
  {$unwind: "$docs"},
  {$unwind: "$addresses"},
  {$addFields: {
      "address.id": {
        $indexOfArray: [
          "$addresses.address",
          "$docs.address"
        ]
      }
  }},
  {$replaceRoot:{newRoot:"$docs"}},
  {$out:"new_collection"}
])

Playground

If you prefer an ObjectId, you have the right idea, one aggration to output the unique addresses to a temporary collection so each is auto-assigned an _id, then a second aggregation to embed those _id values in the original documents. For this example I gathered the _id of the original document to simplify the later lookup.

db.collection.aggregate([
  {$group:{
       _id:"$address",
       ids:{$push:"$_id"}
  }},
  {$project:{
       address:"$_id",
       ids:1,
       _id:0
  }},
  {$out: "temp_address_collection"}
])

Playground

db.collection.aggregate([
  {$lookup:{
      from:"temp_address_collection",
      localField:"_id",
      foreignField:"ids",
      as: "matched"
  }},
  {$addFields:{matched:{$arrayElemAt:["$matched",0]}}},
  {$addFields:{"$address.id": "$matched._id"}},
  {$project:{matched:0}},
  {$out:"new_collection"}
])

Playground

Joe
  • 25,000
  • 3
  • 22
  • 44
  • Thanks very much for the effort. I will invest some time to understand it as I am new to mongodb but your playgrounds are working perfectly. – gotqn May 26 '20 at 04:47