0

How can I merge the documents together where the user is the same?

What I find difficult to do is to automatically add the values from the "field" column as columns and the data from the "data" field as values for the newly created columns.

Like merging these two because they have the same user id and have the columns "Date of birth":"1989-01-12" and "Job":"Teacher".

I know it's a lot to ask, but could someone guide me to how to achieve this?

{
    "_id" : ObjectId("5d6b00b960016c4c441d9a16"),
    "user" : 1000,
    "field" : "Date of birth",
    "data" : "1989-01-12",
    "timestamp" : "2017-08-27 11:00:59"
}

{
    "_id" : ObjectId("5d6b00b960016c4c441d9a17"),
    "user" : 1000,
    "field" : "Job",
    "data" : "Teacher",
    "timestamp" : "2017-08-27 10:59:19"
}

Into

{
    "_id" : ObjectId("5d6b00b960016c4c441d9a16"),
    "user" : 1000,
    "Date of birth" : "1989-01-12",
    "Job" : "Teacher",
    "timestamp" : "2017-08-27 11:00:59"
}
  • 1
    I strongly suggest you do not do that as it would seem you might be picking up a project where the original author had the better idea to what you seem to think is good. You really want **separate documents** here and **DO NOT USE NAMED FIELDS**. Best advice is to take this as a learning experience and leave the data as it is. It's perfectly fine. Better to spend your time understanding why the original author did it that way. – Neil Lunn Sep 01 '19 at 00:16
  • 1
    I would also strongly impress that the original authors intent here is clearly **transactional** given that each entry has it's own timestamp. Your proposal to change that would ruin the transactional history. I also suggest a deeper dive into to dataset as you will likely find that many things do not have commonly named fields. Hence why it has been done with consistent names such as `field` and `data` instead. – Neil Lunn Sep 01 '19 at 00:22
  • We can do that easily, but before that, I am curious to know why the data is inserted that way? are there multiple sources from where the data is coming? and what's the significance of the timestamp? – Himanshu Sharma Sep 01 '19 at 06:09

1 Answers1

1

To merge documents, you can iterate them to create a new document. If you want to remove the old references to the given user, you have to delete them before inserting your new document.

You can go like this using the javascript interface of MongoDB:

// Get the docs
docs = db.find({user:1000})
// Init common values of the new doc
new_doc = db.findOne({user:1000}) 
// Remove the unused field in the new object
delete new_doc.field
delete new_doc.data 
for (i=0; i<docs.size(); i++){
    doc = docs[i]
    new_doc[doc["field"]] = new_doc["data"]
    if (Date(doc["timestamp"]) > Date(new_doc["timestamp"])) {
        new_doc["timestamp"] = doc["timestamp"]
    }
}
// remove all references to the user if you need to
db.remove({user:1000}, {multi:true})
// insert the merged document
db.insert(new_doc)

dtrckd
  • 657
  • 7
  • 17