0

I have a following collection in MongoDB:

{
    "_id" : ObjectId("5bbc86e5c16a27f1e1bd39f8"),
    "name" : "swetha",
    "nameId" : 123,
    "source" : "Blore",
    "sourceId" : 10,
    "LastUpdate" : "10-Oct-2018"
}
{
    "_id" : ObjectId("5bbc86e5c16a27f1e1bd39f9"),
    "name" : "swetha",
    "nameId" : 123,
    "source" : "Mlore",
    "sourceId" : "11",
    "LastUpdate" : "11-Oct-2018"
}
{
    "_id" : ObjectId("5bbc86e5c16a27f1e1bd39fa"),
    "name" : "swathi",
    "nameId" : 124,
    "source" : "Mlore",
    "sourceId" : "11",
    "LastUpdate" : "9-Oct-2018"
}

I am a beginner to Python and want to compare the 'LastUpdate' between the above records based on matching 'name' or 'nameId' and want to push the record with latest date to another collection. E.g. name:'Swetha' is same in first two records. So compare 'LastUpdate' between them and output the record with latest date.

I have written following code to read data records from MongoDB and to print. I didn't understand how to compare records within a same key and compare their timestamp though I referred few resources on Google.

import json
import pandas as pd
from pymongo import MongoClient

try: 
    client = MongoClient() 
    print("Connected successfully!!!") 
except:   
    print("Could not connect to MongoDB") 

# database 
db = client.conn
collection = db.contactReg
df = collection.find()
for row in df:
    print(row)

Links that are ref

Is there a better way to compare dictionary values

https://gis.stackexchange.com/questions/87276/how-to-compare-values-from-a-column-in-attribute-table-with-values-in-dictionary

Comparing two dictionaries and printing key value pair in python and few more.

Zoe
  • 27,060
  • 21
  • 118
  • 148
swetha reddy
  • 201
  • 5
  • 19

1 Answers1

1

I think what you need is an aggregation. This might look big but once you get the hang out of mongo aggregations you'll get comfortable.

df = collection.aggregate([
    {
        "$project": {
            "_id": 0,
            "name": 1,
            "nameId": 1,
            "source": 1,
            "sourceId": 1,
            "LastUpdate": 1,
            "LastUpdateArray": {
                "$split": [
                    "$LastUpdate",
                    "-"
                ]
            }
        }
    },
    {
        "$project": {
            "name": 1,
            "nameId": 1,
            "source": 1,
            "sourceId": 1,
            "LastUpdate": 1,
            "LastUpdateArray": 1,
            "LastUpdateMonth": {
                "$arrayElemAt": [
                    "$LastUpdateArray",
                    1
                ]
            }
        }
    },
    {
        "$project": {
            "name": 1,
            "nameId": 1,
            "source": 1,
            "sourceId": 1,
            "LastUpdate": 1,
            "Year": {
                "$arrayElemAt": [
                    "$LastUpdateArray",
                    2
                ]
            },
            "Date": {
                "$arrayElemAt": [
                    "$LastUpdateArray",
                    0
                ]
            },
            "Month": {
                "$switch": {
                    "branches": [
                        {
                            "case": {
                                "$eq": [
                                    "$LastUpdateMonth",
                                    "Jan"
                                ]
                            },
                            "then": "01"
                        },
                        {
                            "case": {
                                "$eq": [
                                    "$LastUpdateMonth",
                                    "Feb"
                                ]
                            },
                            "then": "02"
                        },
                        {
                            "case": {
                                "$eq": [
                                    "$LastUpdateMonth",
                                    "Mar"
                                ]
                            },
                            "then": "03"
                        },
                        {
                            "case": {
                                "$eq": [
                                    "$LastUpdateMonth",
                                    "Apr"
                                ]
                            },
                            "then": "04"
                        },
                        {
                            "case": {
                                "$eq": [
                                    "$LastUpdateMonth",
                                    "May"
                                ]
                            },
                            "then": "05"
                        },
                        {
                            "case": {
                                "$eq": [
                                    "$LastUpdateMonth",
                                    "Jun"
                                ]
                            },
                            "then": "06"
                        },
                        {
                            "case": {
                                "$eq": [
                                    "$LastUpdateMonth",
                                    "Jul"
                                ]
                            },
                            "then": "07"
                        },
                        {
                            "case": {
                                "$eq": [
                                    "$LastUpdateMonth",
                                    "Aug"
                                ]
                            },
                            "then": "08"
                        },
                        {
                            "case": {
                                "$eq": [
                                    "$LastUpdateMonth",
                                    "Sep"
                                ]
                            },
                            "then": "09"
                        },
                        {
                            "case": {
                                "$eq": [
                                    "$LastUpdateMonth",
                                    "Oct"
                                ]
                            },
                            "then": "10"
                        },
                        {
                            "case": {
                                "$eq": [
                                    "$LastUpdateMonth",
                                    "Nov"
                                ]
                            },
                            "then": "11"
                        },
                        {
                            "case": {
                                "$eq": [
                                    "$LastUpdateMonth",
                                    "Dec"
                                ]
                            },
                            "then": "12"
                        }
                    ],
                    "default": "01"
                }
            }
        }
    },
    {
        "$project": {
            "name": 1,
            "nameId": 1,
            "source": 1,
            "sourceId": 1,
            "LastUpdate": 1,
            "Year": 1,
            "Date": 1,
            "Month": 1,
            "DateString": {
                "$concat": [
                    "$Year",
                    "-",
                    "$Month",
                    "-",
                    "$Date"
                ]
            }
        }
    },
    {
        "$project": {
            "name": 1,
            "nameId": 1,
            "source": 1,
            "sourceId": 1,
            "LastUpdate": 1,
            "Date": {
                "$dateFromString": {
                    "dateString": "$DateString"
                }
            }
        }
    },
    {
        "$sort": {
            "Date": -1
        }
    },
    {
        "$group": {
            "_id": "$name",
            "name": {
                "$first": "$name"
            },
            "nameId": {
                "$first": "$nameId"
            },
            "source": {
                "$first": "$source"
            },
            "sourceId": {
                "$first": "$sourceId"
            },
            "LastUpdate": {
                "$first": "$LastUpdate"
            },
            "Date": {
                "$first": "$Date"
            }
        }
    },
    {
        "$project": {
            "name": 1,
            "nameId": 1,
            "source": 1,
            "sourceId": 1,
            "LastUpdate": 1
        }
    }
])

The first 5 steps of aggregation, I tried to convert it into a date and then sort descending by date. In group by I grouped with name and took the first that comes with that name.

Hope this helps.

I'm assuming what you need is duplicate records and I'm taking the first one that comes. Reference : https://stackoverflow.com/a/26985011/7630071

df = collection.aggregate([
    {
        "$group": {
            "_id": "$name",
            "count": {
                "$sum": 1
            },
            "data": {
                "$push": {
                    "nameId": "$nameId",
                    "source": "$source",
                    "sourceId": "$sourceId",
                    "LastUpdate": "$LastUpdate"
                }
            }
        }
    },
    {
        "$match": {
            "_id": {
                "$ne": null
            },
            "count": {
                "$gt": 1
            }
        }
    }
])
Sivesh Gsk
  • 26
  • 5