I have a data frame (`dfCust`) like so:
|cust_key|first_name|last_name|address|
|--------|----------|---------|-------|
|12345   |John      |Doe      |123 Some street|
|12345   |John      |Doe      |123 Some st|
|67890   |Jane      |Doe      |456 Some street|
and I would like to remove duplicate records so that the `cust_key` field is unique. I do not care which record is dropped; by the time this runs, the addresses have already been deduplicated, so the only duplicates that trickle through are spelling variants. I would like the following resulting dataframe:
|cust_key|first_name|last_name|address|
|--------|----------|---------|-------|
|12345   |John      |Doe      |123 Some street|
|67890   |Jane      |Doe      |456 Some street|
In R this would be done like this:

```r
library(data.table)
dfCust <- unique(setDT(dfCust), by = "cust_key")
```
but I need a way to do this in pandas.
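A minimal sketch of one way to do this in pandas, using `drop_duplicates` with the `subset` parameter (the sample data below just mirrors the table in the question):

```python
import pandas as pd

# Sample data mirroring the question's dfCust
dfCust = pd.DataFrame({
    "cust_key": [12345, 12345, 67890],
    "first_name": ["John", "John", "Jane"],
    "last_name": ["Doe", "Doe", "Doe"],
    "address": ["123 Some street", "123 Some st", "456 Some street"],
})

# Keep the first row seen for each cust_key and drop the rest
dfCust = dfCust.drop_duplicates(subset="cust_key", keep="first").reset_index(drop=True)
print(dfCust)
```

Since the question says any of the duplicate rows is acceptable, `keep="first"` (the default) is fine; `keep="last"` would retain the last occurrence instead.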