Here is the example data set
id firstname lastname email update date
A1 wendy smith ws@mail.com 2018-01-02
A1 wendy smith smith@mail.com 2019-02-03
A2 harry lynn lynn@mail.com 2016-04-03
A2 harry harylynn@mail.com 2019-03-12
A3 tinna dickey tinna@mail.com 2016-04-03
A3 tinna dickey tinna@mail.com 2013-06-12
A4 Tom Lee Tom@mail.com 2012-06-12
A5 Ella Ella@mail.com 2019-07-12
A6 Ben Lang Ben@mail.com 2019-03-12
I have sorted the data set by id
and update date
, I want to merge the rows with same id
, if one row with empty value, fill the other one with same id
, if confilct, use the latest one. For rows with no duplicate id
leave the empty cell as it is.
the output should be:
id firstname lastname email update date
A1 wendy smith smith@mail.com 2019-02-03
A2 harry lynn harylynn@mail.com 2019-03-12
A3 tinna dickey tinna@mail.com 2019-03-12
A4 Tom Lee Tom@mail.com 2012-06-12
A5 Ella Ella@mail.com 2019-07-12
A6 Ben Lang Ben@mail.com 2019-03-12
my attempt was using ffill()
to merge rows with empty and keep last duplicate, but the result seems to affect other cells which should have empty values(like lastname in A5 should be empty ).
df=df.ffill().drop_duplicates('id',keep='last')