1

Hi guys please help on how to give a serial no based on another column with duplicates in python?

I have the below df

testdf = pd.DataFrame({"id" : ["ab12","bc12","cd12","ab12","bc13","cd12","cd12"], "serialno" : [1,1,None,2,1,None,None,], "date" : ["01-01-2019","02-01-2019","03-01-2019","04-01-2019","05-01-2019","07-01-2019","06-01-2019"]})
testdf

o/p

     id serialno    date
0   ab12    1   01-01-2019
1   bc12    1   02-01-2019
2   cd12    NaN 03-01-2019
3   ab12    2   04-01-2019
4   bc13    1   05-01-2019
5   cd12    NaN 07-01-2019
6   cd12    NaN 06-01-2019

I want to convert the ones which is Null/Empty and give it a unique serial number however it should be grouped based on the id column. Is it possible to assign serial number based on date sort(what i mean by that is when if you see the last row the serial number is 2 because 6th-Jan-2019 comes before 7th-jan-2019)

expected o/p

     id serialno    date
0   ab12    1   01-01-2019
1   bc12    1   02-01-2019
2   cd12    1   03-01-2019
3   ab12    2   04-01-2019
4   bc13    1   05-01-2019
5   cd12    3   07-01-2019
6   cd12    2   06-01-2019

please advise on how it can be done

Stramzik
  • 297
  • 3
  • 19

1 Answers1

1

Use GroupBy.rank with datetimes:

testdf['date'] = pd.to_datetime(testdf['date'])

testdf['serialno'] = testdf.groupby('id')['date'].rank('dense').astype(int)
print (testdf)
     id  serialno       date
0  ab12         1 2019-01-01
1  bc12         1 2019-02-01
2  cd12         1 2019-03-01
3  ab12         2 2019-04-01
4  bc13         1 2019-05-01
5  cd12         3 2019-07-01
6  cd12         2 2019-06-01
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252