0

Hi I have a data frame like so:

In[1]: import pandas as pd
in[2]: df = pd.DataFrame([['X', "1/31/2021", "8:00"], ['Y',"1/31/2021", "8:00"], ['X', "1/31/2021", "10:00"], ['Y',"2/1/2021", "8:00"]], \
 columns=['name', 'Date', 'Time'])
In[3]: print(df)

   Out[3]:  name   Date     Time
        0    X  1/31/2021   8:00
        1    Y  1/31/2021   8:00
        2    X  1/31/2021  10:00
        3    Y   2/1/2021   8:00

I would like to assign a value to a new column based on when columns 'name' and 'date' are equal but factors in how many times they have already equaled. So if 'name' and 'date' have equaled each other twice and their 'time' is different, the occurrence earlier in that date will be assigned 1 and the occurrence later will be assigned 2.

In[4]: print(df)
Out[4]:name    Date     Time  Number
    0    X  1/31/2021   8:00       1
    1    Y  1/31/2021   8:00       1
    2    X  1/31/2021  10:00       2
    3    Y   2/1/2021   8:00       1

I think I should us np.where but don't know how to generate the correct expression that captures these occurrences.

TheFrator
  • 15
  • 3

1 Answers1

0

You can groupby on both name and Date and use cumcount:

>>> df['Number'] = df.groupby(['name', 'Date']).cumcount().add(1)

    name    Date        Time    Number
0   X       1/31/2021   8:00    1
1   Y       1/31/2021   8:00    1
2   X       1/31/2021   10:00   2
3   Y       2/1/2021    8:00    1
tdy
  • 36,675
  • 19
  • 86
  • 83