1

I have a data frame like this:

   Clinic Number        date
0              1  2015-05-05
1              1  2015-05-05
2              1  2016-01-01
3              2  2015-05-05
4              2  2016-05-05
5              3  2017-05-05
6              3  2017-05-05

I want to create a new column and fill it out based on some conditions. so the new data frame should be like this:

   Clinic Number    date      row_number
0              1  2015-05-05    1
1              1  2015-05-05    1
2              1  2016-01-01    2
3              2  2015-05-05    3
4              2  2016-05-05    4
5              3  2017-05-05    5
6              3  2017-05-05    5

what is the rule for putting entries inside new column: where Clinic Number and date is the same they will get same numbers, if it changes it will increases.

For example here 1 2015-05-05 has two rows which have same Clinic Number and date so they all get 1. the next row have Clinic Number=1 but the date is not the same as previous rows so it will get 2. where Clinic Number=2 there is no row with Clinic Number=2 and the same date so it got 3 and the next row is 4...

till now I have tried something like this:

def createnumber(x):
    x['row_number'] = i

d['row_number']= pd1.groupby(['Clinic Number','date']).apply(createnumber)

but I do not know how to implement this function.

I appreciate if you can help me with this:) Also I have looked at links like this but they are not dynamic (i mean here the row number should be increased based on some conditions)

sariii
  • 2,020
  • 6
  • 29
  • 57

1 Answers1

3

Instead of a groupby, you could just do something like this, naming your conditions seperately. So if the date shifts OR the clinic number changes, you return True, and then get the cumsum of those True values:

df['row_number'] = (df.date.ne(df.date.shift()) | df['Clinic Number'].ne(df['Clinic Number'].shift())).cumsum()


>>> df
   Clinic Number        date  row_number
0              1  2015-05-05           1
1              1  2015-05-05           1
2              1  2016-01-01           2
3              2  2015-05-05           3
4              2  2016-05-05           4
5              3  2017-05-05           5

You'll need to make sure your dataframe is sorted by Clinic Number and Date first (you could do df.sort_values(['Clinic Number', 'date'], inplace=True) if it's not sorted already)

sacuL
  • 49,704
  • 8
  • 81
  • 106
  • do you have any idea of this question ? https://stackoverflow.com/questions/51458599/how-to-append-files-efficiently-based-on-some-conditions Actually the way Im going to do is not efficient. Id appreciate it if you have time only have a quick look – sariii Jul 22 '18 at 18:11