First ensure that your date column is of datetime type, then perform a diff
to get a timedelta and apply a cumsum
on the deltas that are above 1 day. To get letters, you can map
the obtained number to chr
, but keep in mind that you'll have a limited number of groups (or you need to write a slightly more complex mapping to generate AA after Z: example here)
I provided both a numerical and letter grouping.
df['date'] = pd.to_datetime(df['date'], dayfirst=True)
df['group'] = df['date'].diff().gt(pd.to_timedelta('1d')).cumsum()
df['group2'] = 'Group '+df['group'].map(lambda x: chr(65+x))
Output:
date group group2
0 2020-01-01 0 Group A
1 2020-01-02 0 Group A
2 2020-01-03 0 Group A
3 2020-01-03 0 Group A
4 2020-01-05 1 Group B
5 2020-01-10 2 Group C
6 2020-01-13 3 Group D
7 2020-01-14 3 Group D
8 2020-01-20 4 Group E
same process per group
df['date'] = pd.to_datetime(df['date'], dayfirst=True)
df['group'] = (df.groupby('customer')['date'].diff()
.gt(pd.to_timedelta('1d'))
.groupby(df['customer']).cumsum()
)
df['group2'] = 'Group '+df['group'].map(lambda x: chr(65+x))
output:
customer date group group2
0 Customer A1 2020-01-01 0 Group A
1 Customer A1 2020-01-01 0 Group A
2 Customer A1 2020-01-02 0 Group A
3 Customer A1 2020-01-03 0 Group A
4 Customer A1 2020-01-03 0 Group A
5 Customer A1 2020-01-05 1 Group B
6 Customer A1 2020-01-10 2 Group C
7 Customer A1 2020-01-13 3 Group D
8 Customer A1 2020-01-14 3 Group D
9 Customer A1 2020-01-20 4 Group E
10 Customer B1 2020-01-21 0 Group A
11 Customer B1 2020-01-22 0 Group A
12 Customer B1 2020-01-24 1 Group B
13 Customer B1 2020-01-27 2 Group C
14 Customer B1 2020-01-28 2 Group C