0

Picture HERE

I have a data like I show in the attached picture. What I want is if the next value is same or equal to the current value + 1, then I will put them in the same group. The group need to be different between customers.

for example:

(dd-mm-yyyy)

Customer A1 1/1/2020 - Group A
Customer A1 1/1/2020 - Group A
Customer A1 2/1/2020 - Group A
Customer A1 3/1/2020 - Group A
Customer A1 3/1/2020 - Group A
Customer A1 5/1/2020 - Group B
Customer A1 10/1/2020 - Group C
Customer A1 13/1/2020 - Group D
Customer A1 14/1/2020 - Group D
Customer A1 20/1/2020 - Group E

Customer B1 21/1/2020 - Group A
Customer B1 22/1/2020 - Group A
Customer B1 24/1/2020 - Group B
Customer B1 27/1/2020 - Group C
Customer B1 28/1/2020 - Group C

How can I resolve it in pandas?

duyvan
  • 11
  • 2

2 Answers2

-1

First ensure that your date column is of datetime type, then perform a diff to get a timedelta and apply a cumsum on the deltas that are above 1 day. To get letters, you can map the obtained number to chr, but keep in mind that you'll have a limited number of groups (or you need to write a slightly more complex mapping to generate AA after Z: example here)

I provided both a numerical and letter grouping.

df['date'] = pd.to_datetime(df['date'], dayfirst=True)

df['group'] = df['date'].diff().gt(pd.to_timedelta('1d')).cumsum()
df['group2'] = 'Group '+df['group'].map(lambda x: chr(65+x))

Output:

        date  group   group2
0 2020-01-01      0  Group A
1 2020-01-02      0  Group A
2 2020-01-03      0  Group A
3 2020-01-03      0  Group A
4 2020-01-05      1  Group B
5 2020-01-10      2  Group C
6 2020-01-13      3  Group D
7 2020-01-14      3  Group D
8 2020-01-20      4  Group E

same process per group

df['date'] = pd.to_datetime(df['date'], dayfirst=True)

df['group'] = (df.groupby('customer')['date'].diff()
                 .gt(pd.to_timedelta('1d'))
                 .groupby(df['customer']).cumsum()
              )
df['group2'] = 'Group '+df['group'].map(lambda x: chr(65+x))

output:

       customer       date  group   group2
0   Customer A1 2020-01-01      0  Group A
1   Customer A1 2020-01-01      0  Group A
2   Customer A1 2020-01-02      0  Group A
3   Customer A1 2020-01-03      0  Group A
4   Customer A1 2020-01-03      0  Group A
5   Customer A1 2020-01-05      1  Group B
6   Customer A1 2020-01-10      2  Group C
7   Customer A1 2020-01-13      3  Group D
8   Customer A1 2020-01-14      3  Group D
9   Customer A1 2020-01-20      4  Group E
10  Customer B1 2020-01-21      0  Group A
11  Customer B1 2020-01-22      0  Group A
12  Customer B1 2020-01-24      1  Group B
13  Customer B1 2020-01-27      2  Group C
14  Customer B1 2020-01-28      2  Group C
mozway
  • 194,879
  • 13
  • 39
  • 75
  • Hi mozway, thanks for the replay, I have added 1 line between 1/1/2020 and 2/1/2020, does the formulas still work? – duyvan Nov 26 '21 at 05:48
  • @duyvan yes it does, you could have tried it yourself ;) – mozway Nov 26 '21 at 05:55
  • actually I need to restart the group when there is new customer, then the code will not work :(. Can you give me a hand pls – duyvan Nov 26 '21 at 06:45
  • @duyvan it's easy, just do the same thing with a `groupby` on customer (once for `diff`, once for `cumsum`) – mozway Nov 26 '21 at 08:07
-1

Here is one of the approach:

import pandas as pd
from datetime import datetime
date_format = "%d/%m/%Y"

df = pd.DataFrame({
    'Date': ['01/01/2021', '01/01/2021', '02/01/2021', '04/01/2021', '05/01/2021'],
})

df['Group'] = ''
start_date = df['Date'][0]
group = ord('A')
for i in range(len(df)):
    date = df['Date'].values[i]
    current_date = datetime.strptime(date , date_format)
    previous_date = datetime.strptime(start_date, date_format)
    if (current_date - previous_date).days > 1:
        group += 1
    df['Group'].values[i] = chr(group)
    start_date = date
    
print (df)

Output:

Date Group
0  01/01/2021     A
1  01/01/2021     A
2  02/01/2021     A
3  04/01/2021     B
4  05/01/2021     B
Bhagyesh Dudhediya
  • 1,800
  • 1
  • 13
  • 16
  • Hi @Bhagyesh Dudhediya, thanks for the answer, It works fine as well, but I think mozway answer is cleaner :D. I just have the customer field, can you please have a look? – duyvan Nov 26 '21 at 07:51
  • Oh, earlier there was no group_by criteria on Customer field mentioned, it was just mentioned to assign the Groups to every row based on date and hence implemented it this way.. – Bhagyesh Dudhediya Nov 26 '21 at 09:19