1

Assume I have a following dataframe:

date          group         value
2022-11-01.     1              4
2022-11-02.     1              12
2022-11-03.     1              14
2022-11-04.     1              25
2021-11-01.     2              9
2021-11-02.     2              7
2019-10-01.     3              40
2022-10-02.     3              14

I want to create a new column that is increasing integer based on date for each group. For example, this is the desired output:

  date          group         value      new_col
    2022-11-01.     1              4.      0
    2022-11-02.     1              12.     1
    2022-11-03.     1              14.     2
    2022-11-04.     1              25.     3
    2021-11-01.     2              9.      0
    2021-11-02.     2              7.      1
    2019-10-01.     3              40.     0
    2022-10-02.     3              14.     1

You see new_col is like np,arange(0, len(df['date'])+1), however I want to do it per group and it seems no variation of groupby works for me.

I have tried:

df.groupby('group')['date'].apply(lambda x: np.arange(0, len(x)+1)

However this is not even close to what I want. I would appreciate if someone explains how to do this correctly.

user59419
  • 893
  • 8
  • 20
  • How is this duplicate, this is not even answering my question or remotely close to that. – user59419 May 18 '23 at 06:48
  • 2
    Or need `df.groupby('group')['date'].rank('dense').sub(1)` ? – jezrael May 18 '23 at 06:52
  • I do not think this question is duplicate at all, I am also interested how this is done. – stack May 18 '23 at 06:56
  • 1
    @jezrael, your solution is very neat. Do you know how to do it without using rank or only using the function OP used. I also do not think it is duplicate. It make no sense, people flag everything as duplicate – math May 18 '23 at 06:58
  • @jezrael, your solution works, thanks. You can add it as an answer. Is there any other way to do this problem using np.arange(0, len(x) + 1) and groupby – user59419 May 18 '23 at 07:02

1 Answers1

1

Is there any other way to do this problem using np.arange(0, len(x) + 1) and groupby?

I change data for see difference - GroupBy.rank working with order of column date, so different output with counter by GroupBy.cumcount and your solution with GroupBy.transform:

print (df)
        date  group  value
0 2022-11-08      1      4
1 2022-11-07      1     12
2 2022-11-03      1     14
3 2022-11-04      1     25
4 2021-11-21      2      9
5 2021-11-02      2      7
6 2019-10-01      3     40
7 2022-10-02      3     14

df['new_col'] = df.groupby('group')['date'].rank('dense').sub(1).astype(int)

df['new_col1'] = df.groupby('group').cumcount()

df['new_col2'] = df.groupby('group')['date'].transform(lambda x: np.arange(len(x)))
print (df)
        date  group  value  new_col  new_col1  new_col2
0 2022-11-08      1      4        3         0         0
1 2022-11-07      1     12        2         1         1
2 2022-11-03      1     14        0         2         2
3 2022-11-04      1     25        1         3         3
4 2021-11-21      2      9        1         0         0
5 2021-11-02      2      7        0         1         1
6 2019-10-01      3     40        0         0         0
7 2022-10-02      3     14        1         1         1

If want same ouput solution is sorting per both columns:

df = df.sort_values(['group','date'])

df['new_col'] = df.groupby('group')['date'].rank('dense').sub(1).astype(int)

df['new_col1'] = df.groupby('group').cumcount()

df['new_col2'] = df.groupby('group')['date'].transform(lambda x: np.arange(len(x)))
print (df)
        date  group  value  new_col  new_col1  new_col2
2 2022-11-03      1     14        0         0         0
3 2022-11-04      1     25        1         1         1
1 2022-11-07      1     12        2         2         2
0 2022-11-08      1      4        3         3         3
5 2021-11-02      2      7        0         0         0
4 2021-11-21      2      9        1         1         1
6 2019-10-01      3     40        0         0         0
7 2022-10-02      3     14        1         1         1
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252