Given the following dataframe:

import pandas as pd

df = pd.DataFrame({"values": ["a", "a", "a", "b", "b", "a", "a", "c"]})

How could I generate the following output, where `out` restarts from 0 at the start of each run of consecutive equal values?

  values  out
0      a    0
1      a    1
2      a    2
3      b    0
4      b    1
5      a    0
6      a    1
7      c    0

If it makes things easier, I can ensure that each group has a unique label, so the input would look like:

df = pd.DataFrame({"values": ["a0", "a0", "a0", "b0", "b0", "a1", "a1", "c0"]})
ohe
Possible duplicate of [Pandas: conditional rolling count](https://stackoverflow.com/questions/25119524/pandas-conditional-rolling-count) – mad_ Mar 18 '19 at 17:34

1 Answer

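Use shift and cumsum to build a key that increments at the start of each run, then use category codes within each value to number its runs:

df['strkey'] = df['values'].ne(df['values'].shift()).cumsum()

# transform keeps the original index, so the result aligns with df['values']
df['values'] += df.groupby('values')['strkey'].transform(lambda x: x.astype('category').cat.codes).astype(str)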
df
Out[568]: 
  values  strkey
0     a0       1
1     a0       1
2     a0       1
3     b0       2
4     b0       2
5     a1       3
6     a1       3
7     c0       4
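
To get the `out` column from the question, the same run key can be used as a grouper. The following is a minimal sketch, not necessarily the only way to do it: it starts again from the original frame, and the name `run_id` is only an illustrative choice. It numbers the rows inside each run with `cumcount`:

```python
import pandas as pd

df = pd.DataFrame({"values": ["a", "a", "a", "b", "b", "a", "a", "c"]})

# A new run starts wherever the value differs from the previous row;
# the cumulative sum of those change flags gives one id per run.
run_id = df["values"].ne(df["values"].shift()).cumsum()

# Number the rows inside each run, starting from 0.
df["out"] = df.groupby(run_id).cumcount()

print(df)
```

If the labels are already unique per run (as in the `a0`, `b0`, `a1`, `c0` input), `df.groupby("values").cumcount()` on its own should give the same numbering.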
BENY