I would like to add a column to my data frame that holds a unique id for each distinct combination of two other columns.

Here is the example dataframe:

df = pd.DataFrame({'month': [1, 2, 1, 3, 4, 5], 'brand': [76, 76, 'Arco', 'Shell', 'Arco', 'Cheveron'], 'address': ['aa', 'aa', 'ab', 'bc', 'cd', 'de']})

I want an id that depends on both the brand and the address, but not on the month:

df = pd.DataFrame({'id': [1, 1, 2, 3, 4, 5], 'month': [1, 2, 1, 3, 4, 5], 'brand': [76, 76, 'Arco', 'Shell', 'Arco', 'Cheveron'], 'address': ['aa', 'aa', 'ab', 'bc', 'cd', 'de']})
Kaung Myat

1 Answer

Use DataFrame.insert with GroupBy.ngroup:

df.insert(0, 'id', df.groupby(['brand','address'], sort=False)['month'].ngroup() + 1)
print(df)
   id  month     brand address
0   1      1        76      aa
1   1      2        76      aa
2   2      1      Arco      ab
3   3      3     Shell      bc
4   4      4      Arco      cd
5   5      5  Cheveron      de
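
For reference, a minimal self-contained sketch of the same idea. It assumes the brand and address values are strings (the question's snippet leaves them unquoted, so the quoting here is a guess) and calls ngroup directly on the grouped frame rather than on the 'month' column:

```python
import pandas as pd

# Sample data from the question; quoting every value as a string is an assumption.
df = pd.DataFrame({
    'month': [1, 2, 1, 3, 4, 5],
    'brand': ['76', '76', 'Arco', 'Shell', 'Arco', 'Cheveron'],
    'address': ['aa', 'aa', 'ab', 'bc', 'cd', 'de'],
})

# sort=False numbers the groups in order of first appearance;
# ngroup() is zero-based, so add 1 to start the ids at 1.
ids = df.groupby(['brand', 'address'], sort=False).ngroup() + 1

# Put the new column first, as in the desired output.
df.insert(0, 'id', ids)
print(df['id'].tolist())
```

With the default sort=True, the ids would follow the sorted order of the (brand, address) keys instead of the order in which they first occur.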

Or join columns together and use factorize:

s = df['brand'].astype(str) + '-' + df['address'].astype(str)
df.insert(0, 'id', pd.factorize(s)[0] + 1)
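
One thing to watch with the joined-string variant: if the separator can occur inside the values, two different pairs may produce the same string. The sketch below uses made-up data (not from the question) to show the collision, and factorizes tuples instead as one possible way around it:

```python
import pandas as pd

# Hypothetical data in which the '-' separator also appears inside the values.
df = pd.DataFrame({
    'brand': ['a-b', 'a', 'a-b'],
    'address': ['c', 'b-c', 'c'],
})

# Joined strings: rows 0 and 1 both become 'a-b-c', so they share an id.
joined = df['brand'].astype(str) + '-' + df['address'].astype(str)
joined_ids = pd.factorize(joined)[0] + 1

# Tuples keep the two values separate, so rows 0 and 1 get different ids.
pairs = pd.Series(list(zip(df['brand'], df['address'])), index=df.index)
pair_ids = pd.factorize(pairs)[0] + 1

print(joined_ids.tolist())
print(pair_ids.tolist())
```

For the question's data the separator never appears in the values, so both variants should give the same ids there.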
jezrael