I want to group by age and salary and assign people with the same age and salary to the same category.

Initial Data:

name age salary 
abc   24  1000    
def   27  2000    
ghi   25  3000    
jkl   24  1000    
mno   25  3000 

Final Data:

name age salary group
abc   24  1000    1
def   27  2000    2
ghi   25  3000    3
jkl   24  1000    1
mno   25  3000    3
eyllanesc

2 Answers

Use factorize with list of tuples created by both columns:

df['group'] = pd.factorize(list(zip(df['age'],df['salary'])))[0] + 1
print (df)
  name  age  salary  group
0  abc   24    1000      1
1  def   27    2000      2
2  ghi   25    3000      3
3  jkl   24    1000      1
4  mno   25    3000      3

Or:

df['group'] = pd.factorize(list(map(tuple, df[['age','salary']].values.tolist())))[0] + 1
print (df)
  name  age  salary  group
0  abc   24    1000      1
1  def   27    2000      2
2  ghi   25    3000      3
3  jkl   24    1000      1
4  mno   25    3000      3
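For anyone who wants to run this end to end, here is a minimal self-contained sketch of the tuple approach, using the sample data from the question. One assumption that is not in the answer above: the list of tuples is wrapped in a `pd.Series` before calling `pd.factorize`, since newer pandas releases may warn when a plain list is passed.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "name": ["abc", "def", "ghi", "jkl", "mno"],
    "age": [24, 27, 25, 24, 25],
    "salary": [1000, 2000, 3000, 1000, 3000],
})

# One (age, salary) tuple per row; wrapping in a Series is an assumption
# made here to avoid passing a plain list to pd.factorize
keys = pd.Series(list(zip(df["age"], df["salary"])), index=df.index)

# factorize numbers the distinct keys from 0 in order of first appearance,
# so add 1 to start the group labels at 1
df["group"] = pd.factorize(keys)[0] + 1
print(df)
```

With this data the group column should match the expected output in the question: 1, 2, 3, 1, 3.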
jezrael

You can use factorize to transform your categories into integer identifiers.

Load the data into a DataFrame named df, then run the following code.

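# Concatenate age and salary into one string key; the separator keeps
# pairs such as (1, 23) and (12, 3) from producing the same key
fact = df.age.astype(str).str.cat(df.salary.astype(str), sep='_')
# Then factorize the key (codes start at 0, so add 1)
df['group'] = pd.factorize(fact)[0] + 1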

Output:

  name  age  salary  group
0  abc  24    1000      1
1  def  27    2000      2
2  ghi  25    3000      3
3  jkl  24    1000      1
4  mno  25    3000      3
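One caveat with string keys is worth a quick sketch: if the two columns are joined with no separator, different (age, salary) pairs can collapse into the same string. The values below are hypothetical and chosen only to trigger the collision; they are not from the question.

```python
import pandas as pd

# Hypothetical values picked so that both rows concatenate to "123"
df = pd.DataFrame({"age": [1, 12], "salary": [23, 3]})

age_str = df["age"].astype(str)
salary_str = df["salary"].astype(str)

# Without a separator: "1" + "23" and "12" + "3" are both "123"
no_sep = age_str.str.cat(salary_str)
# With a separator: "1_23" and "12_3" stay distinct
with_sep = age_str.str.cat(salary_str, sep="_")

codes_no_sep = pd.factorize(no_sep)[0] + 1
codes_with_sep = pd.factorize(with_sep)[0] + 1
print(list(codes_no_sep), list(codes_with_sep))
```

With realistic ages and salaries a collision is unlikely, but a separator costs nothing and removes the risk.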
Hsgao
  • But I need to group by age as well as salary and then assign group, i.e. age(24) and salary(4000) should go in group 4 not in group 1 – SOUMYABRATA RAKSHIT Sep 28 '18 at 08:57
  • Well, your example doesn't show that case. But I think combining age and salary into a string or list and then using factorize will work. – Hsgao Sep 28 '18 at 09:24
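To check the case raised in the comments, here is a small sketch with one extra row (age 24, salary 4000). The extra row and its name `pqr` are hypothetical. Note that this uses a different technique from the answers, `groupby(...).ngroup()` with `sort=False`, which also numbers groups in order of first appearance.

```python
import pandas as pd

# Question data plus one hypothetical row: same age as "abc", different salary
df = pd.DataFrame({
    "name": ["abc", "def", "ghi", "jkl", "mno", "pqr"],
    "age": [24, 27, 25, 24, 25, 24],
    "salary": [1000, 2000, 3000, 1000, 3000, 4000],
})

# sort=False keeps groups in order of first appearance;
# ngroup() numbers them from 0, so add 1
df["group"] = df.groupby(["age", "salary"], sort=False).ngroup() + 1
print(df)
```

Because the grouping key is the pair (age, salary), the new row should land in group 4 rather than group 1.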