How to label each group with df.groupby() in Python pandas?

Question

Note: this question is related to an existing question here. However, mine provides a more concrete example and applies more broadly.

Consider the following pandas DataFrame:

  Questions  cnt similarity
0       ABC    1  [1, 2, 3]
1       abc    2  [1, 2, 3]
2       cba    3  [2, 3, 1]
3      abcd    4  [4, 5, 6]
4      dcsa    5  [2, 3, 1]
5      adcd    6  [4, 5, 6]
6      abcd    7  [1, 2, 3]
7       cba    8  [7, 8, 9]

I need to add another column, cat, based on the similarity column: rows with the same similarity value should get the same group label. The expected output is below. Any input is valuable. Note that the original dataset has 1M rows. Thank you.

  Questions  cnt similarity  cat
0       ABC    1  [1, 2, 3]    1
1       abc    2  [1, 2, 3]    1
2       cba    3  [2, 3, 1]    2
3      abcd    4  [4, 5, 6]    3
4      dcsa    5  [2, 3, 1]    2
5      adcd    6  [4, 5, 6]    3
6      abcd    7  [1, 2, 3]    1
7       cba    8  [7, 8, 9]    4
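
For reproducibility, here is a minimal sketch of how the frame above could be built, assuming similarity holds Python lists (if it was read from a CSV it may hold strings instead). The plain-Python loop is only a reference for the labeling rule implied by the expected output, where groups are numbered in order of first appearance; the names seen and expected_cat are illustrative.

```python
import pandas as pd

# Sample data from the question; `similarity` is ASSUMED to hold Python lists.
df = pd.DataFrame(
    {
        "Questions": ["ABC", "abc", "cba", "abcd", "dcsa", "adcd", "abcd", "cba"],
        "cnt": [1, 2, 3, 4, 5, 6, 7, 8],
        "similarity": [
            [1, 2, 3], [1, 2, 3], [2, 3, 1], [4, 5, 6],
            [2, 3, 1], [4, 5, 6], [1, 2, 3], [7, 8, 9],
        ],
    }
)

# Reference labeling: each new similarity value gets the next integer,
# repeated values reuse the label assigned at their first appearance.
seen = {}
expected_cat = [seen.setdefault(tuple(s), len(seen) + 1) for s in df["similarity"]]
print(expected_cat)  # [1, 1, 2, 3, 2, 3, 1, 4]
```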

score 3 · Accepted Answer · answered Jun 09 '23 at 00:50


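IIUC, you can use pd.factorize, which returns 0-based integer codes in order of first appearance (hence the + 1). The astype(str) call makes the list values hashable: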

df["cat"] = pd.factorize(df["similarity"].astype(str))[0] + 1

Output:

print(df)

  Questions  cnt similarity  cat
0       ABC    1  [1, 2, 3]    1
1       abc    2  [1, 2, 3]    1
2       cba    3  [2, 3, 1]    2
3      abcd    4  [4, 5, 6]    3
4      dcsa    5  [2, 3, 1]    2
5      adcd    6  [4, 5, 6]    3
6      abcd    7  [1, 2, 3]    1
7       cba    8  [7, 8, 9]    4
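
To make the mechanics easier to check, here is a self-contained sketch of the factorize approach on the question's data, assuming similarity holds Python lists. It also unpacks the second return value (the unique keys) and, as an alternative not in the original answer, hashes the lists as tuples instead of strings; the variable names are illustrative.

```python
import pandas as pd

# Sample data from the question; `similarity` is ASSUMED to hold Python lists.
df = pd.DataFrame(
    {
        "Questions": ["ABC", "abc", "cba", "abcd", "dcsa", "adcd", "abcd", "cba"],
        "cnt": [1, 2, 3, 4, 5, 6, 7, 8],
        "similarity": [
            [1, 2, 3], [1, 2, 3], [2, 3, 1], [4, 5, 6],
            [2, 3, 1], [4, 5, 6], [1, 2, 3], [7, 8, 9],
        ],
    }
)

# factorize returns (codes, uniques): codes are 0-based and follow the
# order in which each distinct value first appears.
codes, uniques = pd.factorize(df["similarity"].astype(str))
df["cat"] = codes + 1

# Alternative sketch: tuples are hashable, so they can be factorized
# directly without going through a string representation.
codes_from_tuples, _ = pd.factorize(df["similarity"].map(tuple))

print(df["cat"].tolist())  # [1, 1, 2, 3, 2, 3, 1, 4]
```

Both variants run in a single pass over the column, which matters for the 1M-row dataset mentioned in the question; which one is faster there would need to be measured.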

answered Jun 09 '23 at 00:50

Timeless


Wonderful solution. Thank you. – Sophia Jun 09 '23 at 01:00

score 2 · Answer 2 · answered Jun 09 '23 at 00:57

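One way is to use groupby.ngroup(). Lists are unhashable, so group on the string form of similarity, and pass sort=False so that groups are numbered in order of first appearance rather than in sorted key order:

df['cat'] = df.groupby(df['similarity'].astype(str), sort=False).ngroup() + 1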

  Questions  cnt similarity  cat
0       ABC    1  [1, 2, 3]    1
1       abc    2  [1, 2, 3]    1
2       cba    3  [2, 3, 1]    2
3      abcd    4  [4, 5, 6]    3
4      dcsa    5  [2, 3, 1]    2
5      adcd    6  [4, 5, 6]    3
6      abcd    7  [1, 2, 3]    1
7       cba    8  [7, 8, 9]    4
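
One detail worth checking with ngroup() is the numbering order: by default groupby sorts the group keys, so labels follow sorted key order, while sort=False numbers groups by first appearance. On the question's sample the two happen to coincide, so the sketch below uses a hypothetical four-row frame where they differ, and assumes similarity holds Python lists (hence the string key).

```python
import pandas as pd

# Hypothetical frame (not from the question) where first-appearance order
# and sorted key order disagree.
df = pd.DataFrame({"similarity": [[4, 5, 6], [1, 2, 3], [4, 5, 6], [2, 3, 1]]})

# Lists are unhashable, so group on their string representation.
key = df["similarity"].astype(str)

by_sorted_key = df.groupby(key).ngroup() + 1              # default sort=True
by_appearance = df.groupby(key, sort=False).ngroup() + 1  # first-seen order

print(by_sorted_key.tolist())  # [3, 1, 3, 2]
print(by_appearance.tolist())  # [1, 2, 1, 3]
```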
