Replace the unique values in a DataFrame column with their count

Question

I have a DataFrame such as this:

Index Label
0     ABCD
1     EFGH
2     ABCD
3     ABCD
4     EFGH
5     ABCD
6     IJKL
7     IJKL
8     ABCD
9     EFGH

So, "ABCD" occurs 5 times, "EFGH" 3 times and "IJKL" twice. I want to count the occurrences of each Label and replace each individual label with its count, to get the following:

Index Label
0     5
1     3
2     5
3     5
4     3
5     5
6     2
7     2
8     5
9     3

What is the nicest way to do this? Thank you!

score 3 · Accepted Answer · answered Sep 23 '17 at 20:10


Use map with the Series created by value_counts:

df['Label'] = df['Label'].map(df['Label'].value_counts())
print(df)
   Label
0      5
1      3
2      5
3      5
4      3
5      5
6      2
7      2
8      5
9      3
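
To see why this works, it can help to look at the intermediate Series. The sketch below rebuilds the sample data from the question (the way df is constructed is an assumption, since the question only shows the printed frame) and keeps the result of value_counts in a separate variable before mapping.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed construction).
df = pd.DataFrame({"Label": ["ABCD", "EFGH", "ABCD", "ABCD", "EFGH",
                             "ABCD", "IJKL", "IJKL", "ABCD", "EFGH"]})

# value_counts returns a Series indexed by label, holding each label's count.
counts = df["Label"].value_counts()

# map looks every label up in that index and returns the matching count.
df["Label"] = df["Label"].map(counts)
print(df["Label"].tolist())
```

Note that value_counts drops NaN by default, so a NaN label would not be in the index and would stay NaN after map.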

Another solution uses groupby with transform and size:

df['Label'] = df.groupby('Label')['Label'].transform('size')
print(df)

   Label
0      5
1      3
2      5
3      5
4      3
5      5
6      2
7      2
8      5
9      3

jezrael

Are you sure? I think `size` is what is always needed, and `count` is only needed if you want to exclude `NaN`s (the rarer use case) – jezrael Sep 23 '17 at 20:21

cs95 · Answer 2 · 2017-09-23T20:22:18.340

2

Use groupby and transform:

print(df)
      Label
Index      
0      ABCD
1      EFGH
2      ABCD
3      ABCD
4      EFGH
5      ABCD
6      IJKL
7      IJKL
8      ABCD
9      EFGH

df['Label'] = df.groupby('Label').Label.transform('count')
print(df)
       Label
Index       
0          5
1          3
2          5
3          5
4          3
5          5
6          2
7          2
8          5
9          3

If your column does not have NaNs, size and count return the same values. Otherwise, size includes NaNs, so avoid using it.
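
The difference between the two is easiest to see on a column that actually contains NaNs. The following is a minimal sketch on a hypothetical frame (the column names key and val are made up for illustration and are not from the question): size counts every row in a group, while count counts only the non-NaN values.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "key" defines the groups, "val" contains some NaNs.
df = pd.DataFrame({
    "key": ["a", "a", "a", "b", "b"],
    "val": [1.0, np.nan, 3.0, np.nan, 5.0],
})

grouped = df.groupby("key")["val"]

# size: number of rows in each group, NaN values included.
by_size = grouped.transform("size")

# count: number of non-NaN values in each group.
by_count = grouped.transform("count")

print(pd.DataFrame({"size": by_size, "count": by_count}))
```

Here the counted column differs from the grouping column, which is where the two diverge most visibly. When grouping by the same column that is being counted, as in the question, groupby drops NaN keys by default, so which of the two to use depends on whether rows with NaN should be counted at all.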

Another way using Counter:

from collections import Counter

df['Label'] = df.Label.map(Counter(df.Label))
print(df)
       Label
Index       
0          5
1          3
2          5
3          5
4          3
5          5
6          2
7          2
8          5
9          3
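
Counter is a dict subclass, so map treats it like any other dictionary and looks each label up as a key. As a quick sanity check, this sketch (using a standalone Series rebuilt from the question's data, which is an assumption about how the data was created) compares the Counter approach with the value_counts and groupby approaches.

```python
from collections import Counter

import pandas as pd

# Labels reconstructed from the question (assumed construction).
labels = pd.Series(["ABCD", "EFGH", "ABCD", "ABCD", "EFGH",
                    "ABCD", "IJKL", "IJKL", "ABCD", "EFGH"], name="Label")

# Three ways to replace each label with its number of occurrences.
via_counter = labels.map(Counter(labels))
via_value_counts = labels.map(labels.value_counts())
via_transform = labels.groupby(labels).transform("count")

# All three should agree on data without NaNs.
print(via_counter.tolist() == via_value_counts.tolist() == via_transform.tolist())
```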

edited Sep 23 '17 at 20:22

answered Sep 23 '17 at 20:12

cs95

@P.Prunesquallor Thanks for the upvote. – cs95 Sep 23 '17 at 20:16

@P.Prunesquallor Also, if you are using the groupby solution, don't use `size` as jezrael's solution does. – cs95 Sep 23 '17 at 20:22
I don't understand `Otherwise, size includes NaNs, so avoid using it.` Why avoid it? I think both functions are fine, and if anything `count` is the one best left unused unless you need to exclude NaNs explicitly. I see no reason to avoid `size`, because it is useful when I know I have some NaNs (and I think there are no NaNs in the data, especially if it is float data). – jezrael Sep 23 '17 at 20:31
