
I have a DataFrame such as this:

Index Label
0     ABCD
1     EFGH
2     ABCD
3     ABCD
4     EFGH
5     ABCD
6     IJKL
7     IJKL
8     ABCD
9     EFGH

So "ABCD" occurs 5 times, "EFGH" 3 times and "IJKL" twice. I want to count the occurrences of each Label and replace each individual label with its count, to get the following:

Index Label
0     5
1     3
2     5
3     5
4     3
5     5
6     2
7     2
8     5
9     3

What is the nicest way to do this? Thank you!

cs95
P. Prunesquallor

2 Answers


Use map with the Series created by value_counts:

df['Label'] = df['Label'].map(df['Label'].value_counts())
print(df)
   Label
0      5
1      3
2      5
3      5
4      3
5      5
6      2
7      2
8      5
9      3
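
For anyone who wants to run this end to end, here is a minimal self-contained sketch using the data from the question. Writing the result to a separate `Count` column is an illustrative choice (not part of the answer above) so that the original labels stay visible.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"Label": ["ABCD", "EFGH", "ABCD", "ABCD", "EFGH",
                             "ABCD", "IJKL", "IJKL", "ABCD", "EFGH"]})

# value_counts() returns a Series indexed by label, holding each label's count.
counts = df["Label"].value_counts()

# map() looks each row's label up in that Series, giving one count per row.
# "Count" is a hypothetical column name; assign to df["Label"] to overwrite instead.
df["Count"] = df["Label"].map(counts)
print(df)
```
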

Another solution with transform + size:

df['Label'] = df.groupby('Label')['Label'].transform('size')
print(df)

   Label
0      5
1      3
2      5
3      5
4      3
5      5
6      2
7      2
8      5
9      3
jezrael
  • Are you sure? I think `size` is what is needed in almost all cases, and `count` is only needed if `NaN`s must be excluded (the rarer use). – jezrael Sep 23 '17 at 20:21

Use groupby and transform:

print(df)
      Label
Index      
0      ABCD
1      EFGH
2      ABCD
3      ABCD
4      EFGH
5      ABCD
6      IJKL
7      IJKL
8      ABCD
9      EFGH

df['Label'] = df.groupby('Label').Label.transform('count')
print(df)
       Label
Index       
0          5
1          3
2          5
3          5
4          3
5          5
6          2
7          2
8          5
9          3

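If your column does not have NaNs, size and count return the same values. Otherwise they differ: size counts every row in a group, NaNs included, while count counts only the non-NaN values, so avoid size if NaNs should not be counted.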
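
To make the size/count difference concrete, here is a small sketch. The `Value` column and its contents are made up for illustration and do not come from the question; the point is only that the two aggregations diverge once the counted column contains NaNs.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "Value" has missing entries, "Label" does not.
df = pd.DataFrame({
    "Label": ["ABCD", "ABCD", "EFGH", "EFGH", "EFGH"],
    "Value": [1.0, np.nan, 2.0, np.nan, np.nan],
})

grouped = df.groupby("Label")["Value"]

# size: number of rows in each group, regardless of missing values.
size_per_row = grouped.transform("size")

# count: number of non-NaN values in each group.
count_per_row = grouped.transform("count")

print(pd.DataFrame({"size": size_per_row, "count": count_per_row}))
```

Which one is appropriate depends on whether rows with missing values should contribute to the total.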


Another way using Counter:

from collections import Counter

df['Label'] = df.Label.map(Counter(df.Label))
print(df)
       Label
Index       
0          5
1          3
2          5
3          5
4          3
5          5
6          2
7          2
8          5
9          3
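
As a quick sanity check, the sketch below runs the three approaches from this thread (value_counts, groupby + transform, and Counter) on the question's data and compares the results. Working on a bare Series named `labels` is an assumption made here to keep the example short.

```python
from collections import Counter

import pandas as pd

# Labels copied from the question, held in a standalone Series.
labels = pd.Series(["ABCD", "EFGH", "ABCD", "ABCD", "EFGH",
                    "ABCD", "IJKL", "IJKL", "ABCD", "EFGH"], name="Label")

# 1. Map each label to its frequency from value_counts().
via_value_counts = labels.map(labels.value_counts())

# 2. Group the Series by its own values and broadcast the group count.
via_transform = labels.groupby(labels).transform("count")

# 3. Map each label through a Counter, which is a dict subclass.
via_counter = labels.map(Counter(labels))

print(via_value_counts.tolist())
print(via_transform.tolist())
print(via_counter.tolist())
```
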
cs95
  • @P.Prunesquallor Thanks for the upvote. – cs95 Sep 23 '17 at 20:16
  • @P.Prunesquallor Also, if you are using the groupby solution, don't use `size` as jezrael's solution does. – cs95 Sep 23 '17 at 20:22
  • I don't understand `Otherwise, size includes NaNs, so avoid using it.` Why avoid it? I think both functions are fine, and I think `count` is best left unused unless NaNs need to be excluded explicitly. I see no reason to avoid `size`, because it is good when I know I have some NaNs (and I think there are no NaNs in the data, especially if it is float data). – jezrael Sep 23 '17 at 20:31