I started learning Python for data science a few weeks ago and ran into this problem in my own project. I'm trying to replace a game publisher's name with "Other" if its count is below 5. When I use the .mask() function, however, it also replaces the "Counts" value with "Other". Is there a way to change only the "Publisher" value to "Other" and keep the "Counts" value as it is?

The method I tried is below:

publisher_subset = data.filter(['Publisher'])
df = publisher_subset.value_counts().reset_index(name='Counts')
df.mask(df["Counts"] <= 5, "Other", inplace=False)

[enter image description here](https://i.stack.imgur.com/kOzFY.png)
Corralien
Yune
  • Here's a similar question https://stackoverflow.com/questions/31511997/pandas-dataframe-replace-all-values-in-a-column-based-on-condition – shounak shastri Mar 27 '23 at 09:52

3 Answers

You are looking for np.where:

import numpy as np
df['Publisher'] = np.where(df["Counts"] <= 5, "Other", df['Publisher'])
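
To try this without the original dataset, here is a minimal sketch on a made-up counts table (the publisher names and numbers are assumptions, not the asker's data). `np.where(cond, a, b)` picks `a` where the condition is true and `b` elsewhere, so only the Publisher column is rewritten:

```python
import numpy as np
import pandas as pd

# Hypothetical counts table standing in for the asker's `df`.
df = pd.DataFrame({
    "Publisher": ["Sony", "Bandai", "Capcom", "Sega"],
    "Counts": [7, 6, 5, 4],
})

# Use "Other" where the count is at or below the threshold,
# otherwise keep the existing publisher name.
df["Publisher"] = np.where(df["Counts"] <= 5, "Other", df["Publisher"])

print(df)
```

Note that `<= 5` also relabels publishers with exactly 5 entries. If "below 5" is meant strictly, `< 5` would be the condition to use.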
Alessandro

You can use the .loc[] indexer to apply the condition to the Publisher column only, leaving Counts untouched.

publisher_subset = data.filter(['Publisher'])
df = publisher_subset.value_counts().reset_index(name='Counts')

df.loc[df["Counts"] <= 5, "Publisher"] = "Other"
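
A self-contained sketch of the same idea, using a made-up counts table (the names and numbers are assumptions for illustration). The boolean Series selects the rows and the column label restricts the assignment to Publisher:

```python
import pandas as pd

# Hypothetical counts table standing in for the asker's `df`.
df = pd.DataFrame({
    "Publisher": ["Sony", "Bandai", "Capcom", "Sega"],
    "Counts": [7, 6, 5, 4],
})

# Boolean row selector: True for publishers at or below the threshold.
low = df["Counts"] <= 5

# Assign only to the Publisher column of the selected rows.
df.loc[low, "Publisher"] = "Other"

print(df)
```

This changes `df` in place, so work on `df.copy()` if the original names are still needed afterwards.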
Abdulmajeed

You can use mask if you select only the Publisher column:

# Select publisher column only to mask values
rename_others = lambda x: x['Publisher'].mask(x['counts'] <= 5, other='Others')

out = (df.value_counts('Publisher').reset_index(name='counts')
         .assign(Publisher=rename_others))
print(out)

# Output
  Publisher  counts
0      Sony       7
1    Bandai       6
2    Others       5
3    Others       5
4    Others       5
5    Others       4
6    Others       4
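
For context on why the attempt in the question touched both columns: as far as I can tell, when `DataFrame.mask` is given a boolean Series as the condition, that condition is applied row-wise to every column, whereas `Series.mask` only affects the one column it is called on. A small sketch on made-up data (names and numbers are assumptions) showing the difference:

```python
import pandas as pd

# Hypothetical two-row counts table.
df = pd.DataFrame({"Publisher": ["Sony", "Sega"], "Counts": [7, 4]})
low = df["Counts"] <= 5

# DataFrame.mask: every column in the flagged rows is replaced.
whole = df.mask(low, "Other")

# Series.mask on Publisher only: Counts is left as it was.
only_pub = df.assign(Publisher=df["Publisher"].mask(low, "Other"))

print(whole)
print(only_pub)
```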

I suppose you also want to sum by Publisher:

out = (df.value_counts('Publisher').reset_index(name='counts')
         .assign(Publisher=rename_others)
         .groupby('Publisher', sort=False, as_index=False).sum())

  Publisher  counts
0      Sony       7
1    Bandai       6
2    Others      23
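
To run the whole chain end to end, here is a sketch starting from a made-up raw table (the `data` frame and its contents are assumptions, chosen so that no two publishers tie on count). It counts, relabels, and then sums the relabelled rows into one "Others" row:

```python
import pandas as pd

# Hypothetical raw data: one row per game.
data = pd.DataFrame({
    "Publisher": ["Sony"] * 7 + ["Bandai"] * 6 + ["Sega"] * 3 + ["Atari"] * 2
})

# Relabel publishers whose count is at or below the threshold.
rename_others = lambda x: x["Publisher"].mask(x["counts"] <= 5, other="Others")

out = (
    data.value_counts("Publisher")          # count rows per publisher
        .reset_index(name="counts")         # back to a two-column frame
        .assign(Publisher=rename_others)    # mask only the Publisher column
        .groupby("Publisher", sort=False, as_index=False)
        .sum()                              # merge all "Others" rows
)

print(out)
```

The grand total of `counts` is unchanged by the grouping, which is a quick sanity check that no rows were lost.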
Corralien