How can I replace values in certain columns of a pandas.DataFrame that occur rarely, i.e. with low frequency (while ignoring NaNs)?
For example, in the following dataframe, suppose I want to replace any value in column A or B that occurs fewer than three times in its respective column with "other":
import pandas as pd
import numpy as np
df = pd.DataFrame({'A':['ant','ant','cherry', np.nan, 'ant'], 'B':['cat','peach', 'cat', 'cat', 'peach'], 'C':['dog','dog', np.nan, 'emu', 'emu']})
df
A      | B     | C   |
----------------------
ant    | cat   | dog |
ant    | peach | dog |
cherry | cat   | NaN |
NaN    | cat   | emu |
ant    | peach | emu |
In other words, in columns A and B, I want to replace any value that occurs two or fewer times (but leave NaNs alone).
So the output I want is:
A     | B     | C   |
----------------------
ant   | cat   | dog |
ant   | other | dog |
other | cat   | NaN |
NaN   | cat   | emu |
ant   | other | emu |
This is related to a previously posted question: Remove low frequency values from pandas.dataframe
but the solution there raised "AttributeError: 'NoneType' object has no attribute 'any'" (I suspect because my data contains NaN values).
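To make the desired behavior concrete, here is a minimal sketch of one way the replacement could be done. It is not the approach from the linked question. It assumes per-column counts from `value_counts()` (which skips NaN by default) mapped back onto each row; the names `min_count` and `cols` are illustrative, not from any existing solution.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['ant', 'ant', 'cherry', np.nan, 'ant'],
    'B': ['cat', 'peach', 'cat', 'cat', 'peach'],
    'C': ['dog', 'dog', np.nan, 'emu', 'emu'],
})

min_count = 3      # hypothetical threshold: values seen fewer times are replaced
cols = ['A', 'B']  # only these columns are modified; C is left untouched

for col in cols:
    # Map each value to how often it occurs in its column.
    # NaN has no entry in value_counts(), so its count is NaN.
    counts = df[col].map(df[col].value_counts())
    # NaN < min_count evaluates to False, so NaN rows are not replaced.
    df[col] = df[col].mask(counts < min_count, 'other')

print(df)
```

Because the comparison against a NaN count is False, missing values pass through `mask` unchanged, which is the "leave NaNs alone" behavior described above.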