I want to filter out some columns in a DataFrame where there is little to no change in the data throughout; an example plot of one of the columns is shown below:
What I'm doing currently is quite simple and is probably very inefficient.
from collections import Counter

n = data2.shape[0]  # number of rows
for col in data2.columns:
    # fraction of rows taken up by the column's single most common value
    most_freq = Counter(data2[col]).most_common(1)[0][1]
    print(col, most_freq / n)
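For reference, the same per-column ratio can be computed without an explicit Python loop via `value_counts`; a minimal sketch, using a small made-up DataFrame in place of `data2`:

```python
import pandas as pd

# Hypothetical stand-in for data2
data2 = pd.DataFrame({
    "m0": [1, 2, 3, 4],   # all values distinct
    "m5": [7, 7, 7, 7],   # constant column
})

# Share of rows taken by each column's single most frequent value:
# value_counts(normalize=True) sorts descending, so .iloc[0] is the top share
mode_share = data2.apply(lambda s: s.value_counts(normalize=True).iloc[0])
print(mode_share)
```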
Its output:
m0 0.25192519251925194
m1 0.5808580858085809
m2 0.09790979097909791
m3 0.0033003300330033004
m4 0.9713971397139713
m5 1.0
m6 1.0
m7 1.0
m8 1.0
m9 0.9713971397139713
m10 1.0
m11 1.0
As you can see, I'd like to filter out the columns (like m5, m6, etc.) that have a high proportion of constant, non-changing values. Is there a better, perhaps statistical, way to do this? I've looked at a similar question but it didn't help much.
Update:
Based on @Kosmo's answer, this seemed to work well for me. At least, it helped me remove the obvious ones with flat lines.
# keep only the columns whose rounded variance is nonzero
data2 = data2.loc[:, (round(data2.var()) > 0)]
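One caveat with the variance check: `DataFrame.var()` only covers numeric columns, so a constant string column would slip through. A `nunique`-based variant (a sketch with made-up data) catches constant columns of any dtype:

```python
import pandas as pd

# Hypothetical stand-in for data2
data2 = pd.DataFrame({
    "m3": [0.1, 0.2, 0.3, 0.4],   # varying numeric column
    "m5": [1, 1, 1, 1],           # constant numeric column
    "tag": ["a", "a", "a", "a"],  # constant string column; var() would skip it
})

# Keep only columns that take more than one distinct value
filtered = data2.loc[:, data2.nunique() > 1]
print(list(filtered.columns))  # ['m3']
```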