5

I would like to drop a given column from a pandas dataframe IF all the values in the column is "0%".

my df:

data = {'UK': ['11%', '16%', '7%', '52%', '2%', '5%', '3%', '3%'],
        'US': ['0%', '0%', '0%', '0%', '0%', '0%', '0%', '0%'],
        'DE': ['11%', '16%', '7%', '52%', '2%', '5%', '3%', '3%'],
        'FR': ['11%', '16%', '7%', '52%', '2%', '5%', '3%', '3%']
        }
dummy_df = pd.DataFrame(data, 
                        index=    ['cat1','cat2','cat3','cat4','cat5','cat6','cat7','cat8'], 
                        columns=['UK', 'US', 'DE', 'FR'])

my code so far:

dummy_df.drop(dummy_df == '0%',inplace=True)

I get a value error:

ValueError: labels ['UK' 'US' 'DE' 'FR'] not contained in axis
Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
Boosted_d16
  • 13,340
  • 35
  • 98
  • 158
  • related question [Deleting DataFrame row in Pandas based on column value](http://stackoverflow.com/questions/18172851/deleting-dataframe-row-in-pandas-based-on-column-value) – farhawa May 20 '15 at 13:36

2 Answers2

11
In [186]: dummy_df.loc[:, ~(dummy_df == '0%').all()]
Out[186]:
       UK   DE   FR
cat1  11%  11%  11%
cat2  16%  16%  16%
cat3   7%   7%   7%
cat4  52%  52%  52%
cat5   2%   2%   2%
cat6   5%   5%   5%
cat7   3%   3%   3%
cat8   3%   3%   3%

Explanation:

The comparison with '0%' you already got, this gives the following dataframe:

In [182]: dummy_df == '0%'
Out[182]:
         UK    US     DE     FR
cat1  False  True  False  False
cat2  False  True  False  False
cat3  False  True  False  False
cat4  False  True  False  False
cat5  False  True  False  False
cat6  False  True  False  False
cat7  False  True  False  False
cat8  False  True  False  False

Now we want to know which columns has all Trues:

In [183]: (dummy_df == '0%').all()
Out[183]:
UK    False
US     True
DE    False
FR    False
dtype: bool

And finally, we can index with these boolean values (but taking the opposite with ~ as want don't want to select where this is True): dummy_df.loc[:, ~(dummy_df == '0%').all()].

Similarly, you can also do: dummy_df.loc[:, (dummy_df != '0%').any()] (selects columns where at least one value is not equal to '0%')

joris
  • 133,120
  • 36
  • 247
  • 202
3

First get the columns where all values != '0%'

In [163]: cols = (dummy_df != '0%').any()

In [164]: cols
Out[164]:
UK     True
US    False
DE     True
FR     True
dtype: bool

Then call only cols columns which are True

In [165]: dummy_df[cols[cols].index]
Out[165]:
       UK   DE   FR
cat1  11%  11%  11%
cat2  16%  16%  16%
cat3   7%   7%   7%
cat4  52%  52%  52%
cat5   2%   2%   2%
cat6   5%   5%   5%
cat7   3%   3%   3%
cat8   3%   3%   3%
Zero
  • 74,117
  • 18
  • 147
  • 154
  • 2
    Shoudn't the `all` be `any` (see my answer), as the OP wants to delete the columns where *all* values are 0% (so a column with any value that is not 0% should be included) – joris May 20 '15 at 13:47