python pandas: drop a df column if condition

Question

I would like to drop a given column from a pandas dataframe IF all the values in the column is "0%".

my df:

data = {'UK': ['11%', '16%', '7%', '52%', '2%', '5%', '3%', '3%'],
        'US': ['0%', '0%', '0%', '0%', '0%', '0%', '0%', '0%'],
        'DE': ['11%', '16%', '7%', '52%', '2%', '5%', '3%', '3%'],
        'FR': ['11%', '16%', '7%', '52%', '2%', '5%', '3%', '3%']
        }
dummy_df = pd.DataFrame(data, 
                        index=    ['cat1','cat2','cat3','cat4','cat5','cat6','cat7','cat8'], 
                        columns=['UK', 'US', 'DE', 'FR'])

my code so far:

dummy_df.drop(dummy_df == '0%',inplace=True)

I get a value error:

ValueError: labels ['UK' 'US' 'DE' 'FR'] not contained in axis

related question [Deleting DataFrame row in Pandas based on column value](http://stackoverflow.com/questions/18172851/deleting-dataframe-row-in-pandas-based-on-column-value) — farhawa, May 20 '15 at 13:36

joris · Accepted Answer · 2015-05-20T13:46:21.997

In [186]: dummy_df.loc[:, ~(dummy_df == '0%').all()]
Out[186]:
       UK   DE   FR
cat1  11%  11%  11%
cat2  16%  16%  16%
cat3   7%   7%   7%
cat4  52%  52%  52%
cat5   2%   2%   2%
cat6   5%   5%   5%
cat7   3%   3%   3%
cat8   3%   3%   3%

Explanation:

The comparison with '0%' you already got, this gives the following dataframe:

In [182]: dummy_df == '0%'
Out[182]:
         UK    US     DE     FR
cat1  False  True  False  False
cat2  False  True  False  False
cat3  False  True  False  False
cat4  False  True  False  False
cat5  False  True  False  False
cat6  False  True  False  False
cat7  False  True  False  False
cat8  False  True  False  False

Now we want to know which columns has all Trues:

In [183]: (dummy_df == '0%').all()
Out[183]:
UK    False
US     True
DE    False
FR    False
dtype: bool

And finally, we can index with these boolean values (but taking the opposite with ~ as want don't want to select where this is True): dummy_df.loc[:, ~(dummy_df == '0%').all()].

Similarly, you can also do: dummy_df.loc[:, (dummy_df != '0%').any()] (selects columns where at least one value is not equal to '0%')

Zero · Answer 2 · 2015-05-20T13:56:23.200

3

First get the columns where all values != '0%'

In [163]: cols = (dummy_df != '0%').any()

In [164]: cols
Out[164]:
UK     True
US    False
DE     True
FR     True
dtype: bool

Then call only cols columns which are True

In [165]: dummy_df[cols[cols].index]
Out[165]:
       UK   DE   FR
cat1  11%  11%  11%
cat2  16%  16%  16%
cat3   7%   7%   7%
cat4  52%  52%  52%
cat5   2%   2%   2%
cat6   5%   5%   5%
cat7   3%   3%   3%
cat8   3%   3%   3%

edited May 20 '15 at 13:56

answered May 20 '15 at 13:43

Zero

74,117
18
147
154

2

Shoudn't the `all` be `any` (see my answer), as the OP wants to delete the columns where *all* values are 0% (so a column with any value that is not 0% should be included) – joris May 20 '15 at 13:47

python pandas: drop a df column if condition

2 Answers2