How do I delete a column that contains only zeros in Pandas?

Question

I currently have a dataframe consisting of columns with 1's and 0's as values, I would like to iterate through the columns and delete the ones that are made up of only 0's. Here's what I have tried so far:

ones = []
zeros = []
for year in years:
    for i in range(0,599):
        if year[str(i)].values.any() == 1:
            ones.append(i)
        if year[str(i)].values.all() == 0:
            zeros.append(i)
    for j in ones:
        if j in zeros:
            zeros.remove(j)
    for q in zeros:
        del year[str(q)]

In which years is a list of dataframes for the various years I am analyzing, ones consists of columns with a one in them and zeros is a list of columns containing all zeros. Is there a better way to delete a column based on a condition? For some reason I have to check whether the ones columns are in the zeros list as well and remove them from the zeros list to obtain a list of all the zero columns.

Possible duplicate of [Deleting DataFrame row in Pandas based on column value](https://stackoverflow.com/q/18172851/1278112) — Shihe Zhang, Nov 06 '17 at 05:34
I disagree. That question is to remove rows based on values in one column. Here multiple columns are to be removed based on their own values. — Edward, Feb 09 '21 at 09:35

unutbu · Accepted Answer · 2016-01-03T15:47:11.047

309

df.loc[:, (df != 0).any(axis=0)]

Here is a break-down of how it works:

In [74]: import pandas as pd

In [75]: df = pd.DataFrame([[1,0,0,0], [0,0,1,0]])

In [76]: df
Out[76]: 
   0  1  2  3
0  1  0  0  0
1  0  0  1  0

[2 rows x 4 columns]

df != 0 creates a boolean DataFrame which is True where df is nonzero:

In [77]: df != 0
Out[77]: 
       0      1      2      3
0   True  False  False  False
1  False  False   True  False

[2 rows x 4 columns]

(df != 0).any(axis=0) returns a boolean Series indicating which columns have nonzero entries. (The any operation aggregates values along the 0-axis -- i.e. along the rows -- into a single boolean value. Hence the result is one boolean value for each column.)

In [78]: (df != 0).any(axis=0)
Out[78]: 
0     True
1    False
2     True
3    False
dtype: bool

And df.loc can be used to select those columns:

In [79]: df.loc[:, (df != 0).any(axis=0)]
Out[79]: 
   0  2
0  1  0
1  0  1

[2 rows x 2 columns]

To "delete" the zero-columns, reassign df:

df = df.loc[:, (df != 0).any(axis=0)]

edited Jan 03 '16 at 15:47

answered Jan 16 '14 at 14:52

unutbu

842,883
184
1,785
1,677

I am trying this to drop a column if it has either 0 or 1 in it and it gives an error: df = df.loc[:, (df != 0 & df != 1).any(axis=0)] – morpheus Nov 06 '16 at 17:18
this worked (in case it helps someone): df[df.columns[(~df.isin([0,1])).any(axis=0)]] – morpheus Nov 06 '16 at 17:46
1

`df.loc[:, (~df.isin([0,1])).any(axis=0)]` would also work. – unutbu Nov 06 '16 at 20:58
why does df != 0 & df != 1 not work? I also tried doing it step by step like a = df[df!=0] b = df[df!=1] c = a & b (elementwise and) and it complains – morpheus Nov 07 '16 at 18:20
`(df != 0) & (df != 1)` is a DataFrame of boolean values. To select columns, you need to reduce that DataFrame to a 1D Series or array of boolean values, such as `((df != 0) & (df != 1)).any(axis=0)`. Then you can select columns using `df.loc[:, ((df != 0) & (df != 1)).any(axis=0)]`. – unutbu Nov 07 '16 at 20:02
the code (df != 0) & (df != 1) does not work in python. try it yourself – morpheus Nov 07 '16 at 21:47
`df = pd.DataFrame([[1,0,0,0], [0,0,1,0]])`; `(df != 0) & (df != 1)` returns a boolean DataFrame. If there is a problem, it has something to do with the particular DataFrame you are using -- but I'm having trouble guessing what that problem might be. – unutbu Nov 07 '16 at 22:00
Please post a new question, as long conversations in the comments not conducive to stackoverflow's goal of building a database of good question/answer pairs. – unutbu Nov 07 '16 at 22:06
Why not just `df = df.loc[:, df.any(axis=0)]`? – Dr Fabio Gori Aug 09 '18 at 13:40
1

@IgorFobia: Lot's of things are False-ish without being 0. For instance, empty strings or None or NaN. To demonstrate the difference, if `df = pd.DataFrame([[np.nan]*10])`, then `df.loc[:, df.any(axis=0)]` returns an empty DataFrame, while `df.loc[:, (df != 0).any(axis=0)]` returns a DataFrame with 10 columns. – unutbu Aug 09 '18 at 20:31
8

I believe it is easier to understand if we check for a condition being true, instead of checking if condition not being true is never not satisfied. I think `(df == 0).all(axis=0)` is more straightforward. – Ryszard Cetnarski Sep 18 '18 at 07:54
2

Thanks for the breakdown. It made things very clear. – Regi Mathew Nov 27 '19 at 01:29
@RyszardCetnarski `df = df.loc[:, (df != 0).any(axis=0)]` is **not** equivalent to `df = df.loc[:, (df == 0).all(axis=0)]`. Your proposal deletes the non-zero columns, which is the opposite of what we want. What you're looking for is `df = df.loc[:, ~(df == 0).all(axis=0)]`. – Nermin Mar 02 '23 at 13:24

score 12 · Answer 2 · answered Oct 17 '18 at 10:05

Here is an alternative way to use is

df.replace(0,np.nan).dropna(axis=1,how="all")

Compared with the solution of unutbu, this way is obviously slower:

%timeit df.loc[:, (df != 0).any(axis=0)]
652 µs ± 5.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit df.replace(0,np.nan).dropna(axis=1,how="all")
1.75 ms ± 9.49 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

score 2 · Answer 3 · answered Mar 21 '19 at 11:20

In case you'd like a more expressive way of getting the zero-column names so you can print / log them, and drop them, in-place, by their names:

zero_cols = [ col for col, is_zero in ((df == 0).sum() == df.shape[0]).items() if is_zero ]
df.drop(zero_cols, axis=1, inplace=True)

Some break down:

# a pandas Series with {col: is_zero} items
# is_zero is True when the number of zero items in that column == num_all_rows
(df == 0).sum() == df.shape[0])

# a list comprehension of zero_col_names is built from the_series
[ col for col, is_zero in the_series.items() if is_zero ]

cyrilb38 · Answer 4 · 2021-02-09T10:01:05.250

2

In case there are some NaN values in your columns, you may want to use this approach if you want to remove columns that have both 0 and NaN :

df.loc[:, (df**2).sum() != 0]

edited Feb 09 '21 at 10:01

answered Jan 06 '21 at 08:26

cyrilb38

924
6
17

2

What if the sum happens to be 0? Maybe use mean square. – Edward Feb 09 '21 at 09:28
You are absolutely right, I did not think about these edge cases. I'm updating the answer. – cyrilb38 Feb 09 '21 at 10:00

Amanda · Answer 5 · 2023-03-21T17:50:34.613

2

This should do the work:

zero_cols = df.columns[(df == 0).all()]
df.drop(labels=zero_cols, axis=1, inplace=True)

edited Mar 21 '23 at 17:50

answered Mar 21 '23 at 17:46

Amanda

21
2

How do I delete a column that contains only zeros in Pandas?

5 Answers5

Linked

Related