2

I have the following dataframe df, which has a bunch of observations that have a bunch of categories that they can belong to and whether they won or not. I subsetted some of the rows so there are actually more categories. I want to create a new dataframe where I remove any category columns (any column with prefix cat_) that don't correspond to any of the observations (i.e. the whole category column is 0).

id cat_food    cat_fitness   cat_retail    cat_grocery  win
1  1           0             1             0            1
2  1           0             0             0            0
3  0           1             0             0            1
4  1           0             0             0            1
4  1           0             0             0            0
5  1           0             0             0            1
6  0           1             1             0            1
6  0           1             1             0            0

my expected dataframe would have the column cat_grocery removed because none of the observations belong to that category

id cat_food    cat_fitness   cat_retail      win
1  1           0             1               1
2  1           0             0               0
3  0           1             0               1
4  1           0             0               1
4  1           0             0               0
5  1           0             0               1
6  0           1             1               1
6  0           1             1               0
Eisen
  • 1,697
  • 9
  • 27
  • 1
    Does this answer your question? [How do I delete a column that contains only zeros in Pandas?](https://stackoverflow.com/questions/21164910/how-do-i-delete-a-column-that-contains-only-zeros-in-pandas) – m13op22 Apr 01 '21 at 14:04
  • 1
    `df.loc[:, df.max()>0]` or `df.loc[:, df.any()]`? – Quang Hoang Apr 01 '21 at 14:07
  • I only want to delete columns with a `cat_` prefix that have all zeros, not any column – Eisen Apr 01 '21 at 14:12

1 Answers1

1

Based on your comment: I remove any category columns (any column with prefix cat_) that don't correspond to any of the observations (i.e. the whole category column is 0) , you can chain the conditions to check if sum is 0 and if column has a prefix cat_ using str.startswith , and then use it with loc:

df.loc[:,~(df.sum().eq(0) & df.columns.str.startswith("cat_"))]

   id  cat_food  cat_fitness  cat_retail  win
0   1         1            0           1    1
1   2         1            0           0    0
2   3         0            1           0    1
3   4         1            0           0    1
4   4         1            0           0    0
5   5         1            0           0    1
6   6         0            1           1    1
7   6         0            1           1    0
anky
  • 74,114
  • 11
  • 41
  • 70