
I have a pandas dataframe like this

import numpy as np
import pandas as pd

df = pd.DataFrame(data=[[21, 1], [32, -4], [-4, 14], [3, 17], [-7, np.nan]], columns=['a', 'b'])
df

I want to remove all rows that have a negative value in any of a list of columns, while keeping rows that contain NaN.

In my example there are only 2 columns, but my dataset has more, so I can't do it one by one.
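For reference, a minimal sketch of the intended filter (assuming the goal is: drop a row only when some listed column is strictly negative, so NaN rows survive the check):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=[[21, 1], [32, -4], [-4, 14], [3, 17], [-7, np.nan]],
                  columns=['a', 'b'])

cols = ['a', 'b']  # the columns to check
# (df[cols] < 0) evaluates to False for NaN, so NaN alone never drops a row
mask = ~(df[cols] < 0).any(axis=1)
result = df[mask]
```

Here only rows 0 and 3 survive: every other row has a genuine negative in one of the checked columns.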

dooms

3 Answers


If you want to apply it to all columns, combine df[df > 0] with dropna():

>>> df[df > 0].dropna()
    a   b
0  21   1
3   3  17

If you know which columns to apply it to, restrict it to those columns with df[df[cols] > 0]:

>>> cols = ['b']
>>> df[cols] = df[df[cols] > 0][cols]
>>> df.dropna()
    a   b
0  21   1
2  -4  14
3   3  17
ComputerFellow

I've found you can simplify the answer by just doing this:

>>> cols = ['b']
>>> df = df[df[cols] > 0]

dropna() is not an in-place method, so you have to store the result.

>>> df = df.dropna()
rgahan
  • This code results in an empty dataframe: column 'a' is replaced by all NaNs because the filter doesn't include it. I could modify this answer to fix that, but then it would be pretty much the same as the other answer. – Zev Apr 20 '20 at 18:48
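A small repro of the failure mode described in the comment (a two-row frame is assumed just for illustration):

```python
import pandas as pd

df = pd.DataFrame(data=[[21, 1], [32, -4]], columns=['a', 'b'])
cols = ['b']

# The boolean frame df[cols] > 0 only covers column 'b'; indexing with it
# masks every column it doesn't mention, so 'a' becomes all NaN
df = df[df[cols] > 0]

# Every row now has NaN in 'a', so dropna() removes everything
df = df.dropna()
```

After these two steps the dataframe is empty, which is exactly the problem Zev points out.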

I was looking for a solution that doesn't change the dtype (which will happen if NaNs are mixed in with ints, as in the answers that use dropna). Since the questioner already had a NaN in their data, that may not be an issue for them. I went with this solution, which preserves the int64 dtype. Here it is with my sample data:

df = pd.DataFrame(data={'a':[0, 1, 2], 'b': [-1,0,1], 'c': [-2, -1, 0]})
columns = ['b', 'c']
filter_ = (df[columns] >= 0).all(axis=1)
df[filter_]


   a  b  c
2  2  1  0
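To see the dtype point concretely, here is a sketch contrasting the two approaches on the same sample data (variable names are mine, for illustration):

```python
import pandas as pd

df = pd.DataFrame(data={'a': [0, 1, 2], 'b': [-1, 0, 1], 'c': [-2, -1, 0]})

# Mask-based row selection never introduces NaN, so int64 is preserved
filtered = df[(df[['b', 'c']] >= 0).all(axis=1)]

# By contrast, df[df >= 0] inserts NaN where the condition fails, which
# coerces the affected int columns to float64 before dropna() even runs
coerced = df[df >= 0]
```

`filtered['b']` stays int64, while `coerced['b']` has been upcast to float64 by the inserted NaNs.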
Zev