
I want to write a function that accepts a pandas DataFrame and a specific value (to_drop), and removes every row that contains that value in any column.

For example, given this dataframe:

import pandas as pd

d = {'Name': ['John', 'Bill', 'Frank'], 'A': [1, 5, 7], 'B': [2, 0, 6], 'C': [3, 1, 9]}
df = pd.DataFrame(d)

If the value I choose is 0, the function should remove Bill's row and return the rows for John and Frank.

I am trying to use:

def drop_row(df, to_drop):
    new_df = df[df.column != to_drop]
    return new_df

This raises an AttributeError, which I expected, because this syntax only works when you select a specific column.

Thank you!

  • Does this answer your question? [Drop rows by index from dataframe](https://stackoverflow.com/questions/47932937/drop-rows-by-index-from-dataframe) – sushanth Feb 07 '21 at 17:52

3 Answers


Use pandas.DataFrame.any or pandas.DataFrame.all along axis=1 on the condition:

>>> df[df.ne(0).all(axis=1)]
    Name  A  B  C
0   John  1  2  3
2  Frank  7  6  9

>>> df[~df.eq(0).any(axis=1)]
    Name  A  B  C
0   John  1  2  3
2  Frank  7  6  9

You can make a function out of this, but frankly, it's unnecessary:

>>> drop_row = lambda df: df[~df.eq(0).any(axis=1)]
>>> drop_row(df)
    Name  A  B  C
0   John  1  2  3
2  Frank  7  6  9

It checks for the condition:

>>> df.ne(0) # items (n)ot (e)qual to 0:
   Name     A      B     C
0  True  True   True  True
1  True  True  False  True
2  True  True   True  True

>>> df.ne(0).all(axis=1)  # checks whether all values along axis 1 (each row) are True
0     True
1    False
2     True
dtype: bool

>>> df[df.ne(0).all(axis=1)]  # keeps only the rows where the mask is True (index 0 and 2)
    Name  A  B  C
0   John  1  2  3
2  Frank  7  6  9
Sayandip Dutta
  • Just a thought: if the OP had columns with other dtypes, use `df.select_dtypes(include=np.number)` for the comparison. – anky Feb 07 '21 at 18:12
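
The suggestion in the comment above could be sketched roughly as follows. The helper name `drop_row_numeric` is made up for illustration and is not from the answer; the idea is only to build the mask from the numeric columns and then apply it to the full frame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Bill", "Frank"],
    "A": [1, 5, 7],
    "B": [2, 0, 6],
    "C": [3, 1, 9],
})

def drop_row_numeric(df, to_drop):
    # Hypothetical helper: compare only the numeric columns, so string or
    # other non-numeric columns never take part in the equality check.
    numeric = df.select_dtypes(include=np.number)
    # True for every row where at least one numeric column equals to_drop.
    mask = numeric.eq(to_drop).any(axis=1)
    # Apply the mask to the full frame so non-numeric columns are kept.
    return df[~mask]

result = drop_row_numeric(df, 0)
```

Here `result` keeps all four columns, including `Name`, even though only `A`, `B`, and `C` were compared.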

You need to learn the tools to implement the logic you already have. The missing pieces are the any and all functions. Look up how to iterate over selected columns of a DataFrame, and put that into a list comprehension. Then apply all to the result, so that a row is kept only when every column differs from the value. The filtering syntax (as opposed to using the drop method) will look something like

df[ all( [df[column] != to_drop for column ...] ) ]

I'll leave the iteration syntax up to your research.
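
One possible way to fill in that outline is sketched below. It is an assumption on my part rather than the answerer's intended solution: Python's built-in `all` cannot be applied directly to a list of Series (the truth value of a Series is ambiguous and raises a ValueError), so this sketch swaps in `np.logical_and.reduce` to combine the per-column masks element-wise. The function name `drop_row_by_columns` is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Bill", "Frank"],
    "A": [1, 5, 7],
    "B": [2, 0, 6],
    "C": [3, 1, 9],
})

def drop_row_by_columns(df, to_drop):
    # One boolean Series per column: True where the value differs from to_drop.
    column_masks = [df[column] != to_drop for column in df.columns]
    # Element-wise AND across the per-column masks; a row is kept only if
    # every column differs from to_drop.
    keep = np.logical_and.reduce(column_masks)
    return df[keep]

result = drop_row_by_columns(df, 0)
```

To restrict the check to selected columns, iterate over a list of column names instead of `df.columns`.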

Prune

Define your function as:

def drop_row(df, to_drop):
    return df[~df.eq(to_drop).any(axis=1)]

Then if you call e.g. drop_row(df, 1), you will get:

    Name  A  B  C
2  Frank  7  6  9

i.e. the rows with index 0 and 1, each of which contains 1 in some column, are dropped.
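
As a quick check against the example from the question, here is a small sketch that calls the same function with 0. The `reset_index` step is an optional extra that I am adding for illustration; it is not part of the answer.

```python
import pandas as pd

def drop_row(df, to_drop):
    # Keep the rows in which no column equals to_drop.
    return df[~df.eq(to_drop).any(axis=1)]

df = pd.DataFrame({
    "Name": ["John", "Bill", "Frank"],
    "A": [1, 5, 7],
    "B": [2, 0, 6],
    "C": [3, 1, 9],
})

# Bill's row has B == 0, so it is removed; John and Frank remain.
result = drop_row(df, 0)

# The surviving rows keep their original index labels (0 and 2).
# Optional: renumber them from 0 if the gap is unwanted.
renumbered = result.reset_index(drop=True)
```

The function returns a filtered result and leaves the original `df` with all three rows.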

Valdi_Bo