What is the difference between these two methods to delete a row if the string 'something' is found in the column 'search'?

First method:

mydata = mydata.set_index("search")
mydata = mydata.drop("something", axis=0)

This method seems pretty straightforward and understandable.

Second method:

mydata = mydata[~mydata.select_dtypes(['object']).eq('something').any(axis=1)]

I don't really know how this method works. Where in this line is it specified to drop/delete the row? And why does it work with 'object' instead of 'search'? What does the "~" stand for? I just can't find it in the documentation.

TAN-C-F-OK
  • I think I got it - more or less. "select_dtypes" searches for all rows with the string 'something' in the column and keeps them. The "~" reverses this statement. – TAN-C-F-OK Nov 01 '18 at 10:03
  • No, that's incorrect, `select_dtypes` subsets your dataframe by series/column **type**. The subsequent method `eq` is the one that tests for equality. – jpp Nov 01 '18 at 10:04

1 Answer

Your two methods are not identical. Let's look at the second method in parts.

Step 1: subset dataframe via select_dtypes

mydata.select_dtypes(['object']) keeps only the series (columns) of your dataframe that have object dtype. You can see the dtype of each series via mydata.dtypes. Typically, non-numeric series will have object dtype, which indicates a sequence of pointers to arbitrary Python objects, similar to a list.
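To see what this step selects, here is a minimal sketch. The sample data, the extra columns label and count, and the variable name text_cols are made up for illustration; only the column name search comes from the question. The object dtype is set explicitly because newer pandas versions may infer a dedicated string dtype for text columns instead.

```python
import pandas as pd

# Hypothetical sample data; only the column name "search" comes from the question.
# dtype=object is forced so the example does not depend on how a given
# pandas version infers the dtype of text columns.
mydata = pd.DataFrame({
    "search": pd.Series(["something", "other", "else"], dtype=object),
    "label": pd.Series(["a", "something", "c"], dtype=object),
    "count": [1, 2, 3],
})

# One dtype per column: "search" and "label" are object, "count" is an integer dtype.
print(mydata.dtypes)

# Keep only the object-dtype columns; "count" is left out.
text_cols = mydata.select_dtypes(["object"])
print(list(text_cols.columns))
```

Note that nothing here mentions search by name: the selection is purely by dtype.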

Consequently, your two methods only agree when search is the only series with object dtype.

Step 2: Test for equality via eq

Since Step 1 returns a dataframe, even if it only contains one series, pd.DataFrame.eq will return a dataframe of Boolean values.
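A small sketch of this step, assuming a made-up dataframe with a single object column (the data and the names text_cols and matches are illustrative, not from the question):

```python
import pandas as pd

# Hypothetical sample data with "search" as the only object-dtype column.
mydata = pd.DataFrame({
    "search": pd.Series(["something", "other", "else"], dtype=object),
    "count": [1, 2, 3],
})

text_cols = mydata.select_dtypes(["object"])

# Element-wise comparison against the scalar 'something'.
# The result has the same shape and labels as text_cols, filled with booleans.
matches = text_cols.eq("something")

print(type(matches).__name__)
print(matches)
```

Even with one column, matches is still a dataframe rather than a series, which is why a further reduction is needed before it can be used to filter rows.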

Step 3: Test for any True value row-wise via any

Next, your second method checks whether any value is True row-wise (axis=1). Again, if your only object series is search, this removes the same rows as your first method (although the first method also moves search into the index).

If you have multiple object series, then your two methods may not align, as a row may be excluded due to another series being equal to 'something'.
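The difference can be reproduced with a small sketch. The data and the names first, second, third and row_has_match are made up for illustration. The sketch also shows the two pieces the question asks about: `~` inverts a Boolean series element-wise, and indexing a dataframe with a Boolean series keeps only the rows where it is True, so nothing is "dropped" explicitly. The last line is one possible column-specific alternative, not something taken from the question.

```python
import pandas as pd

# Hypothetical sample data with TWO object-dtype columns.
mydata = pd.DataFrame({
    "search": pd.Series(["something", "other", "else"], dtype=object),
    "label": pd.Series(["a", "something", "c"], dtype=object),
    "count": [1, 2, 3],
})

# First method: only looks at "search" (and moves it into the index).
first = mydata.set_index("search").drop("something", axis=0)

# Second method, split into parts.
# True for every row in which ANY object column equals 'something'.
row_has_match = mydata.select_dtypes(["object"]).eq("something").any(axis=1)
# ~ flips True/False; Boolean indexing keeps the rows that are True after flipping.
second = mydata[~row_has_match]

# A column-specific filter that leaves the index untouched.
third = mydata[mydata["search"] != "something"]

print(first)   # rows with search 'other' and 'else'
print(second)  # only the row with search 'else'; the 'other' row matched via "label"
print(third)   # rows with search 'other' and 'else'
```

Here the second method removes one row more than the first, because the row whose label is 'something' is also filtered out.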

jpp
  • Thanks. The big missing piece was that 'object' is a data type (str). Others would be 'int', 'float' and 'bool'. And just a little bonus question: 'object' can be mixed type? – TAN-C-F-OK Nov 01 '18 at 10:14
  • Yes, see [this excellent answer](https://stackoverflow.com/a/21020411/9209546) to understand what `object` really means. – jpp Nov 01 '18 at 10:15