
Say there's a DataFrame:

>>> df = pd.DataFrame({
...                 'A':[1,2,'Three',4],
...                 'B':[1,'Two',3,4]})
>>> df
       A    B
0      1    1
1      2  Two
2  Three    3
3      4    4

I want to select the rows where the value in a particular column is of type str.

For example, I want to select the rows where the value in column A is a str, so it should print something like:

       A  B
2  Three  3

The intuitive code for this would be something like:

df[type(df.A) == str]

Which obviously doesn't work!

Thanks, please help!

Devi Prasad Khatua

3 Answers


This works:

df[df['A'].apply(lambda x: isinstance(x, str))]
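
For reference, here is a quick sketch of the intermediate Boolean mask this produces for the example DataFrame above (exact output formatting may vary slightly by pandas version):

>>> mask = df['A'].apply(lambda x: isinstance(x, str))  # True where the value is a str
>>> mask
0    False
1    False
2     True
3    False
Name: A, dtype: bool
>>> df[mask]
       A  B
2  Three  3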
DrTRD
  • Don't use `type(obj) == typeobj`, ever. Use `isinstance(obj, typeobj)`, or if subclasses must be excluded, `type(obj) is typeobj` (identity testing, not equality). – Martijn Pieters Sep 24 '18 at 15:11
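
To illustrate that comment with a concrete, standard-library case (bool is a subclass of int; this is just a sketch in plain Python):

>>> x = True             # bool is a subclass of int
>>> isinstance(x, int)   # accepts instances of subclasses
True
>>> type(x) is int       # identity test: exact type only, subclasses excluded
False
>>> type(x) == int       # equality test: same result here, but non-idiomatic
False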

You can do something similar to what you're asking with

In [14]: df[pd.to_numeric(df.A, errors='coerce').isnull()]
Out[14]: 
       A  B
2  Three  3

Why only similar? Because Pandas stores things in homogeneous columns (all entries in a column are of the same type). Even though you constructed the DataFrame from heterogeneous types, each column is coerced to the lowest common denominator:

In [16]: df.A.dtype
Out[16]: dtype('O')

Consequently, you can't ask which rows are of what type - they will all be of the same type. What you can do is to try to convert the entries to numbers, and check where the conversion failed (this is what the code above does).
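
As a sketch of that intermediate step for the example DataFrame, the coerced series looks like this; the entry that failed to convert becomes NaN, which is what `isnull` then flags:

In [17]: pd.to_numeric(df.A, errors='coerce')
Out[17]: 
0    1.0
1    2.0
2    NaN
3    4.0
Name: A, dtype: float64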

Ami Tavory
  • Thanks :) but what's with `isnull()`? What does it return? – Devi Prasad Khatua Sep 01 '16 at 15:46
  • @wolframalpha Given a Series, it returns a boolean series indicating which entries of the series had null values in them. So, first we use `to_numeric` (which places a null value when the conversion failed), then run `isnull` on the result. – Ami Tavory Sep 01 '16 at 15:48
  • I think this should be the correct answer, since even if there is one string the whole column would be a string. The example here is too simple a scenario, hence the accepted answer worked. In real-life situations this is a life saver. – jar Apr 04 '20 at 09:52
  • This is the best and fastest solution (far better than applying a lambda); it should be the accepted answer. – Pierre D Dec 14 '20 at 22:02

It's generally a bad idea to use a series to hold mixed numeric and non-numeric types. This gives the series dtype object, which is nothing more than a sequence of pointers, much like a list; indeed, many operations on such a series can be processed more efficiently with a list.
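
A quick sketch of what that means for the example column: the column has a single object dtype, but each element keeps its own Python type underneath.

>>> df['A'].dtype                 # one dtype for the whole column
dtype('O')
>>> [type(v) for v in df['A']]    # the individual elements are ordinary Python objects
[<class 'int'>, <class 'int'>, <class 'str'>, <class 'int'>]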

With this disclaimer, you can use Boolean indexing via a list comprehension:

res = df[[isinstance(value, str) for value in df['A']]]

print(res)

       A  B
2  Three  3

The equivalent is possible with pd.Series.apply, but this is no more than a thinly veiled loop and may be slower than the list comprehension:

res = df[df['A'].apply(lambda x: isinstance(x, str))]
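
If you want to check the relative performance yourself, here is a rough sketch using the standard-library `timeit` module (the series size and repeat count are arbitrary, and actual timings depend on your machine, data and pandas version):

import timeit

# Build a larger object-dtype series of mixed ints and strings for the comparison.
setup = (
    "import pandas as pd\n"
    "s = pd.Series([1, 'Three'] * 100000, dtype=object)"
)

t_listcomp = timeit.timeit("[isinstance(v, str) for v in s]", setup=setup, number=10)
t_apply = timeit.timeit("s.apply(lambda x: isinstance(x, str))", setup=setup, number=10)

print(f"list comprehension: {t_listcomp:.3f}s   apply: {t_apply:.3f}s")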

If you are certain all non-numeric values must be strings, then you can convert to numeric and look for nulls, i.e. values that cannot be converted:

res = df[pd.to_numeric(df['A'], errors='coerce').isnull()]
jpp