
Say there's a DataFrame:

>>> df = pd.DataFrame({
...                 'A':[1,2,'Three',4],
...                 'B':[1,'Two',3,4]})
>>> df
       A    B
0      1    1
1      2  Two
2  Three    3
3      4    4

I want to select the rows where the value in a particular column is of type str.

For example, I want to select the rows where the value in column A is a str, so it should print something like:

       A  B
2  Three  3

The intuitive code for this would be something like:

df[type(df.A) == str]

Which obviously doesn't work!

Thanks, please help!

Devi Prasad Khatua

3 Answers


This works:

df[df['A'].apply(lambda x: isinstance(x, str))]
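
For reference, here is a quick sketch of the intermediate Boolean mask this produces for the example DataFrame above (exact output formatting may vary slightly by pandas version):

>>> mask = df['A'].apply(lambda x: isinstance(x, str))  # True where the value is a str
>>> mask
0    False
1    False
2     True
3    False
Name: A, dtype: bool
>>> df[mask]
       A  B
2  Three  3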
DrTRD
  • Don't use `type(obj) == typeobj`, ever. Use `isinstance(obj, typeobj)`, or if subclasses must be excluded, `type(obj) is typeobj` (identity testing, not equality). – Martijn Pieters Sep 24 '18 at 15:11
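
To illustrate that comment with a concrete, standard-library case (bool is a subclass of int; this is just a sketch in plain Python):

>>> x = True             # bool is a subclass of int
>>> isinstance(x, int)   # accepts instances of subclasses
True
>>> type(x) is int       # identity test: exact type only, subclasses excluded
False
>>> type(x) == int       # equality test: same result here, but non-idiomatic
False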

You can do something similar to what you're asking with

In [14]: df[pd.to_numeric(df.A, errors='coerce').isnull()]
Out[14]: 
       A  B
2  Three  3

Why only similar? Because Pandas stores things in homogeneous columns (all entries in a column are of the same type). Even though you constructed the DataFrame from heterogeneous types, each column is coerced to the lowest common denominator:

In [16]: df.A.dtype
Out[16]: dtype('O')

Consequently, you can't ask which rows are of what type - they will all be of the same type. What you can do is to try to convert the entries to numbers, and check where the conversion failed (this is what the code above does).
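
As a sketch of that intermediate step for the example DataFrame, the coerced series looks like this; the entry that failed to convert becomes NaN, which is what `isnull` then flags:

In [17]: pd.to_numeric(df.A, errors='coerce')
Out[17]: 
0    1.0
1    2.0
2    NaN
3    4.0
Name: A, dtype: float64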

Ami Tavory
  • Thanks :) but what's with `isnull()`? What does it return? – Devi Prasad Khatua Sep 01 '16 at 15:46
  • @wolframalpha Given a Series, it returns a boolean series indicating which entries of the series had null values in them. So, first we use `to_numeric` (which places a null value when the conversion failed), then run `isnull` on the result. – Ami Tavory Sep 01 '16 at 15:48
  • I think this should be the correct answer, since even if there is one string the whole column would be a string. The example here is too simple a scenario, hence the accepted answer worked. In real-life situations this is a life saver. – jar Apr 04 '20 at 09:52
  • This is the best and fastest solution (far better than applying a lambda); it should be the accepted answer. – Pierre D Dec 14 '20 at 22:02

It's generally a bad idea to use a series to hold mixed numeric and non-numeric types. This gives the series dtype object, which is nothing more than a sequence of pointers, much like a list; indeed, many operations on such a series can be processed more efficiently with a list.
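
A quick sketch of what that means for the example column: the column has a single object dtype, but each element keeps its own Python type underneath.

>>> df['A'].dtype                 # one dtype for the whole column
dtype('O')
>>> [type(v) for v in df['A']]    # the individual elements are ordinary Python objects
[<class 'int'>, <class 'int'>, <class 'str'>, <class 'int'>]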

With this disclaimer, you can use Boolean indexing via a list comprehension:

res = df[[isinstance(value, str) for value in df['A']]]

print(res)

       A  B
2  Three  3

The equivalent is possible with pd.Series.apply, but this is no more than a thinly veiled loop and may be slower than the list comprehension:

res = df[df['A'].apply(lambda x: isinstance(x, str))]
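
If you want to check the relative performance yourself, here is a rough sketch using the standard-library `timeit` module (the series size and repeat count are arbitrary, and actual timings depend on your machine, data and pandas version):

import timeit

# Build a larger object-dtype series of mixed ints and strings for the comparison.
setup = (
    "import pandas as pd\n"
    "s = pd.Series([1, 'Three'] * 100000, dtype=object)"
)

t_listcomp = timeit.timeit("[isinstance(v, str) for v in s]", setup=setup, number=10)
t_apply = timeit.timeit("s.apply(lambda x: isinstance(x, str))", setup=setup, number=10)

print(f"list comprehension: {t_listcomp:.3f}s   apply: {t_apply:.3f}s")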

If you are certain all non-numeric values must be strings, then you can convert to numeric and look for nulls, i.e. values that cannot be converted:

res = df[pd.to_numeric(df['A'], errors='coerce').isnull()]
jpp