
I want to build a new dataframe containing the rows of an original dataframe that have a non-numeric (i.e. string) value in a specific column.

import pandas as pd
import numpy as np
test = {'a':[1,2,3],
        'b':[4,5,'x'],
        'c':['f','g','h']}
df_test = pd.DataFrame(test)
print(df_test)

In this example I want to get the third row, because its value in column 'b' is not numeric (it is 'x').

ChristosK
  • Sorry, I don't understand your problem as written. Can you clarify what you want and what your problem is? – JayPeerachai Dec 02 '22 at 16:00
  • Does this answer your question? [Finding non-numeric rows in dataframe in pandas?](https://stackoverflow.com/questions/21771133/finding-non-numeric-rows-in-dataframe-in-pandas) – Nick ODell Dec 02 '22 at 16:08
  • Have a look at this one: https://stackoverflow.com/questions/46999146/in-pandas-how-to-filter-a-series-based-on-the-type-of-the-values – alkes Dec 02 '22 at 16:12
  • The question "Finding non-numeric rows in dataframe in pandas?" does not solve my problem. I need to find the non-numeric values in specific columns rather than in the whole dataframe. – ChristosK Dec 02 '22 at 16:17

2 Answers


The complication is that pandas stores a column with mixed str and int values under a single dtype (object), so the column's dtype alone cannot tell the string rows apart from the numeric ones. Hence I think it is necessary to iterate over the column of interest, check the type of each element to build a boolean mask, and then use that mask to extract the matching row(s).

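# Build a boolean mask: True where the value in column 'b' is a string
mask = []
for j in df_test['b']:
    mask.append(isinstance(j, str))

# Indexing with the boolean list keeps only the rows marked True
print(df_test[mask])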

which produces

   a  b  c
2  3  x  h
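
The point about the column dtype can be checked directly. This is only a small sketch that rebuilds the question's dataframe; `col_dtype` and `elem_types` are names I made up for illustration.

```python
import pandas as pd

# Same data as in the question
df_test = pd.DataFrame({'a': [1, 2, 3],
                        'b': [4, 5, 'x'],
                        'c': ['f', 'g', 'h']})

# The mixed int/str column is stored under the generic object dtype
col_dtype = df_test['b'].dtype

# Each element still keeps its own Python type inside that column
elem_types = [type(v).__name__ for v in df_test['b']]

print(col_dtype)
print(elem_types)
```

Because the elements keep their own types, an `isinstance` check per element is enough to separate the string rows from the rest.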
user19077881

For this type of problem you need an element-wise check (a list comprehension or an element-wise apply) that builds a boolean mask. Any of the following approaches works, and you should see similar performance for all of them.

isinstance .apply

mask = df_test['b'].apply(isinstance, args=(str, ))

print(df_test.loc[mask])
   a  b  c
2  3  x  h

isinstance list comprehension

mask = [isinstance(v, str) for v in df_test['b']]

print(df_test.loc[mask])
   a  b  c
2  3  x  h

coerce to numeric and find nans

mask = pd.to_numeric(df_test['b'], errors='coerce').isna()

print(df_test.loc[mask])
   a  b  c
2  3  x  h
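
One caveat, as far as I can tell: the `isinstance` masks and the `pd.to_numeric` mask agree on the question's data, but they answer slightly different questions. The sketch below uses a made-up Series `s` (not from the question) containing a numeric-looking string and a missing value to show where they differ.

```python
import numpy as np
import pandas as pd

# Hypothetical column: an int, a numeric-looking string, a plain string, a NaN
s = pd.Series([4, '5', 'x', np.nan], dtype=object)

# True where the element is a Python str, whatever its content
is_str = s.apply(isinstance, args=(str,))

# True where the element cannot be parsed as a number (this includes NaN)
not_parseable = pd.to_numeric(s, errors='coerce').isna()

print(is_str.tolist())
print(not_parseable.tolist())
```

Here `'5'` is flagged by the `isinstance` check but not by the coercion check, while the NaN is flagged only by the coercion check. Which one you want depends on whether "non-numeric" means "stored as a string" or "cannot be converted to a number".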

Cameron Riddell