6

trying to teach myself pandas.. and playing around with different dtypes

I have a df as follows

df = pd.DataFrame({'ID':[0,2,"bike","cake"], 'Course':['Test','Math','Store','History'] })
print(df)
    ID  Course
0   0   Test
1   2   Math
2   bike    Store
3   cake    History

the dtype of ID is of course an object. What I want to do is remove any rows in the DF if the ID has a string in it.

I thought this would be as simple as..

df.ID.filter(regex='[\w]*')

but this returns everything, is there a sure fire method for dealing with such things?

pault
  • 41,343
  • 15
  • 107
  • 149
Umar.H
  • 22,559
  • 7
  • 39
  • 74
  • Related / possible duplicate: [Remove rows where column value type is string Pandas](https://stackoverflow.com/questions/26771471/remove-rows-where-column-value-type-is-string-pandas) – jpp Sep 07 '18 at 22:10

3 Answers3

6

You can using to_numeric

df[pd.to_numeric(df.ID,errors='coerce').notnull()]
Out[450]: 
  Course ID
0   Test  0
1   Math  2
BENY
  • 317,841
  • 20
  • 164
  • 234
5

Another option is to convert the column to string and use str.match:

print(df[df['ID'].astype(str).str.match("\d+")])
#  Course ID
#0   Test  0
#1   Math  2

Your code does not work, because as stated in the docs for pandas.DataFrame.filter:

Note that this routine does not filter a dataframe on its contents. The filter is applied to the labels of the index.

pault
  • 41,343
  • 15
  • 107
  • 149
5

Wen's answer is the correct (and fastest) way to solve this, but to explain why your regular expression doesn't work, you have to understand what \w means.

\w matches any word character, which includes [a-zA-Z0-9_]. So what you're currently matching includes digits, so everything is matched. A valid regular expression approach would be:

df.loc[df.ID.astype(str).str.match(r'\d+')]

  ID Course
0  0   Test
1  2   Math

The second issue is your use of filter. It isn't filtering your ID row, it is filtering your index. A valid solution using filter would be as follows:

df.set_index('ID').filter(regex=r'^\d+$', axis=0)

   Course
ID
0    Test
2    Math
user3483203
  • 50,081
  • 9
  • 65
  • 94
  • Thanks! all the solutions work for me, I'm not really sure which one to green tick..? what would you recommend my good sir? the /w was extremely informative I didn't know filter worked on index exclusively. – Umar.H Sep 07 '18 at 22:15
  • 2
    Wen's answer is the correct way to do this. I'd recommend that one, but glad that you found all the answers helpful :) – user3483203 Sep 07 '18 at 22:15