Filter on a pandas string column as numeric without creating a new column

Question

This should be an easy task, but I am stuck. I have a dataframe with a column of type string, so its values contain characters:

Category
AB00
CD01
EF02
GH03
RF04

Now I want to treat the numeric part of these values as a number, filter on it, and create a subset dataframe. However, I do not want to change the original dataframe in any way. I tried:

df_subset=df[df['Category'].str[2:4]<=3]

Of course this does not work, as the left-hand side is a string and cannot be evaluated as numeric and compared to 69.

I tried

df_subset=df[int(df['Category'].str[2:4])<=3]

but I am not sure about this; I think it is wrong, or at least not the way it should be done.

`df['Category'].str[2:4]<='69'`? Are you comparing to `69` or to `3`? — BigBen, Jan 11 '23 at 15:47
maybe your problem is solved here: https://stackoverflow.com/questions/11350770/filter-pandas-dataframe-by-substring-criteria — Pablo, Jan 11 '23 at 15:47

score 1 · Accepted Answer · answered Jan 11 '23 at 15:46

Add type conversion to your expression:

df[df['Category'].str[2:].astype(int) <= 3]

  Category
0     AB00
1     CD01
2     EF02
3     GH03
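
For completeness, here is a self-contained sketch of the same idea using the sample data from the question. The names `suffix` and `suffix_safe` are illustrative only. The `pd.to_numeric(..., errors="coerce")` variant is an assumption about what you might want if some rows do not end in digits; it is not needed for the data shown.

```python
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({"Category": ["AB00", "CD01", "EF02", "GH03", "RF04"]})

# The converted values live in a temporary Series used only as a mask;
# no column is added to df.
suffix = df["Category"].str[2:].astype(int)
df_subset = df[suffix <= 3]

# Hypothetical variant: non-numeric suffixes become NaN instead of raising,
# and NaN <= 3 evaluates to False, so those rows are simply dropped.
suffix_safe = pd.to_numeric(df["Category"].str[2:], errors="coerce")
df_subset_safe = df[suffix_safe <= 3]

print(df_subset)
```

After this runs, `df` still has only the `Category` column.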

RomanPerekhrest


score 1 · Answer 2 · answered Jan 11 '23 at 15:51

As you have leading zeros, you can directly use string comparison:

df_subset = df.loc[df['Category'].str[2:4] <= '03']

Output:

  Category
0     AB00
1     CD01
2     EF02
3     GH03
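
One caveat, sketched below with made-up values (`codes` is not from the question): the string comparison is lexicographic, so it only agrees with numeric order when both sides have the same number of digits. That is why the right-hand side is `'03'` and not `'3'`.

```python
import pandas as pd

# Illustrative values, chosen to include a suffix greater than 9.
codes = pd.Series(["AB00", "CD03", "EF04", "GH10"])
two_digits = codes.str[2:4]

# Same width on both sides: lexicographic order matches numeric order.
padded_mask = two_digits <= "03"

# Unpadded right-hand side: "10" <= "3" is True because "1" sorts before "3",
# so every row passes the filter here.
unpadded_mask = two_digits <= "3"

print(padded_mask.tolist())
print(unpadded_mask.tolist())
```

If the numeric part can vary in width, converting with `astype(int)` is the safer choice.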

mozway

