This should be an easy task, but I am stuck. I have a dataframe with a column of type string:

Category
AB00
CD01
EF02
GH03
RF04

Now I want to treat the trailing digits of these values as numbers, filter on them, and create a subset dataframe. However, I do not want to change the original dataframe in any way. I tried:

df_subset=df[df['Category'].str[2:4]<=3]

Of course this does not work, because the sliced values are strings and cannot be evaluated as numeric and compared to 69.

I tried

df_subset=df[int(df['Category'].str[2:4])<=3]

but I am not sure about this; I think it is wrong, or at least not the way it should be done.

PSt
  • `df['Category'].str[2:4]<='69'`? Are you comparing to `69` or to `3`? – BigBen Jan 11 '23 at 15:47
  • maybe your problem is solved here: https://stackoverflow.com/questions/11350770/filter-pandas-dataframe-by-substring-criteria – Pablo Jan 11 '23 at 15:47

2 Answers

Add type conversion to your expression:

df[df['Category'].str[2:].astype(int) <= 3]

  Category
0     AB00
1     CD01
2     EF02
3     GH03
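
For reference, a minimal self-contained sketch of this approach. It assumes the dataframe holds exactly the five `Category` values shown in the question; the names `mask` and `df_subset` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question (assumed to be the whole dataframe).
df = pd.DataFrame({"Category": ["AB00", "CD01", "EF02", "GH03", "RF04"]})

# Drop the two-letter prefix, convert the remainder to integers, and compare.
# The conversion happens on a temporary Series, so df itself is not modified.
mask = df["Category"].str[2:].astype(int) <= 3
df_subset = df[mask]

print(df_subset)
print(df)  # the original dataframe still has all five string values
```

Because the conversion is applied only to the intermediate Series used for the mask, the `Category` column keeps its original string values. Note that `astype(int)` would raise an error if any suffix were not a valid integer.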
RomanPerekhrest

As you have leading zeros, you can directly use string comparison:

df_subset = df.loc[df['Category'].str[2:4] <= '03']

Output:

  Category
0     AB00
1     CD01
2     EF02
3     GH03
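
String comparison is lexicographic, so it agrees with numeric order only when the digits have a fixed width and are zero-padded. The sketch below checks this on the question's sample data and then on a hypothetical unpadded Series (`unpadded` is not part of the question) to show where the shortcut would break down.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"Category": ["AB00", "CD01", "EF02", "GH03", "RF04"]})

# Fixed-width, zero-padded digits: string order matches numeric order.
by_string = df.loc[df["Category"].str[2:4] <= "03"]
by_number = df.loc[df["Category"].str[2:4].astype(int) <= 3]
same_rows = by_string.equals(by_number)

# Hypothetical unpadded values: strings are compared character by character,
# so "10" sorts before "3" because "1" < "3".
unpadded = pd.Series(["3", "10", "2"])
as_strings = (unpadded <= "3").tolist()
as_numbers = (unpadded.astype(int) <= 3).tolist()

print(same_rows)
print(as_strings, as_numbers)
```

If the numeric part can vary in width, converting with `astype(int)` is the safer choice.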
mozway