6

I encountered the error

'>' not supported between instances of 'str' and 'int'

while trying to run the lines below on a pandas DataFrame:

print (survey_df_clean.shape)
print (survey_df_clean[survey_df_clean['text']>30].shape)

Should I try to convert them to int and how would that work in this statement?

MattR
  • i'm assuming that this is a `pandas` dataframe? – MattR Sep 14 '17 at 19:50
  • `survey_df_clean['text']>30` probably the left part is not integer datatype – Jean-François Fabre Sep 14 '17 at 19:50
  • @ThomasWeller I don’t think this is a duplicate of that at all. I agree that the “thanks” lines are unnecessary, but flagging the question as a duplicate is not the right way to bring that to the asker’s attention. – Daniel H Sep 14 '17 at 19:54
  • You probably do need to convert that column to an integral data type. The [`to_numeric` function](http://pandas.pydata.org/pandas-docs/version/0.20/generated/pandas.to_numeric.html) is probably what you want, but if that doesn’t work we need more detail. How do you load the data? Can you give a few sample rows of the dataframe? – Daniel H Sep 14 '17 at 19:56
  • @ Jean-François Fabre Should I try to convert them to int ?? –  Sep 14 '17 at 19:57
  • @DanielH: it's not possible to flag as a duplicate on Meta. Duplicates must be on the same site. – Thomas Weller Sep 14 '17 at 20:01
  • I'm just here to note that I had this same error, Google led me here, and the ultimate root of it was that I had duplicate column labels in my dataframe, and was trying to divide that column (all numeric) by another column. Because there were two such columns, the lot of them was getting put into the division as an object, rendered to str, thus resulting in the error above when divided. Was very frustrating and I hope this saves someone the pain. – Steve Estes Feb 22 '21 at 11:55
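
The duplicate-label situation described in the last comment can be checked for directly. This is a minimal sketch with a made-up frame (the column names `a` and `b` are assumptions, not from the question); it only shows how to detect and drop duplicated labels, not the exact failure the commenter hit.

```python
import pandas as pd

# Hypothetical frame with the label "a" used twice.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["a", "a", "b"])

# Selecting a duplicated label returns a DataFrame, not a Series.
selected = df["a"]
print(type(selected))

# List the labels that occur more than once.
duplicated_labels = df.columns[df.columns.duplicated()].tolist()
print(duplicated_labels)

# Keep only the first column for each label.
deduped = df.loc[:, ~df.columns.duplicated()]
print(deduped.columns.tolist())
```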

5 Answers

6

First make sure that all values in survey_df_clean['text'] have the same type. If you want to convert them to numeric, do this:

survey_df_clean['text'] = pd.to_numeric(survey_df_clean['text'])

Then do this

survey_df_clean.loc[survey_df_clean['text']>30].shape
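
Put together, the two steps might look like the sketch below. The sample frame is a hypothetical stand-in for survey_df_clean (the real data is not shown in the question), and it assumes every value is a clean numeric string.

```python
import pandas as pd

# Hypothetical stand-in for survey_df_clean: numbers stored as strings.
survey_df_clean = pd.DataFrame({"text": ["12", "45", "31", "7"]})

# Comparing strings with an int is what triggers the error in the question.
error_message = None
try:
    survey_df_clean[survey_df_clean["text"] > 30]
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Convert the column to a numeric dtype, then filter.
survey_df_clean["text"] = pd.to_numeric(survey_df_clean["text"])
filtered_shape = survey_df_clean.loc[survey_df_clean["text"] > 30].shape
print(filtered_shape)
```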
Fariliana Eri
3

This message means that you are trying to compare a string object (str) with an integer (int). The expression

survey_df_clean['text']

will probably return a Series of strings, so you cannot compare it directly with the number 30. If you want to compare the length of each entry, you can use the pandas.Series.str.len() operation.

If this field should actually contain integers, you can use pandas.to_numeric to convert it from str to a numeric type.
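
If the column really holds free text and the intent is "entries longer than 30 characters", a sketch of the str.len() route could look like this. The sample responses are invented for illustration.

```python
import pandas as pd

# Hypothetical free-text survey responses.
survey_df_clean = pd.DataFrame(
    {"text": ["short answer", "a much longer free-text survey response here"]}
)

# Length of each entry, computed element-wise.
lengths = survey_df_clean["text"].str.len()

# Keep only the rows whose text is longer than 30 characters.
long_rows = survey_df_clean[lengths > 30]
print(long_rows.shape)
```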

zimmerrol
1

survey_df_clean['text'] might have NaN or str values in it somewhere. To find out:

survey_df_clean['text'].isnull().sum()

If there are any, take care of them first, then apply

print (survey_df_clean[survey_df_clean['text']>30].shape)
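
Note that isnull() only counts missing values; it does not flag strings that cannot be read as numbers. One possible way to surface both, sketched on invented data and assuming that dropping the bad rows is acceptable, is pd.to_numeric with errors="coerce":

```python
import pandas as pd

# Hypothetical column with one missing value and one non-numeric string.
survey_df_clean = pd.DataFrame({"text": ["12", "45", None, "n/a", "31"]})

# Counts only the missing entry, not the "n/a" string.
missing_count = survey_df_clean["text"].isnull().sum()

# Unparseable strings become NaN instead of raising an error.
converted = pd.to_numeric(survey_df_clean["text"], errors="coerce")

# Values that were present but could not be converted.
was_present = survey_df_clean["text"].notnull()
bad_values = survey_df_clean.loc[converted.isnull() & was_present, "text"]
print(bad_values.tolist())

# Replace the column, drop rows without a usable number, then filter.
survey_df_clean["text"] = converted
cleaned = survey_df_clean.dropna(subset=["text"])
result_shape = cleaned[cleaned["text"] > 30].shape
print(result_shape)
```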
Suraj Rao
Athar Noraiz
1

I had the same error message when trying to use that conditional. What intrigued me was that the same command had run correctly on another notebook.

The difference was in how I read the csv file. This was the troublesome one:

df=pd.read_csv('data.csv')

And when I added the decimal argument it worked:

df=pd.read_csv('data.csv', decimal=',')

Obviously, it'll depend on how your data is organized. ;)
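
A small sketch of why this can matter, using an in-memory CSV instead of 'data.csv'. The semicolon separator, the column names, and the values are assumptions made up for the example.

```python
import io

import pandas as pd

# Hypothetical file contents: semicolon-separated, comma as decimal mark.
csv_data = "id;score\n1;12,5\n2;45,0\n3;31,2\n"

# Without decimal=",", values like "12,5" are read as text.
df_default = pd.read_csv(io.StringIO(csv_data), sep=";")
print(df_default["score"].dtype)

# With decimal=",", the same column is parsed as floats.
df_decimal = pd.read_csv(io.StringIO(csv_data), sep=";", decimal=",")
print(df_decimal["score"].dtype)

# The numeric comparison now works.
filtered_shape = df_decimal[df_decimal["score"] > 30].shape
print(filtered_shape)
```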

Paulo U
  • This does not really answer the question as it is asked. The question never mentions anything about reading from a csv. This also wouldn't fix the data type mismatch. – Jacobr365 Apr 15 '20 at 16:53
0

This is because the values in the 'text' column are of type str, and you are comparing str with int. You can quickly check the type of the first value in the 'text' column:

print(type(survey_df_clean['text'].iloc[0]))

To compare, you can do the following:

survey_df_clean[survey_df_clean['text'].astype(int)>30]
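
As a rough sketch on invented data: astype(int) works when every value is an integer-like string, but it raises ValueError as soon as one value is not, so it is worth checking the column first.

```python
import pandas as pd

# Hypothetical stand-in for survey_df_clean with integer-like strings.
survey_df_clean = pd.DataFrame({"text": ["12", "45", "31"]})

# The first value is a Python str, not an int.
first_value_type = type(survey_df_clean["text"].iloc[0])
print(first_value_type)

# Cast for the comparison only; the stored column is left unchanged.
result = survey_df_clean[survey_df_clean["text"].astype(int) > 30]
print(result.shape)

# A single non-numeric value makes astype(int) fail.
messy = pd.Series(["12", "abc"])
try:
    messy.astype(int)
    conversion_failed = False
except ValueError:
    conversion_failed = True
print(conversion_failed)
```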