
I have a dataset of user comments and ratings. While preprocessing it, I get the error below. How can I fix it?

    import string
    from nltk.corpus import stopwords  # requires nltk.download('stopwords') once

    def DataCleaning(metin):
        numbers = "0123456789"
        lower_case = metin.lower()
        # drop punctuation characters, then digits
        punct_removed = [char for char in lower_case if char not in string.punctuation]
        punct_removed = [char for char in punct_removed if char not in numbers]
        punct_removed_join = ''.join(punct_removed)
        # split into words and drop English stopwords
        punct_removed_join_clean = [word for word in punct_removed_join.split()
                                    if word not in stopwords.words('english')]
        return punct_removed_join_clean


    otel_verileri["reviews.text"] = otel_verileri["reviews.text"].apply(DataCleaning)
    otel_verileri["reviews.text"].tolist()


Output:

    AttributeError                            Traceback (most recent call last)
    <ipython-input-56-a80b269d8bbe> in <module>()
          1 
    ----> 2 otel_verileri["reviews.text"] = otel_verileri["reviews.text"].apply(DataCleaning)
          3 otel_verileri["reviews.text"].tolist()

    pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()

    <ipython-input-48-748ef67e84ac> in DataCleaning(metin)
          1 def DataCleaning(metin):
          2  numbers = "0123456789"
    ----> 3  lower_case=metin.lower()
          4  punct_removed = [char for char in lower_case if char not in string.punctuation]
          5  punct_removed=[char for char in punct_removed if char not in numbers]

    AttributeError: 'float' object has no attribute 'lower'
Edited by halfer; asked by Seda Yılmaz
  • Please read [Under what circumstances may I add “urgent” or other similar phrases to my question, in order to obtain faster answers?](//meta.stackoverflow.com/q/326569) - the summary is that this is not an ideal way to address volunteers, and is probably counterproductive to obtaining answers. Please refrain from adding this to your questions. – halfer Dec 05 '20 at 01:09
  • `assert isinstance(metin, str), repr(metin)`: put that above the line where your error happens and run it. You'll see which value violates your expectation. For some reason your `reviews.text` column doesn't contain just text. Is there some automatic conversion at play here? – Christoph Rackwitz Dec 05 '20 at 01:25
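Following the `assert` idea above, a sketch like the one below lists every non-string cell in the column at once instead of stopping at the first. The three-row DataFrame is a hypothetical stand-in for `otel_verileri`; its missing second review mimics an empty cell, which pandas stores as NaN, and NaN is a float.

```python
import pandas as pd

# Hypothetical stand-in for otel_verileri: the second review is missing,
# so pandas holds NaN (a float) there, as it would for an empty Excel cell.
otel_verileri = pd.DataFrame(
    {"reviews.text": ["Great room!", float("nan"), "Staff were rude."]}
)

col = otel_verileri["reviews.text"]

# True for every cell that is not a Python str
not_text = col.map(lambda v: not isinstance(v, str)).astype(bool)

print(col[not_text])                           # the offending rows
print(col[not_text].map(type).value_counts())  # which types they are
print(col.isna().sum())                        # how many are missing values
```

If every offending row turns out to be NaN, the problem is missing reviews rather than numbers being parsed out of the text.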

1 Answer


I'm guessing you're using the pandas library. I don't know whether you're reading an Excel file, but I'll assume so.

Pandas seems to like inferring column types on its own. You can suppress that and require a specific column to be read only as `str` with the `converters` argument:

    import pandas as pd
    otel_verileri = pd.read_excel(file_name, converters={'reviews.text': str})

(Source: another answer on Stack Overflow.)
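An alternative, or a safety net on top of the converter, is to make the cleaning function tolerate non-string input. The sketch below is one possible rewrite, not the original poster's code: it returns an empty list for anything that is not a `str` (for example NaN, which pandas uses for a missing value and which is a float), and it drops punctuation and digits in a single `str.translate` pass. `STOPWORDS` is a small hypothetical stand-in so the example runs without NLTK; in real code you would build it once with `set(stopwords.words('english'))`.

```python
import string

# Hypothetical stand-in for stopwords.words('english'). In real code build it
# once: STOPWORDS = set(stopwords.words('english'))
STOPWORDS = {"the", "a", "an", "and", "was", "were", "is"}

# Translation table that deletes ASCII punctuation and the digits 0-9
_DROP = str.maketrans("", "", string.punctuation + string.digits)


def DataCleaning(metin):
    """Lower-case, strip punctuation and digits, split, and drop stopwords.

    Non-string input (e.g. NaN for a missing review) yields an empty list
    instead of raising AttributeError.
    """
    if not isinstance(metin, str):
        return []
    cleaned = metin.lower().translate(_DROP)
    return [word for word in cleaned.split() if word not in STOPWORDS]


print(DataCleaning("The room was great, 10/10!"))  # ['room', 'great']
print(DataCleaning(float("nan")))                   # []
```

With this version, `otel_verileri["reviews.text"].apply(DataCleaning)` should no longer fail on missing reviews. If you would rather remove or blank out those rows first, `otel_verileri.dropna(subset=["reviews.text"])` or `otel_verileri["reviews.text"].fillna("")` before the `apply` are other options. Building the stopword set once also avoids calling `stopwords.words('english')` for every word, as the original list comprehension does.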

Answered by Christoph Rackwitz