Hi all. I am working on a dataframe (picture above) with over 18,000 observations. I'd like to take the text in the 'review' column one row at a time and then run a word count on it. So far I have been trying to iterate over the column, but I keep getting this error: "TypeError: 'float' object is not iterable". Here is the code I used:
def tokenize(text):
    for row in text:
        for i in row:
            if i is not None:
                words = i.lower().split()
                return words
            else:
                return None

data['review_two'] = data['review'].apply(tokenize)
My question is: how do I iterate efficiently over the 'review' column so that I can preprocess each row before performing a word count on it?
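
For reference, here is a minimal sketch of one way the per-row preprocessing and word count could look. It assumes the missing reviews are stored as NaN (which is a float, and would explain the TypeError) or None. The `sample_reviews` list and the `count_words` helper are hypothetical stand-ins for illustration, not part of the original code.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split a single review into words.

    Non-string cells (for example None or a float NaN) yield an empty list,
    so nothing downstream tries to iterate over a float.
    """
    if not isinstance(text, str):
        return []
    return text.lower().split()


def count_words(reviews):
    """Hypothetical helper: total word counts across an iterable of reviews."""
    counts = Counter()
    for review in reviews:
        counts.update(tokenize(review))
    return counts


# Hypothetical sample standing in for the values of data['review'].
sample_reviews = ["Great phone great battery", float("nan"), None, "Battery died"]

tokens = [tokenize(review) for review in sample_reviews]
word_counts = count_words(sample_reviews)
```

Since `Series.apply` calls the function once per cell, the function only has to handle one review at a time and needs no inner loops. With the dataframe above this would presumably be used as `data['review_two'] = data['review'].apply(tokenize)`, and the overall count could then be built by passing `data['review']` to `count_words`.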