Hello everyone, I have a dataset and I want to apply a function that lowercases the ingredients and removes the punctuation and the stopwords, so that I can make some plots afterwards. The ingredients are stored as a list in each row of the dataset, and when I tried to apply a function I got an error.

Also, can anyone help me put all of these steps into one function, so that the processed data keeps the same form in my dataset?

train_dataset

       id     cuisine      ingredients
0      10259  greek        [romaine lettuce, black olives, grape tomatoes...
1      25693  southern_us  [plain flour, ground pepper, salt, tomatoes, g...
2      20130  filipino     [eggs, pepper, salt, mayonaise, cooking oil, g...
3      22213  indian       [water, vegetable oil, wheat, salt]
4      13162  indian       [black pepper, shallots, cornflour, cayenne pe...
...    ...    ...          ...
39769  29109  irish        [light brown sugar, granulated sugar, butter, ...
39770  11462  italian      [KRAFT Zesty Italian Dressing, purple onion, b...
39771  2238   irish        [eggs, citrus fruit, raisins, sourdough starte...
39772  41882  chinese      [boneless chicken skinless thigh, minced garli...
39773  2362   mexican      [green chile, jalapeno chilies, onions, ground..

I wrote this function:

def preprocess(text):
    return str(text.lower())
   
train_dataset["lowercase"]= train_dataset["ingredients"].apply(preprocess)

and I get an error.

Lefteris Kyprianou
  • `str(text).lower()` guarantees that you have a string _before_ you call `lower()`. Not that that's likely to be what you _actually want_, but it fixes your immediate bug. In terms of what you _actually want_, make sure you're passing the individual items in the list through `lower()`, not the list itself. – Charles Duffy Apr 16 '21 at 23:04
  • How would you solve the problem if you didn't have a DataFrame, and just had a single, plain list of strings? – Karl Knechtel Apr 16 '21 at 23:06
  • The second linked duplicate has a pandas-specific answer that's likely to be of particular interest. – Charles Duffy Apr 16 '21 at 23:07
  • `re.sub('\W',r' ',str(text)).lower()` gives me what I want, but now I want to integrate into the same function something that removes the stopwords. – Lefteris Kyprianou Apr 17 '21 at 07:55
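
Following the comments above, here is a minimal sketch of how the three steps might be combined so that each ingredient is processed individually and every row stays a list. The names `STOPWORDS` and `clean_ingredient` are made up for illustration, and the stopword set is a tiny placeholder, not a real stopword list; a fuller one (for example NLTK's English stopwords) could presumably be swapped in.

```python
import re

# Hypothetical, deliberately tiny stopword set for illustration only.
# Replace it with a fuller list if needed.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "with", "or"}


def clean_ingredient(ingredient):
    """Lowercase one ingredient, replace punctuation with spaces, drop stopwords."""
    # str() guards against non-string items, as suggested in the comments.
    text = re.sub(r"[^\w\s]", " ", str(ingredient).lower())
    words = [word for word in text.split() if word not in STOPWORDS]
    return " ".join(words)


def preprocess(ingredients):
    """Clean every item of a list of ingredients and return a new list."""
    cleaned = [clean_ingredient(item) for item in ingredients]
    # Drop items that became empty after cleaning (e.g. a lone stopword).
    return [item for item in cleaned if item]


sample = ["KRAFT Zesty Italian Dressing", "purple onion", "salt and pepper"]
print(preprocess(sample))
```

With the DataFrame from the question, something like `train_dataset["clean"] = train_dataset["ingredients"].apply(preprocess)` should then keep one list per row, assuming every cell in `ingredients` really is a Python list of strings and not the string representation of a list.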
