I want to replace the substrings of words in the below column with a defined category.
My dataframe contains a column 'amenities' and there are 14K rows in the dataset. For each cell of 'amenities' there is long list of substrings. Please see the picture attached.
Sample Data:
Listing ID | Amenities |
---|---|
01 | "TV", "Machine", "Refrigerator", "hair dryer", "micro", "parking lot", "shower", "body soap" |
02 | "HDTV", "washing machine", "fridge", "microwave", "parking", "soap" |
03 | "TV with netflix", "free parking", "soap" |
04 | "TV with Amazon Prime", "parking area", "Refrigerator", "dryer for hair", "microwave" |
As you can see that each amenity is mentioned in different styles in different cells. For Example, TV is mentioned as TV, HDTV, TV with netflix, 22 inch TV.
Basically, i want to replace the related amenities and put them in one category, so for anything that contains TV, will be replaced with word TV, but as there are other amenities written as substring in "" in the same cell, that is making it complicated for me.
Can anyone please help me.