Having a list of fruits, I want to check if and which if them existing in the data frame (regardless which columns), and indicate them.
import pandas as pd
Fruits = ["Avocado", "Blackberry", "Black Sapote", "Fingered Citron", "Crab Apples", "Custard Apple", "Chico Fruit", "Coconut", "Damson", "Elderberry", "Goji Berry", "Grape", "Guava", "Huckleberry"]
data = {'ID': ["488", "14805", "23591", "470995", "56251", "85964", "5268", "322624", "342225", "380689", "480562", "5623"],
'Content' : ["Kalo Beruin", "this is Blackberry", "Khara Beruin", "guava and coconut", "Lapha", "Loha Sura", "Matichak", "Miniket Rice", "Mou Beruin", "Moulata", "oh Goji Berry", "purple Grape"],
'Content_1' : ["Jook-sing noodles", "grape", "Lai fun", "Damson", "Liangpi", "Custard Apple and Crab apples", "Misua", "nana Coconut Berry", "Damson", "Paomo", "Ramen", "Rice vermicelli"]}
df = pd.DataFrame(data)
df = df[['ID', 'Content', 'Content_1']]
s = pd.Series(data['Content'])
s_1 = pd.Series(data['Content_1'])
df["found_content"] = s[s.str.contains('|'.join(Fruits))]
df["found_content_1"] = s_1[s_1.str.contains('|'.join(Fruits))]
writer = pd.ExcelWriter('C:\\TEM\\22522.xlsx')
df.to_excel(writer,'Sheet1', index = False)
writer.save()
the problems of the codes are:
- Instead of indicating the Fruit, it shows the whole content. for example the row of 14805, it shall be "Blackberry" only instead of the whole original content.
- It's case sensitive so missing some findings, like the row of 14805.
- I want to use ";" to separate the finding, like the row of 85964.
How can I achieve it? Thank you.
Here's the screenshot of the current output and wanted output.