I have a dataframe with 20 columns, and 3 of those columns (always the same) may contain one or more of these strings ["fraction", "fractional", "1/x", "one fifth"].
I want to add a new column that says whether or not each row is "fractional" (in other words, contains one of those words). This column could have Y or N to indicate this.
I've tried to do it with iterrows, like so:
list_of_fractional_widgets = []
for index, row in df.iterrows():
    fractional_keywords = ["fraction", "fractional", "1/x", "one fifth", "Fraction"]
    # use str to remove offending nan values
    xx = str(row["HeaderX"])
    yy = str(row["HeaderY"])
    zz = str(row["HeaderZ"])
    widget_data = [xx, yy, zz]
    for word in fractional_keywords:
        found = [True for x in widget_data if word in x]
        if len(found) > 0:
            list_of_fractional_widgets.append('Y')
            break
    if len(found) == 0:
        list_of_fractional_widgets.append('N')
df['Fractional?'] = list_of_fractional_widgets
However, I'm trying to understand if there is a more efficient pandas/numpy way to do so. Something like:
np.where(df['HeaderX'].str.contains(fractional_keywords?), True)
as described in this SO question, but using a list and different headers.
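To make the idea concrete, here is a rough sketch of the kind of vectorized version I have in mind. It is not a confirmed solution. The sample data is made up; only the three header names and the keyword list come from my real frame. It assumes the keywords can be joined into a single regular expression (escaped with `re.escape` so that "1/x" is matched literally) and that case-insensitive matching is acceptable, which is why the extra "Fraction" entry is dropped.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical sample data; only the header names come from the question.
df = pd.DataFrame({
    "HeaderX": ["fractional widget", "whole widget", np.nan, "plain", np.nan],
    "HeaderY": [np.nan, "solid", "sold as 1/x", "plain", np.nan],
    "HeaderZ": ["n/a", np.nan, np.nan, "One Fifth share", np.nan],
})

fractional_keywords = ["fraction", "fractional", "1/x", "one fifth"]
columns = ["HeaderX", "HeaderY", "HeaderZ"]

# Escape each keyword so it is matched literally, then join into one alternation.
pattern = "|".join(re.escape(word) for word in fractional_keywords)

# Per column, flag cells containing any keyword. Casting to str mirrors the
# str(...) calls in the loop version; na=False treats missing cells as no match.
# case=False is an assumption that replaces the separate "Fraction" entry.
matches = df[columns].apply(
    lambda col: col.astype(str).str.contains(pattern, case=False, na=False)
)

# A row is fractional if any of the three columns matched.
df["Fractional?"] = np.where(matches.any(axis=1), "Y", "N")
print(df)
```

If exact case matters, dropping `case=False` and putting "Fraction" back into the list should reproduce the loop's behavior more closely.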