I am trying to identify whether the text in the column Blaze['Info'] contains a string from a list, and to create a new Boolean column with that information.
The DataFrame looks like:
   Word       Info
0  Aam        Aam, n. Etym: [D. aam, fr. LL. ama; cf. L. ham...
1  aard-vark  Aard"-vark`, n. Etym: [D., earth-pig.] (Zoöl.)
2  aard-wolf  Aard"-wolf`, n. Etym: [D, earth-wolf] (Zoöl.)
When I state the term directly I get the answer I want:
Blaze['Noun'] = np.where((Blaze['Info'].str.contains('n.')),True,False)
Blaze['Verb'] = np.where((Blaze['Info'].str.contains('v.')),True,False)
   Word       Info                                               Noun   Verb
0  Aam        Aam, n. Etym: [D. aam, fr. LL. ama; cf. L. ham...  True   False
1  aard-vark  Aard"-vark`, n. Etym: [D., earth-pig.] (Zoöl.)     True   False
2  aard-wolf  Aard"-wolf`, n. Etym: [D, earth-wolf] (Zoöl.)      True   False
but this is not scalable, as I have 100+ features to search for.
When I iterate through the list abbreviation:
abbreviation = ['n.', 'v.']
col_name = ['Noun', 'Verb']
for i in range(len(abbreviation)):
    Blaze[col_name[i]] = np.where((Blaze['Info'].str.contains(abbreviation[i])), True, False)
I get back a DataFrame full of False entries:
   Word       Info                                               Noun   Verb
0  Aam        Aam, n. Etym: [D. aam, fr. LL. ama; cf. L. ham...  False  False
1  aard-vark  Aard"-vark`, n. Etym: [D., earth-pig.] (Zoöl.)     False  False
2  aard-wolf  Aard"-wolf`, n. Etym: [D, earth-wolf] (Zoöl.)      False  False
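For reference, here is a minimal self-contained sketch of the loop that can be run on its own. The three sample rows are copied from the (truncated) printout above rather than from the full data, and two things are assumptions that are not in my original code: iterating with zip instead of an index, and passing regex=False so that the "." in each abbreviation is matched literally rather than as a regex wildcard.

```python
import pandas as pd

# Sample rows copied from the printout above (the first Info value is
# truncated there, so it is truncated here too).
Blaze = pd.DataFrame({
    "Word": ["Aam", "aard-vark", "aard-wolf"],
    "Info": [
        "Aam, n. Etym: [D. aam, fr. LL. ama; cf. L. ham...",
        'Aard"-vark`, n. Etym: [D., earth-pig.] (Zoöl.)',
        'Aard"-wolf`, n. Etym: [D, earth-wolf] (Zoöl.)',
    ],
})

abbreviation = ["n.", "v."]
col_name = ["Noun", "Verb"]

for abbr, col in zip(abbreviation, col_name):
    # Assumption: regex=False treats "." as a literal full stop.
    # str.contains already returns booleans, so np.where is not needed.
    Blaze[col] = Blaze["Info"].str.contains(abbr, regex=False)

print(Blaze[["Word", "Noun", "Verb"]])
```

On these three rows the sketch gives the result I want (Noun all True, Verb all False), so the full data or my real list may differ from what I pasted here.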
I can see several answers for doing something similar, but they group the result into a single row:
Check if each row in a pandas series contains a string from a list using apply?
Scalable solution for str.contains with list of strings in pandas
I don't think these solve the problem above.
Is anyone able to explain where I am going wrong?