
I have a dataset that contains articles' data and headlines. I am trying to apply a dictionary search by looping through all the headlines.

import pandas as pd

dictionary = pd.read_csv("suspectdict.csv")  # avoid naming this `dict`, which shadows the built-in
news_df = pd.read_csv("news.csv")

The words in suspectdict.csv are verbs that describe the actions, such as

charged, murdered, murder, caught, arrested, ...

and news.csv consists of crime articles. So when the loop reaches a headline such as:

"a local man charged for theft"

it should return 1 because "charged" is in the dictionary, and 0 otherwise.
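The rule described above can be sketched as a small function applied to a single headline. This is a minimal sketch under stated assumptions: `contains_suspect_word` is a hypothetical name, the word list is hard-coded instead of being read from suspectdict.csv, and matching is case-insensitive on whitespace-separated words.

```python
def contains_suspect_word(headline, suspect_words):
    """Return 1 if any whitespace-separated word of `headline` is in `suspect_words`, else 0."""
    # Lowercase so "Charged" and "charged" are treated the same
    words = set(headline.lower().split())
    # A non-empty intersection means at least one suspect word is present
    return 1 if words & suspect_words else 0


# Hypothetical hard-coded stand-in for the contents of suspectdict.csv
suspect_words = {"charged", "murdered", "murder", "caught", "arrested"}

print(contains_suspect_word("a local man charged for theft", suspect_words))  # 1
print(contains_suspect_word("a local man wins the lottery", suspect_words))   # 0
```

Note that plain `split()` keeps punctuation attached to words, so a headline ending in "arrested." would not match "arrested" with this sketch.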

Beginner
  • Not sure why you're looking to use a dictionary, when you're looking for a word or a sentence. Check this out. https://stackoverflow.com/questions/60873474/find-specific-words-on-dataframe – mrpbennett Apr 21 '22 at 13:14
  • @mrpbennett I just want to match words: if any of the words in the suspect dictionary appear in a headline from the dataset, return 1, else 0 – Beginner Apr 21 '22 at 14:07

1 Answer

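import pandas as pd

dictionary = pd.read_csv("suspectdict.csv", header=None)  # assumes one word per row, no header
news_df = pd.read_csv("news.csv")

# set(dictionary) would only collect the column labels, so take the words from the first column
dict_set = set(dictionary.iloc[:, 0].str.lower())
news_df['suspected'] = news_df['headline'].apply(
    lambda line: int(len(set(line.lower().split()).intersection(dict_set)) > 0)
)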

You'd want to make the dictionary you are checking against a set, since membership testing on a set is O(1) on average. You can then check whether any of the words in the headline are in the dictionary by testing whether the size of the set intersection is greater than 0.
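A variant that avoids one pitfall of `line.split()`, namely punctuation glued to a word (for example "arrested."), is to build a single word-boundary regular expression from the dictionary and use pandas' `Series.str.contains`. This is a swapped-in technique, not the set-intersection approach above. The in-memory CSVs are hypothetical stand-ins for suspectdict.csv and news.csv and assume one word per row with no header, plus a `headline` column in the news file.

```python
import io
import re

import pandas as pd

# Hypothetical stand-ins for suspectdict.csv and news.csv (assumed layouts:
# one word per row with no header; a "headline" column in the news file).
suspect_csv = io.StringIO("charged\nmurdered\nmurder\ncaught\narrested\n")
news_csv = io.StringIO(
    "headline\n"
    "a local man charged for theft\n"
    "City council approves new park\n"
    "Suspect ARRESTED after chase.\n"
)

dictionary = pd.read_csv(suspect_csv, header=None)
news_df = pd.read_csv(news_csv)

# Build one alternation pattern: re.escape guards against regex metacharacters,
# and \b word boundaries stop "murder" from matching inside "murderer".
words = dictionary.iloc[:, 0].astype(str).str.strip().str.lower()
pattern = r"\b(?:" + "|".join(map(re.escape, words)) + r")\b"

# case=False makes the match case-insensitive; na=False treats missing headlines as no match
news_df["suspected"] = (
    news_df["headline"]
    .str.contains(pattern, case=False, regex=True, na=False)
    .astype(int)
)
print(news_df)
```

A side benefit of the regex form is that multi-word dictionary entries (a phrase rather than a single verb) still match, which a set of split tokens cannot do.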

jhylands