
I have the following list of phrases and a dictionary:

phrases = ['leed','i am good', 'going to the market', 'eating cookies']

dictionary = {'http://www.firsturl.com': 'i am going to the market and tomorrow will be eating cookies',
             'http://www.secondurl.com': 'tomorrow is my birthday and i shall be', 
             'http://www.thirdurl.com': 'i am good and will go to sleep bleeding'}

The code below gives the desired result:

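import pandas as pd

df = pd.DataFrame({'urls': list(dictionary.keys()), 'strings': list(dictionary.values())})
pattern = '|'.join(phrases)  # 'leed|i am good|going to the market|eating cookies'

# collect every phrase occurrence found in each string
s = df.pop('strings').str.findall(pattern)
df = df.assign(phrasecount=s.str.len(), phrase=s.map(', '.join))
# keep rows with at least one match; if nothing matched at all, keep a single row
df = df.drop_duplicates(subset='phrasecount') if df['phrasecount'].eq(0).all() else df[df['phrasecount'].ne(0)]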

However, 'leed' should not match in the third URL. It matches only because it is a substring of 'bleeding'. I want whole-word matches only.
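A minimal sketch of one common fix, assuming every phrase starts and ends with a word character (letter, digit or underscore): wrap the whole alternation in \b word-boundary anchors so each phrase can only match as complete words. The re.escape call is an extra precaution that is not in the original code; it protects against phrases that contain regex metacharacters such as '.' or '+'. The rest of the pipeline is unchanged.

```python
import re

import pandas as pd

phrases = ['leed', 'i am good', 'going to the market', 'eating cookies']
dictionary = {'http://www.firsturl.com': 'i am going to the market and tomorrow will be eating cookies',
              'http://www.secondurl.com': 'tomorrow is my birthday and i shall be',
              'http://www.thirdurl.com': 'i am good and will go to sleep bleeding'}

df = pd.DataFrame({'urls': list(dictionary.keys()), 'strings': list(dictionary.values())})

# \b requires a word boundary before and after the phrase, so 'leed' no longer
# matches inside 'bleeding'. The non-capturing group (?:...) makes findall
# return the whole match instead of a captured group.
# Assumption: phrases begin and end with word characters; otherwise \b behaves differently.
pattern = r'\b(?:' + '|'.join(map(re.escape, phrases)) + r')\b'

s = df.pop('strings').str.findall(pattern)
df = df.assign(phrasecount=s.str.len(), phrase=s.map(', '.join))
# keep rows with at least one match; if nothing matched at all, keep a single row
df = df.drop_duplicates(subset='phrasecount') if df['phrasecount'].eq(0).all() else df[df['phrasecount'].ne(0)]
print(df)
```

With this pattern the third URL should report only 'i am good' (phrasecount 1), while the first URL still matches both of its phrases and the second URL is still filtered out.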

Wiktor Stribiżew
  • Hang on, I've got an answer for you. Oh, you have to join them all together to examine them, otherwise I can't help. –  Jun 07 '20 at 21:36
  • The duplicate's generalities won't help a bit. Show the resulting strings and you'll get an answer in a minute. –  Jun 07 '20 at 21:40
