I want to build a new list by matching a list of sentences against a list of keywords.

lst = ['This sentence contains disclosure.', 'This sentence contains none declared.', 'This sentence contains competing interest.', 'This sentence contains authors declare.']
keywords = ['disclosure', 'none declared', 'interest']

The new list should come out like this:

matched_list = ['This sentence contains disclosure.', 'This sentence contains none declared.']

I have tried using

import re
r = re.compile('.*disclosure')
newlist = list(filter(r.match, lst))

However, I have a very large list of keywords, and it would be impossible to type them all out in r = re.compile('.*keywords'). Is there any other way to match a list of sentences against a list of keywords?

cs95
Kaung Myat
  • Try `matched_list = [l for l in lst if any(k in l for k in keywords)]` if regex based matching is not needed. – cs95 Nov 11 '18 at 09:16
  • @coldspeed Thank you so much, the matching worked. Could you explain the syntax you used in this code, especially the `l for l` part? – Kaung Myat Nov 11 '18 at 09:26
  • See e.g. https://stackoverflow.com/q/30670310/3001761 – jonrsharpe Nov 11 '18 at 09:34

1 Answer

You will have to check each string against the keyword list. Use a list comprehension, assuming simple string matching is enough (without the need for regex).

matched_list = [
    string for string in lst if any(
        keyword in string for keyword in keywords)]

Which is really just a concise way of saying:

matched_list = []
for string in lst:
    if any(keyword in string for keyword in keywords):
        matched_list.append(string)

`any` short-circuits: it returns True as soon as one keyword matches, without testing the remaining keywords, and returns False if no keyword matches.
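
To see the short-circuiting in action, here is a minimal sketch using the sample data from the question. It assumes the keyword was meant to be 'disclosure' without a trailing space. The `contains` helper and the `checked` list are hypothetical names, added only to record which keyword tests actually run. Note that 'interest' also matches the third sentence, so three sentences come back.

```python
lst = [
    'This sentence contains disclosure.',
    'This sentence contains none declared.',
    'This sentence contains competing interest.',
    'This sentence contains authors declare.',
]
# Assumption: 'disclosure' is written without a trailing space.
keywords = ['disclosure', 'none declared', 'interest']

checked = []  # hypothetical log of every (sentence, keyword) test performed


def contains(string, keyword):
    """Record the test, then do the plain substring check."""
    checked.append((string, keyword))
    return keyword in string


matched_list = [
    string for string in lst
    if any(contains(string, keyword) for keyword in keywords)
]

print(matched_list)   # the first three sentences
print(len(checked))   # 9 tests instead of 4 * 3 = 12
```

The first sentence stops after one test, the second after two, and only the last two sentences need all three keywords checked.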


If you want to use regex, you can precompile your pattern and then call pattern.search inside a loop, as usual:

import re
p = re.compile('|'.join(map(re.escape, keywords)))
matched_list = [string for string in lst if p.search(string)]
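
The `re.escape` call matters whenever a keyword contains characters that mean something special in a regex. The sketch below runs the regex version on the question's sample data (again assuming 'disclosure' has no trailing space), then uses a hypothetical keyword, 'e.g.', to show the difference between an escaped and an unescaped pattern.

```python
import re

lst = [
    'This sentence contains disclosure.',
    'This sentence contains none declared.',
    'This sentence contains competing interest.',
    'This sentence contains authors declare.',
]
# Assumption: 'disclosure' is written without a trailing space.
keywords = ['disclosure', 'none declared', 'interest']

# Escape each keyword so it is matched literally, then join with '|'
# so the compiled pattern matches any one of them.
p = re.compile('|'.join(map(re.escape, keywords)))
matched_list = [string for string in lst if p.search(string)]
print(matched_list)  # the first three sentences

# Hypothetical keyword containing '.', which is a regex wildcard.
tricky = 'e.g.'
unescaped = re.compile(tricky)
escaped = re.compile(re.escape(tricky))

print(bool(unescaped.search('two eggs')))  # True: each '.' matches any character
print(bool(escaped.search('two eggs')))    # False: only the literal text 'e.g.' matches
```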
cs95
  • Just an additional follow-up: is there a way to modify the code so that, instead of getting the list of matched sentences, I can tabulate which words got matched? @coldspeed – Kaung Myat Nov 11 '18 at 12:48
  • @KaungMyat as in return the index of the match? What if there is no match? – cs95 Nov 11 '18 at 20:29
  • Yes, as in returning the index of the match. To clarify: instead of extracting the entire string from the list, I want to extract just the matched words and possibly count them. The code you showed me worked perfectly for that part, but I am now working on something else, and my attempts to modify the code did not work. @coldspeed – Kaung Myat Nov 11 '18 at 22:27
  • @KaungMyat Do you want something like `from collections import Counter; counts = Counter(k for s in lst for k in keywords if k in s)`? – cs95 Nov 11 '18 at 22:31
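
The `Counter` one-liner from the last comment can be expanded into a small runnable sketch. It uses the question's sample data and assumes 'disclosure' has no trailing space. The `matches_per_sentence` dictionary is a hypothetical addition that shows which keywords were found in each sentence. Sentences with no match get an empty list.

```python
from collections import Counter

lst = [
    'This sentence contains disclosure.',
    'This sentence contains none declared.',
    'This sentence contains competing interest.',
    'This sentence contains authors declare.',
]
# Assumption: 'disclosure' is written without a trailing space.
keywords = ['disclosure', 'none declared', 'interest']

# Count, for each keyword, how many sentences contain it.
counts = Counter(k for s in lst for k in keywords if k in s)
print(counts)

# Hypothetical extension: list the matched keywords per sentence.
matches_per_sentence = {s: [k for k in keywords if k in s] for s in lst}
for sentence, found in matches_per_sentence.items():
    print(sentence, '->', found)
```

Because `k in s` is a yes/no test, each keyword is counted at most once per sentence. If repeated occurrences within one sentence should count separately, `s.count(k)` could be summed instead.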