In Python, how can I remove items from a list based on a list of strings?

Question

I have a list of strings that I want to remove items from. I have a list of keywords that I am searching for in these items. I cannot seem to get the output I am looking for. I am not sure if regular expressions are the right way to handle this.
I want the output to be ['/item/page/cat-dog', '/item/page/animal-planet']

valid = ['/item/page/cat-dog', '/item/page/animal-planet', '/item/page/variable']
keywords = ['cat','planet']


for item in valid: 
    #a = re.findall()
    #

Possible duplicate of [How to make a flat list out of list of lists](https://stackoverflow.com/questions/952914/how-to-make-a-flat-list-out-of-list-of-lists) — thewaywewere, Apr 26 '19 at 15:12

Alexis Pister · Answer 1 · 2019-04-26T15:10:48.970

Python comes with the handy operators in and not in, which test whether an object is in a list or, for strings, whether one string is a substring of another.

For your problem, you can simply do:

import os

new_list = []
for item in valid:
    # Keep the item if any keyword occurs in its file name.
    if any(keyword in os.path.basename(item) for keyword in keywords):
        new_list.append(item)

os.path.basename returns the last component of the path, without the leading directories. new_list will then contain every element of valid whose file name contains at least one of the keywords.
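A small sketch of how in behaves on a list versus on a string, which is the distinction that matters for this kind of filter. The sample path and keywords are taken from the question; the variable names are only illustrative.

```python
import os

keywords = ['cat', 'planet']
name = os.path.basename('/item/page/cat-dog')  # last path component: 'cat-dog'

# On a list, `in` asks whether some element is equal to the value.
in_list = name in keywords      # False: 'cat-dog' is not an element of keywords

# On a string, `in` asks whether the left operand is a substring.
in_string = 'cat' in name       # True: 'cat' occurs inside 'cat-dog'

print(name, in_list, in_string)
```

So testing the whole file name against the keyword list only finds exact matches, while testing each keyword against the file name finds substrings.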

morpheuz · Answer 2 · 2019-04-26T15:13:58.310

As far as I can understand, and based on @dan-d's comment, what you need is:

[s for s in valid if not any(q in s for q in keywords)]
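For comparison, here is a sketch of that comprehension run on the question's data, next to the variant without not. Which one is wanted depends on whether matching items should be removed or kept; the variable names are illustrative.

```python
valid = ['/item/page/cat-dog', '/item/page/animal-planet', '/item/page/variable']
keywords = ['cat', 'planet']

# Drop every item that contains at least one keyword.
removed_matches = [s for s in valid if not any(q in s for q in keywords)]

# Keep only the items that contain at least one keyword.
kept_matches = [s for s in valid if any(q in s for q in keywords)]

print(removed_matches)  # ['/item/page/variable']
print(kept_matches)     # ['/item/page/cat-dog', '/item/page/animal-planet']
```

The second form produces the output listed in the question.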

edited Apr 26 '19 at 15:13

answered Apr 26 '19 at 15:07

score 0 · Answer 3 · answered Apr 27 '19 at 07:50

As suggested in the comments and other answers, the in operator may be used to check if a string is a substring of another string. For the example data in the question, using in is the simplest and fastest way to get the desired result.

If the requirement is to match '/item/page/cat-dog' but not '/item/page/catapult', that is, to match only the word 'cat' rather than any occurrence of the sequence c-a-t, then a regular expression may be used to do the matching.

The pattern to match a single word is r'\bfoo\b', where '\b' marks a word boundary. Write it as a raw string in Python so that '\b' is not read as a backspace character.

The alternation operator '|' is used to match one pattern or another, for example 'foo|bar' matches 'foo' or 'bar'.
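A minimal sketch of both building blocks, using the 'cat' versus 'catapult' case mentioned above. The sample strings for the alternation are made up for illustration.

```python
import re

# \b on both sides: 'cat' must stand alone as a word.
word = re.compile(r'\bcat\b')
matches_cat_dog = bool(word.search('/item/page/cat-dog'))    # True: '-' ends the word
matches_catapult = bool(word.search('/item/page/catapult'))  # False: 'a' follows 'cat'

# Alternation: either side of '|' may match.
either = re.compile(r'foo|bar')
matches_bar = bool(either.search('a bar b'))   # True
matches_baz = bool(either.search('a baz b'))   # False

print(matches_cat_dog, matches_catapult, matches_bar, matches_baz)
```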

Construct a pattern that matches the words in keywords; call re.escape on each keyword in case they contain characters that the regex engine might interpret as metacharacters.

>>> import re
>>> pattern = r'|'.join(r'\b{}\b'.format(re.escape(keyword)) for keyword in keywords)
>>> pattern
'\\bcat\\b|\\bplanet\\b'
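To illustrate why re.escape matters, here is a sketch with a hypothetical keyword 'a.b' (not from the question) that contains the metacharacter '.'.

```python
import re

keyword = 'a.b'  # hypothetical keyword; '.' matches any character in a regex

unescaped = re.compile(keyword)
escaped = re.compile(re.escape(keyword))

# Without escaping, '.' acts as a wildcard and 'axb' matches by accident.
unescaped_hits_axb = bool(unescaped.search('axb'))   # True
# With escaping, only a literal dot is accepted.
escaped_hits_axb = bool(escaped.search('axb'))       # False
escaped_hits_literal = bool(escaped.search('a.b'))   # True

print(unescaped_hits_axb, escaped_hits_axb, escaped_hits_literal)
```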

Compile the pattern into a regular expression object.

>>> rx = re.compile(pattern)

Find the matches. Using filter is elegant:

>>> matches = list(filter(rx.search, valid))
>>> matches
['/item/page/cat-dog', '/item/page/animal-planet']

But a list comprehension is more common:

>>> matches = [word for word in valid if rx.search(word)]
>>> matches
['/item/page/cat-dog', '/item/page/animal-planet']
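The two approaches can be wrapped in one helper. This is only a sketch: the function keep_matching and its whole_word flag are hypothetical names, and '/item/page/catapult' is added to the question's data to show the difference.

```python
import re

def keep_matching(items, keywords, whole_word=False):
    """Return the items that contain at least one keyword.

    whole_word=False uses plain substring tests with `in`;
    whole_word=True requires word boundaries around the keyword.
    """
    if not keywords:
        # An empty alternation would match every string, so stop early.
        return []
    if whole_word:
        pattern = '|'.join(r'\b{}\b'.format(re.escape(k)) for k in keywords)
        rx = re.compile(pattern)
        return [item for item in items if rx.search(item)]
    return [item for item in items if any(k in item for k in keywords)]

valid = ['/item/page/cat-dog', '/item/page/animal-planet',
         '/item/page/variable', '/item/page/catapult']
keywords = ['cat', 'planet']

by_substring = keep_matching(valid, keywords)
by_word = keep_matching(valid, keywords, whole_word=True)

print(by_substring)  # includes '/item/page/catapult'
print(by_word)       # excludes '/item/page/catapult'
```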
