0

Say I have list1 = ['the dog', 'the cat', 'cat dog', 'the dog ran home']

and sub_string = 'the dog'

How can I return list2 = ['the dog', 'the cat', 'cat dog']

i.e., return a list with the last element containing the substring removed?

ShadowRanger
cjg123

5 Answers

2

No built-in will help you much here, since scanning for a substring in a list isn't a supported feature, and doing so in reverse order is doubly hard. List comprehensions won't do much good either, since making them stateful enough to recognize when you've found your needle would involve adding side-effects to the list comprehension, which makes it cryptic and violates the purpose of functional programming tools. So you're stuck doing the looping yourself:

list2 = []
list1iter = reversed(list1)  # Make a reverse iterator over list1
for item in list1iter:
    if sub_string in item:   # Found item to remove, don't append it, we're done
        break
    list2.append(item)       # Haven't found it yet, keep item
list2.extend(list1iter)      # Pull the remaining items (those before the removed item in list1)
list2.reverse()              # Put result back in forward order

An alternative approach would be to scan by index, allowing you to del it; this might be a better solution if you want to modify list1 in place, rather than making a new list:

for i, item in enumerate(reversed(list1), 1):
    if sub_string in item:
        del list1[-i]
        break

That solution is adaptable to making a new copy by simply changing all references to list1 to list2, and adding list2 = list1[:] before the loop.
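
A minimal sketch of that copying variant, assuming the same list1 and sub_string as in the question (this is just the adaptation described above written out, nothing new):

```python
list1 = ['the dog', 'the cat', 'cat dog', 'the dog ran home']
sub_string = 'the dog'

list2 = list1[:]  # Shallow copy, so list1 is left untouched
for i, item in enumerate(reversed(list2), 1):
    if sub_string in item:
        del list2[-i]  # -i counts from the end, mirroring the reversed scan
        break          # Stop right away; the list changed under the iterator

print(list2)
```

Breaking immediately after the del matters here, since the reverse iterator should not be advanced once list2 has been modified.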

In both cases, you can detect if an item was found at all by putting an else: block on the for; if the else block triggers, you didn't break, because sub_string wasn't found anywhere.
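
As a sketch of the for/else idea, here is the index-based loop wrapped in a function. The name remove_last_containing and the choice to raise ValueError when nothing matches are assumptions made for illustration; you could just as well return the list unchanged.

```python
def remove_last_containing(items, needle):
    """Return a copy of items without the last element that contains needle."""
    result = items[:]
    for i, item in enumerate(reversed(items), 1):
        if needle in item:
            del result[-i]
            break
    else:
        # Runs only when the loop finished without hitting break,
        # i.e. no element contained needle.
        raise ValueError(f'no element contains {needle!r}')
    return result


list1 = ['the dog', 'the cat', 'cat dog', 'the dog ran home']
print(remove_last_containing(list1, 'the dog'))
```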

ShadowRanger
2

The problem is to remove the last element that contains the query substring.

As I see it, this takes two steps:

  1. Find the element with the substring.
  2. Remove the element.

For pattern matching, we can use the re module (plain in works as well, as mentioned in ShadowRanger's answer).

import re

pattern = re.compile('the dog') # target pattern
my_list = ['the dog', 'the cat', 'cat dog', 'the dog ran home'] # our list
indexed = enumerate(my_list) # pair each element with its index, i.e. (0, 'the dog'), (1, 'the cat'), (2, 'cat dog'), (3, 'the dog ran home')
elems = list(filter(lambda x: pattern.search(x[1]), indexed)) # keep only the pairs whose element matches; filter in Python 3.x returns an iterator, hence list()
print(elems) # [(0, 'the dog'), (3, 'the dog ran home')]
del my_list[elems[-1][0]] # take the index from the last matching pair and delete that element

EDIT

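As ShadowRanger suggested, we can simplify the code by using a list comprehension with an if clause instead of the filter function. Note that elems then holds bare indexes rather than (index, element) pairs, so the deletion becomes del my_list[elems[-1]].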

elems = [i for i, x in enumerate(my_list) if pattern.search(x)]
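
Putting the pieces together, a sketch of the list-comprehension version from start to finish. Two things here are additions rather than part of the original answer: re.escape, which makes the substring match literally even if it contains regex metacharacters, and the guard for the no-match case.

```python
import re

sub_string = 'the dog'
my_list = ['the dog', 'the cat', 'cat dog', 'the dog ran home']

# re.escape makes any regex metacharacters in sub_string match literally
pattern = re.compile(re.escape(sub_string))

# Indexes of every element that contains the substring
elems = [i for i, x in enumerate(my_list) if pattern.search(x)]

if elems:                    # elems is empty when nothing matched
    del my_list[elems[-1]]   # the last index is the last matching element

print(my_list)
```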
P.hunter
  • 1
    Not sure why you'd bother with `re` here; the OP's use case doesn't require pattern matching, just substring checking. Also note, by using `match`, this code doesn't meet the OP's stated requirements; `match` includes an implicit beginning-of-string anchor, so this will only match if the element *begins* with the substring, not when it *contains* the substring. You'd want `search` for an unanchored containment check. – ShadowRanger Sep 26 '19 at 15:50
  • @ShadowRanger thanks for pointing that out, I edited my answer and yeah, I believe re was a bit overkill for this, my bad, that's why I upvoted your answer, but I believe I should leave it here if the user can take anything away with my answer, that would be great. – P.hunter Sep 26 '19 at 16:49
  • 1
    Yeah, it wasn't wrong enough to warrant a downvote or anything. I would suggest replacing the `filter` line and the line preceding it with just `elems = [i for i, x in enumerate(my_list) if pattern.search(x)]` (or without `re`, `if 'the dog' in x`), as `filter` is basically always uglier/slower when you need a `lambda` to use it. It's fine when an existing function does *exactly* what you want (if said function is a built-in implemented in C, it's usually faster), but if one doesn't exist, an equivalent listcomp or genexpr without the `lambda` is usually faster and clearer. – ShadowRanger Sep 26 '19 at 18:50
  • Yeah, I understand that functions like filter are slower when used with a lambda compared to a list comprehension. I'll edit my answer with your suggestion. :) – P.hunter Sep 27 '19 at 07:35
1

You could do it in two steps:

  1. Find the index of last occurrence.
  2. Return all the elements whose index does not match it.

Example:

needle = 'the dog'
haystack = ['the dog', 'the cat', 'cat dog', 'the dog ran home']

last = max(loc for loc, val in enumerate(haystack) if needle in val)
result = [e for i, e in enumerate(haystack) if i != last]

print(result)

Output

['the dog', 'the cat', 'cat dog']

For more details on finding the index of the last occurrence see this.

Dani Mesejo
  • 1
    A note: You could avoid traversing the whole input by replacing the `last` computation with something like `last = len(haystack) - next(loc for loc, val in enumerate(reversed(haystack), 1) if needle in val)`. By running in reverse order, and using `next` on the generator expression, you'll short-circuit, and only have to check values until you find a match, rather than check every value. I avoided compressing the second part of [my answer](https://stackoverflow.com/a/58120251/364696) down to that as it gets a little dense/magical, but it's the same basic logic. – ShadowRanger Sep 26 '19 at 16:52
  • Apparently I wasn't the first one to think of that; [a comment](https://stackoverflow.com/questions/6890170/how-to-find-the-last-occurrence-of-an-item-in-a-python-list#comment72139131_6890255) on one of the answers in your linked question suggests the same thing. – ShadowRanger Sep 26 '19 at 16:56
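
A sketch of the short-circuiting variant suggested in the comments, using the same needle and haystack as the answer. The None default passed to next and the decision to return an unchanged copy when nothing matches are assumptions added for illustration.

```python
needle = 'the dog'
haystack = ['the dog', 'the cat', 'cat dog', 'the dog ran home']

# Position of the last match counted from the end (1 = final element).
# next() stops at the first hit, so only the tail of the list is scanned.
offset = next(
    (loc for loc, val in enumerate(reversed(haystack), 1) if needle in val),
    None,  # returned when no element contains needle
)

if offset is None:
    result = haystack[:]
else:
    last = len(haystack) - offset  # convert to a forward index
    result = haystack[:last] + haystack[last + 1:]

print(result)
```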
1
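list1 = ['the dog', 'the cat', 'the dog me', 'cat dog']
sub_string = 'the dog'

# Walk the indexes from the end; deleting by index (rather than list1.remove)
# removes the last match even when the list holds duplicate strings
for i in range(len(list1) - 1, -1, -1):
    if sub_string in list1[i]:
        del list1[i]
        break

print(list1)

output ['the dog', 'the cat', 'cat dog']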

misha
1

One solution is to traverse the input in reverse order and find the position of the last match, counted from the end. After that, convert it to a forward index and build list2 from the elements on either side of the match.

idx = next(i for i, s in enumerate(reversed(list1), 1) if sub_string in s)
pos = len(list1) - idx                 # Forward index of the last match
list2 = list1[:pos] + list1[pos + 1:]  # If in-place updates are intended, use `del list1[pos]` instead
GZ0