How can I find out whether one or more specific substrings of a list element are duplicated, that is, also present in other elements of the list? I then want to make the list unique by keeping only the first element that contains each of those substrings and removing every later element that contains it.

Example:

SUBSTRINGS=['banana','chocolate']
MYLIST=['1 banana cake','2 banana cake','3 cherry cake','4 chocolate cake','5 chocolate cake','6 banana cake','7 pineapple cake']

The repeating substrings are banana and chocolate in this case. After processing, the list should become:

MYLIST=['1 banana cake','3 cherry cake','4 chocolate cake','7 pineapple cake']
the.salman.a
  • This link has a solution for you: [https://stackoverflow.com/questions/8122079/python-how-to-check-a-string-for-substrings-from-a-list](https://stackoverflow.com/questions/8122079/python-how-to-check-a-string-for-substrings-from-a-list) – mayur kasar Mar 29 '18 at 10:57
  • So what if the element '# cherry cake' appeared more than once in MYLIST, but 'cherry' is not included in the SUBSTRINGS list? – Glrs Mar 29 '18 at 11:13
  • @TasosGlrs then ignore it, only the ones that are listed in the SUBSTRINGS list should be processed. – douglas780 Mar 29 '18 at 11:16

2 Answers

Here we construct a list new_list by iterating over the original MYLIST. Each element is split into whitespace-separated words, and we keep track of which substrings (from SUBSTRINGS) have already been seen, using the all_substrings set.

SUBSTRINGS = {'banana', 'chocolate'}
MYLIST = ['1 banana cake', '2 banana cake', '3 cherry cake', '4 chocolate cake', '5 chocolate cake', '6 banana cake', '7 pineapple cake']

new_list = []
all_substrings = set()
for el in MYLIST:
    # All whitespace-separated words of this element
    substrings = set(el.split())
    # Add this element if it does not have any substrings in common
    # with the all_substrings set.
    if not any(substring in all_substrings for substring in substrings):
        new_list.append(el)
    # Add current substrings which are also present
    # in SUBSTRINGS to all_substrings.
    all_substrings |= (substrings & SUBSTRINGS)
print(new_list)
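
Note that el.split() only finds whole words, so an element such as '2 bananas' or '2 banana-split' would not count as containing 'banana'. If true substring matching is wanted, a minimal sketch could look like the following. The function name keep_first_by_substring is hypothetical, and the sketch assumes the same bookkeeping as above: a substring is recorded as seen even when the element carrying it is dropped.

```python
def keep_first_by_substring(items, substrings):
    """Keep only the first item containing each tracked substring.

    Hypothetical helper: matches with `in`, so the substring may appear
    anywhere in the item, not only as a whole word.
    """
    seen = set()      # tracked substrings encountered so far
    result = []
    for item in items:
        # Tracked substrings that occur anywhere in this item
        found = {s for s in substrings if s in item}
        # Keep the item only if none of them has been seen before
        if not (found & seen):
            result.append(item)
        # Record them whether or not the item was kept
        seen |= found
    return result


SUBSTRINGS = {'banana', 'chocolate'}
MYLIST = ['1 banana cake', '2 banana cake', '3 cherry cake',
          '4 chocolate cake', '5 chocolate cake', '6 banana cake',
          '7 pineapple cake']
print(keep_first_by_substring(MYLIST, SUBSTRINGS))
```

Elements that contain none of the tracked substrings are always kept, even if they repeat.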
jmd_dk
Here is a simpler answer than the one jmd_dk posted. Both work fine as far as I can see.

SUBSTRINGS = {'banana', 'chocolate'}
MYLIST = ['1 banana cake', '2 banana cake', '3 cherry cake', '4 chocolate cake', '5 chocolate cake', '6 banana cake', '7 pineapple cake']
m = []

for substr in MYLIST:
    if any(el in substr for el in SUBSTRINGS):
        if not any(substr.split()[1] in n for n in m):
            m.append(substr)
    else:
        m.append(substr)

print(m)
Glrs
  • Relying on the substrings (here `'banana'` and `'chocolate'`) always being word number 2 (index `1`) is not a good idea. – jmd_dk Mar 29 '18 at 11:49
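
One possible way to drop the dependence on word position, sketched here as an assumption rather than as either answerer's code: look up which entries of SUBSTRINGS occur in the current element, and skip the element if an already kept element contains one of them. The names kept, item, found and prev are illustrative.

```python
SUBSTRINGS = {'banana', 'chocolate'}
MYLIST = ['1 banana cake', '2 banana cake', '3 cherry cake',
          '4 chocolate cake', '5 chocolate cake', '6 banana cake',
          '7 pineapple cake']

kept = []
for item in MYLIST:
    # Tracked substrings present anywhere in this element,
    # regardless of which word position they occupy
    found = [s for s in SUBSTRINGS if s in item]
    # Skip the element if an earlier kept element already
    # contains one of those substrings
    if not any(s in prev for s in found for prev in kept):
        kept.append(item)

print(kept)
```

For an element with no tracked substring, found is empty, any(...) is False, and the element is kept, which matches the else branch of the answer above.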