Remove subset of an element from a list if there are more than one

Question

If I have a list like:

[u'test_1', u'test_2', u'test_3', u'bananas_4', u'cake_5', u'bananas_6']

What would be the best way to just get the following without knowing anything else in advance?

[u'test_1', u'bananas_4', u'cake_5']

The way I see it, I would loop over the list, keep track of the prefixes seen so far (test, bananas, and so on), and if a later item starts with a prefix that has already been seen, remove that item from the list.

Does anyone know the best way of achieving this?

Can you please clarify your question? Are you looking for only the first string in the list with a common substring? Or maybe ordering them by the number at the end? Are you only looking for the first words starting with 'test_', 'bananas_', or 'cake_'? — Ben, Nov 27 '17 at 16:20
@Ben I'm looking for `[u'test_1', u'bananas_4', u'cake_5']` (although the number doesn't really matter) if that makes sense? — Rekovni, Nov 27 '17 at 16:21
@Rekovni- you're trying to make a smaller list from your big list by some condition. Your example isn't enough for me to guess the condition you're looking for — Ben, Nov 27 '17 at 16:24
@Ben the condition is everything in front of the underscore, so removing all repeated `test` and `bananas` from the list. Everything after the underscore doesn't matter really. — Rekovni, Nov 27 '17 at 16:27

score 2 · Answer 1 · answered Nov 27 '17 at 16:23

My main idea relies on the dictionary method setdefault, which does not overwrite the value of a key that is already present.

I used OrderedDict to keep the order of insertion of items.

from collections import OrderedDict

lst = [u'test_1', u'test_2', u'test_3', u'bananas_4', u'cake_5', u'bananas_6']
d = OrderedDict()
for item in lst:
    key, val = item.split('_', 1)  # split on the first underscore only
    d.setdefault(key, val)  # will not override if the key was there before

new_list = [key + '_' + val for key, val in d.items()]
print(new_list)

Output is

[u'test_1', u'bananas_4', u'cake_5']
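
As a variation on the same setdefault idea, here is a minimal sketch assuming Python 3.7 or later, where a plain dict preserves insertion order. Storing the whole item as the value (rather than only the part after the underscore) is my own tweak, not part of the answer above; it avoids having to join the pieces back together. The name `first_by_prefix` is just illustrative.

```python
lst = ['test_1', 'test_2', 'test_3', 'bananas_4', 'cake_5', 'bananas_6']

# Assumption: Python 3.7+, so a plain dict keeps keys in insertion order.
first_by_prefix = {}
for item in lst:
    prefix = item.partition('_')[0]  # text before the first underscore
    first_by_prefix.setdefault(prefix, item)  # keeps the first item per prefix

new_list = list(first_by_prefix.values())
print(new_list)  # ['test_1', 'bananas_4', 'cake_5']
```

Because the original string is stored untouched, an item with several underscores or with none at all comes back unchanged.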

score 1 · Accepted Answer · answered Nov 27 '17 at 16:32

Simply keep a set of the prefixes you have already seen, and only add an item to your filtered list if its prefix is not in that set:

start = [u'test_1', u'test_2', u'test_3', u'bananas_4', u'cake_5', u'bananas_6']

seen = set()
end = []

for item in start:
    prefix = item.partition('_')[0]
    if prefix not in seen:
        end.append(item)
        seen.add(prefix)

print(end)  # ['test_1', 'bananas_4', 'cake_5']

score 0 · Answer 3 · answered Nov 27 '17 at 16:25

I would split it into two steps. First, split each string in the list on "_", which gives you the raw prefixes ['test', 'test', 'test', 'bananas', 'cake', 'bananas'] and a second list with the numbers ['1', '2', '3', '4', '5', '6'].

You could then find the unique values of the prefix list with the following solution: Get unique values from a list in python. Finally, append the numbers back on, using the number that belongs to the first occurrence of each prefix.
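
The steps described above could be sketched roughly as follows. This is only one possible reading of the approach: the function name `dedupe_by_prefix` is hypothetical, and it assumes every item contains the separator and that the first occurrence of each prefix is the one to keep.

```python
def dedupe_by_prefix(items, sep="_"):
    """Hypothetical helper: keep the first item for each prefix before `sep`.

    Assumes every item contains `sep` at least once.
    """
    # Step 1: split into two parallel lists, prefixes and suffixes.
    prefixes = [item.partition(sep)[0] for item in items]
    suffixes = [item.partition(sep)[2] for item in items]

    # Step 2: find the unique prefixes, remembering where each first appears.
    first_index = {}
    for i, prefix in enumerate(prefixes):
        if prefix not in first_index:
            first_index[prefix] = i

    # Step 3: append the matching suffix back onto each unique prefix,
    # in order of first appearance.
    return [prefixes[i] + sep + suffixes[i] for i in sorted(first_index.values())]


lst = ['test_1', 'test_2', 'test_3', 'bananas_4', 'cake_5', 'bananas_6']
print(dedupe_by_prefix(lst))  # ['test_1', 'bananas_4', 'cake_5']
```

Note that taking the unique prefixes with a plain set() would lose both the original order and the link to the numbers, which is why the sketch records the index of each first occurrence instead.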
