Python: subset elements in one list based on substring in another list, retain only one element per substring

Question

I have two lists:

list1 = ['abc-21-6/7', 'abc-56-9/10', 'def-89-7/3', 'hij-2-4/9', 'hij-75-1/7']

list2 = ['abc', 'hij']

I would like to subset list1 such that: 1) only those elements with substrings matching an element in list2 are retained, and 2) for duplicated elements that meet the first requirement, I want to randomly retain only one of the duplicates. For this specific example, I would like to produce a result such as:

['abc-21-6/7', 'hij-75-1/7']

I have worked out code to meet my first requirement:

[ele for ele in list1 for x in list2 if x in ele]

Which, based on my specific example, returns the following:

['abc-21-6/7', 'abc-56-9/10', 'hij-2-4/9', 'hij-75-1/7']

But I am stuck on the second step - how to randomly retain only one element in the case of duplicate substrings. I'm wondering if the random.choice function can somehow be incorporated into this problem? Any advice will be greatly appreciated!

Are the `list2` things always at the beginning of the `list1` things? If so, you can sort both lists and get a `nlogn` solution to this. Otherwise you're quadratic. — Him, Sep 19 '17 at 14:34
Yes they are always at the beginning for this particular problem. Thanks for the input! — nrcombs, Sep 19 '17 at 14:39
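The sort-based idea from the comment above can be sketched roughly as follows. This is only an illustration, assuming every wanted element really does start with its prefix; the function name `pick_one_per_prefix` is made up for this example. After sorting, all elements that share a prefix sit next to each other, so a binary search finds where each run begins.

```python
import bisect
import random

def pick_one_per_prefix(items, prefixes):
    """Return one random item per prefix, assuming items start with their prefix."""
    ordered = sorted(items)
    picked = []
    for prefix in sorted(prefixes):
        # In sorted order, the items starting with `prefix` form one contiguous
        # run that begins at the first position whose item is >= prefix.
        start = bisect.bisect_left(ordered, prefix)
        end = start
        while end < len(ordered) and ordered[end].startswith(prefix):
            end += 1
        if end > start:  # skip prefixes that match nothing
            picked.append(random.choice(ordered[start:end]))
    return picked

list1 = ['abc-21-6/7', 'abc-56-9/10', 'def-89-7/3', 'hij-2-4/9', 'hij-75-1/7']
list2 = ['abc', 'hij']
result = pick_one_per_prefix(list1, list2)
print(result)  # one element starting with 'abc', then one starting with 'hij'
```

The sort dominates the cost, so this stays around n log n instead of comparing every prefix against every element.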

score 2 · Accepted Answer · answered Sep 19 '17 at 14:26


You can use itertools.groupby:

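import itertools
import random

list1 = ['abc-21-6/7', 'abc-56-9/10', 'def-89-7/3', 'hij-2-4/9', 'hij-75-1/7']
list2 = ['abc', 'hij']

def prefix(x):
    return x.split("-")[0]

# Keep only the elements that contain one of the substrings in list2.
new_list1 = [i for i in list1 if any(b in i for b in list2)]
# groupby only merges adjacent items, so sort by the grouping key first.
new_data = [list(b) for a, b in itertools.groupby(sorted(new_list1, key=prefix), key=prefix)]
# Pick one element at random from each group.
final_data = [random.choice(i) for i in new_data]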

Output (one possible result, since the choice is random):

['abc-56-9/10', 'hij-75-1/7']
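Because itertools.groupby only merges elements that are adjacent, the grouping step depends on the order of the input. A possible order-independent variant is sketched below. It assumes the prefix is always the text before the first "-" and matches it exactly against list2 (not as an arbitrary substring); the names `wanted` and `groups` are illustrative only.

```python
import random
from collections import defaultdict

list1 = ['abc-21-6/7', 'abc-56-9/10', 'def-89-7/3', 'hij-2-4/9', 'hij-75-1/7']
list2 = ['abc', 'hij']

wanted = set(list2)  # set lookup avoids scanning list2 for every element
groups = defaultdict(list)
for item in list1:
    key = item.split("-")[0]  # assumption: prefix is everything before the first "-"
    if key in wanted:
        groups[key].append(item)

# One random element per prefix, in the order the prefixes appear in list2.
final_data = [random.choice(groups[p]) for p in list2 if p in groups]
print(final_data)
```

This makes a single pass over list1 and does not need the input to be sorted.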

answered Sep 19 '17 at 14:26

Ajax1234

@nrcombs Glad to help! – Ajax1234 Sep 19 '17 at 14:35

score 0 · Answer 2 · answered Sep 19 '17 at 14:31

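You can use the following function, which returns the first element of list1 that contains the given substring (or None if there is no match):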

def find(list1, findable):
    for element in list1:
        if findable in element:
            return element

Now we can use a list comprehension:

[find(list1, ele) for ele in list2 if find(list1, ele) is not None]

The list comprehension calls find twice for every element of list2. A plain loop calls it only once:

result = []
for ele in list2:
    found = find(list1, ele)
    if found is not None:
        result.append(found)
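Note that find always returns the first match, so the result is not random. A minimal sketch of a randomized variant, using random.choice as the question suggests, might look like this; `find_random` is a hypothetical name, not part of the original answer.

```python
import random

def find_random(list1, findable):
    # Collect every element containing the substring, then pick one at random.
    matches = [element for element in list1 if findable in element]
    return random.choice(matches) if matches else None

list1 = ['abc-21-6/7', 'abc-56-9/10', 'def-89-7/3', 'hij-2-4/9', 'hij-75-1/7']
list2 = ['abc', 'hij']

result = []
for ele in list2:
    found = find_random(list1, ele)
    if found is not None:
        result.append(found)
print(result)
```

Returning None for "no match" keeps the same calling convention as the original find.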

score 0 · Answer 3 · answered Sep 19 '17 at 14:32


You can use a dictionary instead of a list, and then convert the values to a list.

list1 = ['abc-21-6/7', 'abc-56-9/10', 'def-89-7/3', 'hij-2-4/9', 'hij-75-1/7']
list2 = ['abc', 'hij']

final_list = {pref:ele for pref in list2 for ele in list1 if pref in ele}
final_list = list(final_list.values())

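This would output the last matching element for each prefix, because later matches overwrite earlier ones in the dictionary:

>>> final_list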
['abc-56-9/10', 'hij-75-1/7']
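The dictionary approach is deterministic: for each prefix it keeps whichever match comes last in list1. If a random pick is wanted, one option (a sketch, not part of the original answer) is to run the same comprehension over a shuffled copy of list1. The name `shuffled` is illustrative.

```python
import random

list1 = ['abc-21-6/7', 'abc-56-9/10', 'def-89-7/3', 'hij-2-4/9', 'hij-75-1/7']
list2 = ['abc', 'hij']

# random.sample with the full length returns a shuffled copy; list1 is unchanged.
shuffled = random.sample(list1, len(list1))

# The last match in the shuffled order wins, which is a random one of the matches.
picked = {pref: ele for pref in list2 for ele in shuffled if pref in ele}
final_list = list(picked.values())
print(final_list)
```

Dictionaries preserve insertion order in Python 3.7+, so the values come out in the order of list2.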

answered Sep 19 '17 at 14:32

José Garcia

Thanks Jose Garcia! – nrcombs Sep 19 '17 at 14:35
No problem! I think this is somewhat more practical because you don't have to import external modules or define any functions – José Garcia Sep 19 '17 at 14:40
