
I have two lists:

list1 = ['abc-21-6/7', 'abc-56-9/10', 'def-89-7/3', 'hij-2-4/9', 'hij-75-1/7']

list2 = ['abc', 'hij']

I would like to subset list1 such that: 1) only those elements with substrings matching an element in list2 are retained, and 2) for duplicated elements that meet the first requirement, I want to randomly retain only one of the duplicates. For this specific example, I would like to produce a result such as:

['abc-21-6/7', 'hij-75-1/7']

I have worked out code to meet my first requirement:

[ele for ele in list1 for x in list2 if x in ele]

Which, based on my specific example, returns the following:

['abc-21-6/7', 'abc-56-9/10', 'hij-2-4/9', 'hij-75-1/7']

But I am stuck on the second step - how to randomly retain only one element in the case of duplicate substrings. I'm wondering if the random.choice function can somehow be incorporated into this problem? Any advice will be greatly appreciated!

nrcombs
  • Are the `list2` things always at the beginning of the `list1` things? If so, you can sort both lists and get a `nlogn` solution to this. Otherwise you're quadratic. – Him Sep 19 '17 at 14:34
  • Yes they are always at the beginning for this particular problem. Thanks for the input! – nrcombs Sep 19 '17 at 14:39

3 Answers


You can use itertools.groupby:

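import itertools
import random

list1 = ['abc-21-6/7', 'abc-56-9/10', 'def-89-7/3', 'hij-2-4/9', 'hij-75-1/7']
list2 = ['abc', 'hij']

def prefix(item):
    # The part before the first "-" identifies the group
    return item.split("-")[0]

# Keep only the elements that contain one of the wanted substrings
new_list1 = [i for i in list1 if any(b in i for b in list2)]
# groupby only merges adjacent items, so sort by the same key first
new_data = [list(b) for a, b in itertools.groupby(sorted(new_list1, key=prefix), key=prefix)]
# Pick one element at random from each group
final_data = [random.choice(i) for i in new_data]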

Output (one possible result, since the choice is random):

['abc-56-9/10', 'hij-75-1/7']
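
If list1 is not guaranteed to arrive with equal prefixes next to each other, the same grouping could be sketched with a dictionary of lists instead, which needs only one pass over list1. This is a minimal sketch, not part of the original answer: the function name pick_one_per_prefix and the sep parameter are hypothetical, and it assumes the prefix is everything before the first "-" (the asker confirmed in the comments that the list2 strings are always at the beginning).

```python
import random
from collections import defaultdict

def pick_one_per_prefix(items, prefixes, sep="-"):
    """Return one randomly chosen item for each prefix that occurs in items.

    Assumes (hypothetically) that an item's prefix is the text before
    the first `sep`.
    """
    wanted = set(prefixes)          # set lookup avoids rescanning prefixes
    groups = defaultdict(list)
    for item in items:
        key = item.split(sep)[0]
        if key in wanted:
            groups[key].append(item)
    # Follow the order of `prefixes`; skip prefixes that never matched
    return [random.choice(groups[p]) for p in prefixes if p in groups]

list1 = ['abc-21-6/7', 'abc-56-9/10', 'def-89-7/3', 'hij-2-4/9', 'hij-75-1/7']
list2 = ['abc', 'hij']
print(pick_one_per_prefix(list1, list2))
```

The printed result varies between runs: it contains one of the two 'abc' entries followed by one of the two 'hij' entries.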
Ajax1234

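You can use the following function, which returns the first element of list1 that contains a given substring (note that it always picks the first match, not a random one):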

def find(list1, findable):
    for element in list1:
        if findable in element:
            return element

Now we can use a list comprehension:

[find(list1, ele) for ele in list2 if find(list1, ele) is not None]

The comprehension calls find twice for every element of list2. A plain loop avoids the repeated search:

result = []
for ele in list2:
    found = find(list1, ele)
    if found is not None:
        result.append(found)
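
To get the random choice the question asks for, find could be adapted to collect every match and hand the candidates to random.choice. The following is only a sketch under assumptions: find_random is a hypothetical name, and it uses str.startswith because the asker confirmed that the list2 strings are always at the beginning of the list1 strings.

```python
import random

def find_random(items, prefix):
    """Return a random element of items that starts with prefix, or None."""
    matches = [item for item in items if item.startswith(prefix)]
    # random.choice raises IndexError on an empty list, so guard against it
    return random.choice(matches) if matches else None

list1 = ['abc-21-6/7', 'abc-56-9/10', 'def-89-7/3', 'hij-2-4/9', 'hij-75-1/7']
list2 = ['abc', 'hij']

result = []
for prefix in list2:
    found = find_random(list1, prefix)
    if found is not None:
        result.append(found)
print(result)
```

Like the loop above, this scans list1 once per element of list2, so it is still quadratic in the worst case.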
Him

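You can use a dictionary instead of a list, and then convert the values to a list. Later matches overwrite earlier ones, so this keeps the last match for each prefix rather than a random one.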

list1 = ['abc-21-6/7', 'abc-56-9/10', 'def-89-7/3', 'hij-2-4/9', 'hij-75-1/7']
list2 = ['abc', 'hij']

final_list = {pref:ele for pref in list2 for ele in list1 if pref in ele}
final_list = list(final_list.values())

This would output:

>>> final_list
['abc-56-9/10', 'hij-75-1/7']
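
Because the dictionary comprehension lets the last match win, one way to make the surviving element random is to shuffle a copy of list1 first. This is a sketch, not the original answer's code: the names shuffled and picked are hypothetical, and str.startswith is used on the assumption (confirmed by the asker in the comments) that the prefixes are always at the beginning.

```python
import random

list1 = ['abc-21-6/7', 'abc-56-9/10', 'def-89-7/3', 'hij-2-4/9', 'hij-75-1/7']
list2 = ['abc', 'hij']

# random.sample with k=len(list1) returns a shuffled copy; list1 is unchanged
shuffled = random.sample(list1, k=len(list1))

# For each prefix, the last matching element in the shuffled order wins,
# which makes each candidate equally likely to be kept
picked = {pref: ele for pref in list2 for ele in shuffled if ele.startswith(pref)}
final_list = list(picked.values())
print(final_list)
```

Dictionaries preserve insertion order in Python 3.7 and later, so final_list follows the order of list2.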
José Garcia