0

I want to delete those elements of a list that are contained in other elements of the same list. In my case, these are the nodes of several paths. Example:

mylist=[[1,2,3],[1,2,3,4,5,6],[1,2,3,4,5,6,9,10,11],[1,2,3,4,5,6,7,8,9,10],[11,12,13,14,15],[6,5,4,3,2,1]]

Desired Output:

[[1, 2, 3, 4, 5, 6, 9, 10, 11], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [6,5,4,3,2,1]]

I have the following code which works for me but its performance is just too slow. My list contains 12.000 elements, meaning i*j = ca. 144Million.

deleted=0
testlist_new=[]
dummylist = []
for i in range(len(mylist)):
    x_dummy_list=[]
    dummylist = copy.copy(mylist)
    del dummylist[i]
    for j in range(len(mylist)-1):
        x_dummy_list.append(not Counter(mylist[i]) - Counter(dummylist[j]))
    if True in x_dummy_list:
        deleted=deleted+1
    else:
        testlist_new.append(mylist[i])

Resulting in:

testlist_new=[[1, 2, 3, 4, 5, 6, 9, 10, 11], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [11, 12, 13, 14, 15]]

So this is the result. For me, it would be okay that [6,5,4,3,2,1] is not there, because the paths are calculated from one point fan-shaped away. But it would be even greater if the Desired Output could be achieved.

I already tried .set as well as issubset functions which do not work for me as they are not fine enough.

Thank you!

jonas_2021
  • 23
  • 3
  • A good place to start when trying to optimize your code is to first profile it and see where it's spending most of its execution time. This will tell you the part(s) that need improvement. See [How can you profile a Python script?](https://stackoverflow.com/questions/582336/how-can-you-profile-a-python-script). – martineau Mar 09 '21 at 22:19
  • For your case is `1, 2, 3` considered a "subset" of `5, 7, 1, 2, 3`? – ddejohn Mar 09 '21 at 22:45
  • btw you have a typo in the `else` clause, looks like `testlist` should be `mylist`, but either way `testlist` raises a `NameError` – ddejohn Mar 09 '21 at 22:46
  • Can be the sublists sorted? That way you can get rid of the `Counter` and copying the lists... – Andrej Kesely Mar 09 '21 at 23:00
  • @blorgon, thanks, I corrected the typo. No, 1,2,3 cannot be considered as a "subset" of 5,7,1,2,3. – jonas_2021 Mar 09 '21 at 23:12
  • @AndrejKesely: no they cannot be sorted because they represent nodes to be visited one by one – jonas_2021 Mar 09 '21 at 23:14

3 Answers3

0

I don't yet have a solution for your exact desired outcome, but I do have something that is much more efficient than your current solution:

def remove_subsets(x):
    out = []
    for i in range(len(x)):
        if all(not (set(x[i]) < set(y)) for y in x[i+1:] + x[:max(0, i)]):
            out.append(x[i])
    return out

Output from a timeit run:

mine: 0.14 s
yours: 1.33 s
true # just to ensure the two versions produce the same result

Still working on a version which will let you keep your [6, 5, 4, 3, 2, 1].

ddejohn
  • 8,775
  • 3
  • 17
  • 30
  • thanks a lot this one is working perfectly for me! Even though it is not keeping [6, 5, 4, 3, 2, 1] because my paths are just from one point to all directions. – jonas_2021 Mar 10 '21 at 09:33
0

If I understood you correctly, you are trying to eliminate sublists which are contained in other sublists (for example [1, 2, 3] is contained in sublist [1, 2, 3, 4], but not in [5, 1, 2, 3]):

mylist=[[1,2,3],[1,2,3,4,5,6],[1,2,3,4,5,6,9,10,11],[1,2,3,4,5,6,7,8,9,10],[11,12,13,14,15],[6,5,4,3,2,1]]

def starts_with(l1, l2):
    # returns True if list `l2` starts with list `l1`

    for i in range(len(l1)):
        if l1[len(l1)-1-i] == l2[i]:    # take care of reverse lists
            continue
        if l1[i] != l2[i]:
            return False
    else:
        return True

mylist.sort(key=len)

new_list = []
for idx1, l1 in enumerate(mylist):
    for l2 in mylist[idx1+1:]:
        if len(l1) > len(l2):
            break
        if starts_with(l1, l2):
            break
    else:
        new_list.append(l1)

print(new_list)

Prints:

[[11, 12, 13, 14, 15], [1, 2, 3, 4, 5, 6, 9, 10, 11], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]
Andrej Kesely
  • 168,389
  • 15
  • 48
  • 91
0

I converted each list to a string to avoid hashing issues (I will call them slists), then check if that slist is a substring of any other slists from the original set. I then made the first slist a key in a dict, and the set of True's and False's that are returned from the substring check as the value. I then check if there are any True's (meaning that that particular slist was a subset of some other list) - if there are no True's, then it is a "unique" slist, and it is preserved and translated back into a regular list of ints.

I've attached the code here:

from typing import List

def trim_sublists(mylist: List[List[int]])-> List[List[int]]:
    mylist = set([":".join(map(str, sub)) for sub in mylist])
    mydict = {sub: set(map(lambda x: sub in x, mylist - {sub})) for sub in mylist}
    return [list(map(int, k.split(":"))) for k,v in mydict.items() if not any(v)]


# You need to run trim_sublists(mylist)

If we let n be the length of your original list mylist and m be the length of the longest sublist in mylist, we see that the first list comprehension should be O(n), the dictionary comprehension should be O(n2), and the final list comprehension should be O(nm).

Note that order is preserved here, meaning that my function returns [1,2,3,4,5,6] and [6,5,4,3,2,1], exactly as you requested above. The fact that I converted each list into a string means that lists with the same elements in different orders will never be considered the same.

v0rtex20k
  • 1,041
  • 10
  • 20