-1

I have a list of lists, where the lists are always ordered in the same way, and within each list several of the elements are duplicates. I would therefore like to remove duplicates from the list, but it's important that I retain the structure of each list i.e. if elements indices 0, 1 and 2 are all duplicates for a given list, two of these would be removed from the list, but then the same positions elements would also have to be removed from all the other lists too to retain the ordered structure.

Crucially however, it may not be the case that elements with indices 0, 1 and 2 are duplicates in the other lists, and therefore I would only want to do this if I was sure that across the lists, elements indexed by 0, 1 and 2 were always duplicated.

As an example, say I had this list of lists

L = [ [1,1,1,3,3,2,4,6,6], 
[5,5,5,4,5,6,5,7,7], 
[9,9,9,2,2,7,8,10,10] ]

After applying my method I would like to be left with

L_new = [ [1,3,3,2,4,6], 
[5,4,5,6,5,7], 
[9,2,2,7,8,10] ]

where you see that elements index 1 and 2 and element 8 have all been constantly removed because they are consistently duplicated across all lists, whereas elements index 3 and 4 have not because they are not always duplicated.

My thinking so far (though I believe this is probably not the best approach and why I asked for help)

def check_duplicates_in_same_position(arr_list):
    check_list = []
    for arr in arr_list:
        duplicate_positions_list = []
        positions = {}
        for i in range(len(arr)):
            item = arr[i]
            if item in positions:
                positions[item].append(i)
            else:
                positions[item] = [i]
        duplicate_positions = {k: v for k, v in positions.items() if len(v) > 1}
        for _, item in duplicate_positions.items():
            duplicate_positions_list.append(item)
        check_list.append(duplicate_positions_list)
    
    return check_list

This returns a list of lists of lists, where each element is a list that contains a bunch of lists whose elements are the indices of the duplicates for that list as so

[[[0, 1, 2], [3, 4], [7, 8]],
 [[0, 1, 2, 4, 6], [7, 8]],
 [[0, 1, 2], [3, 4], [7, 8]]]

I then thought to somehow compare these lists and for example remove elements index 1 and 2 and index 8, because these are common matches for each.

Pronitron
  • 177
  • 8
  • 1
    so what have you tried so far? share your code and what should be the final output – sahasrara62 Jan 17 '23 at 14:59
  • Did you break this task down into smaller pieces? What part of this task are you struggling with? Are you able to find which items are duplicates in a single list? Are you able to correlate these across all lists? Where are you stuck? – Pranav Hosangadi Jan 17 '23 at 15:07
  • "elements 1 and 2 and *element 8* have all been constantly removed" The element 8 appears the same number of times in `L` and `L_new`. Do you mean the 8**th** element, i.e. the one at index 7? Is it relevant that the first instances of duplicates are removed (i.e. the first and second, not the second and third element) or are elements indistinguishable? – MisterMiyagi Jan 17 '23 at 15:11
  • isn't the output should be `[[1, 3, 2, 6], [5, 4, 6, 7], [9, 2, 7, 10]]` ? – sahasrara62 Jan 17 '23 at 15:22
  • @sahasrara62 added my attempt so far. – Pronitron Jan 17 '23 at 15:24
  • @MisterMiyagi yes sorry, I mean the element indexed by the numbers 1, 2 and 8, have edited the posting to make it clearer. And yes they are indistinguishable if they are duplicates, so can remove any of the to just leave one remaining. – Pronitron Jan 17 '23 at 15:25
  • @Pronitron so you want to remove the first duplicate element and all the element corresponde to the index of duplicate in the sublist ? – sahasrara62 Jan 17 '23 at 15:26
  • @sahasrara62 no the output should be as I stated, the same index elements must be consistently removed from all sublists to leave only one from a number of duplicates, but this should only be done if the same index elements across sublists are all duplicates within their corresponding sublist. – Pronitron Jan 17 '23 at 15:33
  • what would be the expected output when the original list is: `[[1,1,1],[1,1,1],[1,2,1]]` ? an already seen triplet but not in a repetition is considered seen or a new one? – Gábor Fekete Jan 17 '23 at 15:41
  • 1
    @GáborFekete in this case we would say that elements index 0 and 2 are consistently duplicated across sublists, so could remove either element index 0 or index 2 to be left with `[[1,1],[1,1],[1,2]] or [[1,1],[1,1],[2,1]]` – Pronitron Jan 17 '23 at 15:46

2 Answers2

2

Assuming all sub-lists will have the same length, this should work:

l = [ [1,1,1,3,3,2,4,6,6], [5,5,5,4,5,6,5,7,7], [9,9,9,2,2,7,8,10,10] ]

[list(x) for x in zip(*dict.fromkeys(zip(*l)))]

# Output: [[1, 3, 3, 2, 4, 6], [5, 4, 5, 6, 5, 7], [9, 2, 2, 7, 8, 10]]

Explanation:

  1. zip(*l) - This will create a new 1-dimension array. The nth element will be a tuple with all the nth elements in the original sublists:
[(1, 5, 9),
 (1, 5, 9),
 (1, 5, 9),
 (3, 4, 2),
 (3, 5, 2),
 (2, 6, 7),
 (4, 5, 8),
 (6, 7, 10),
 (6, 7, 10)]
  1. From the previous list, we only want to keep those that are not repeated. There are various ways of achieving this. If you search how to remove duplicates while mantaining order, this answer will pop up. It uses dict.fromkeys(<list>). Since python dict keys must be unique, this removes duplicates and generates the following output:
{(1, 5, 9): None,
 (3, 4, 2): None,
 (3, 5, 2): None,
 (2, 6, 7): None,
 (4, 5, 8): None,
 (6, 7, 10): None}
  1. We now want to unzip those keys to the original 2-dimensional array. For that, we can use zip again:
zip(*dict.fromkeys(zip(*l)))
  1. Since zip returns tuples, we have to finally convert the tuples to list using a list comprehension:
[list(x) for x in zip(*dict.fromkeys(zip(*l)))]
Carles Mitjans
  • 4,786
  • 3
  • 19
  • 38
-2

I would go with something like this. It is not too fast, but dependent on the size of your lists, it could be sufficient.

L = [ [1,1,1,3,3,2,4,6,6], [5,5,5,4,5,6,5,7,7], [9,9,9,2,2,7,8,10,10] ]

azip = zip(*L)
temp_L = []
for zz in azip:
    if not zz in temp_L:
        temp_L.append(zz)
new_L = [list(zip(*temp_L))[zz] for zz in range(len(L))]

first, we zip the three (or more) lists within L. Then, we iterate over each element, check if it already exists. If not, we add it to our temporary list temp_L. And in the end we restructure temp_L to be of the original format. It returns

new_L
>> [(1, 3, 3, 2, 4, 6), (5, 4, 5, 6, 5, 7), (9, 2, 2, 7, 8, 10)]
Lukas S
  • 22
  • 3
  • "not too fast" is an understatement – this is outright wasteful, being in the O(n^2) to O(n^3) ballpark for no benefit at all. Using a list for `temp_L` makes the `if not zz in temp_L` test extremely slow, and the creation of `new_L` repeatedly builds the correct solution only to throw it away and rebuild the correct solution from the few parts that are kept. – MisterMiyagi Jan 17 '23 at 15:21
  • As written, dependent on the size of the lists, this could be sufficient. We do not know in which context their problem exists. If it is not a bottleneck anyway, then the readability might be worth the "slowness". – Lukas S Jan 17 '23 at 15:29