
I have a list of lists and would like to keep only the unique inner lists, ignoring one element of each inner list when comparing them.

MWE:

my_list_of_lists = [['b','c','1','d'],['b','c','1','d'],['b','c','2','e']]

print(my_list_of_lists)

new_list_of_lists = []

for the_list in my_list_of_lists:
    if the_list not in new_list_of_lists:
        new_list_of_lists.append(the_list)

print(new_list_of_lists)

MWE Output:

[['b', 'c', '1', 'd'], ['b', 'c', '1', 'd'], ['b', 'c', '2', 'e']]  # 1st print
[['b', 'c', '1', 'd'], ['b', 'c', '2', 'e']]                        # 2nd print

Question:

Is there a Pythonic way to remove duplicates, as in the example above, while ignoring a specific element of each inner list? I.e., for my_list_of_lists = [['b','c','1','d'],['b','c','3','d'],['b','c','2','e']] the result should be [['b','c','1','d'],['b','c','2','e']].

my_list_of_lists = [['b','c','1','d'],['b','c','3','d'],['b','c','2','e']] 
# my_list_of_lists[0] and my_list_of_lists[1] are identical 
# if my_list_of_lists[n][-2] is ignored

print(my_list_of_lists)

new_list_of_lists = []

for the_list in my_list_of_lists:
    # ignore the second-last element when comparing
    if the_list[:-2] + the_list[-1:] not in [kept[:-2] + kept[-1:] for kept in new_list_of_lists]:
        new_list_of_lists.append(the_list)

print(new_list_of_lists)
3kstc
  • By your "ignore" rule, why isn't `['b','c','2','d']` considered identical to the other two elements? – iz_ Jan 13 '20 at 02:12
  • Sorry my bad! corrected – 3kstc Jan 13 '20 at 02:20
  • *"ignoring a specific element"* - Which one? The first? The largest? The one that's a digit? Some other rule? Your example input doesn't specify. – Kelly Bundy Jan 13 '20 at 02:57
  • @HeapOverflow, I think a generic (non-specific) function would be best as other users in the future can integrate this generic function for their own use. – 3kstc Jan 13 '20 at 03:24

3 Answers


This is not "Pythonic" per se, but it is relatively short and gets the job done:

my_list_of_lists = [['b','c','1','d'],['b','c','3','d'],['b','c','2','e']] 

print(my_list_of_lists)

new_list_of_lists = []

ignore = 2

for the_list in my_list_of_lists:
    if all(
      any(e != other_list[i]
          for i, e in enumerate(the_list)
          if i != ignore)
      for other_list in new_list_of_lists
    ):
        new_list_of_lists.append(the_list)

print(new_list_of_lists)

It outputs [['b', 'c', '1', 'd'], ['b', 'c', '2', 'e']] for the given input.
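The loop above compares each candidate against every list already kept, so the work grows quadratically with the number of lists. A minimal sketch of the same keep-the-first behaviour using a different technique, a set of already-seen tuple keys, follows. The function name `dedupe_ignoring` is hypothetical, and the sketch assumes the elements are hashable (strings are) and that `ignore` is a non-negative index, as in the answer.

```python
def dedupe_ignoring(list_of_lists, ignore):
    """Keep the first list of each group that is equal once index `ignore` is skipped.

    Hypothetical helper: assumes hashable elements and a non-negative `ignore`.
    """
    seen = set()   # comparison keys of the lists kept so far
    result = []
    for the_list in list_of_lists:
        # The key is every element except the one at position `ignore`.
        key = tuple(e for i, e in enumerate(the_list) if i != ignore)
        if key not in seen:   # set lookup instead of scanning all kept lists
            seen.add(key)
            result.append(the_list)
    return result


my_list_of_lists = [['b','c','1','d'],['b','c','3','d'],['b','c','2','e']]
print(dedupe_ignoring(my_list_of_lists, 2))
# [['b', 'c', '1', 'd'], ['b', 'c', '2', 'e']]
```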

iz_
  • @HeapOverflow It should output `[['b'],['b'],['b']]`, but instead it outputs `[['b']]`. The point of the more complex condition is to prevent exactly this. – iz_ Jan 13 '20 at 03:04
  • @HeapOverflow Ah, I just realized that `[] == []`. Oops, hehe. Will fix. Thanks for pointing that out. :) – iz_ Jan 13 '20 at 03:06

My question and your reply from the comments:

"ignoring a specific element" - Which one? The first? The largest? The one that's a digit? Some other rule? Your example input doesn't specify. – Heap Overflow

@HeapOverflow, I think a generic (non-specific) function would be best as other users in the future can integrate this generic function for their own use. – 3kstc

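Here is such a generic function, written in @GreenCloakGuy's style: the caller passes a `key` function that returns the part of each list to compare.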

def unique(values, key):
    return list({key(value): value for value in values}.values())

new_list_of_lists = unique(my_list_of_lists, lambda a: tuple(a[:2] + a[3:]))  # key ignores index 2

A bit shorter:

def unique(values, key):
    return list(dict(zip(map(key, values), values)).values())

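Those keep the last of each group of duplicates (placed where the first one appeared, since overwriting a dict key does not move it). If you want to keep the first one instead, you could use this: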

def unique(values, key):
    tmp = {}
    for value in values:
        tmp.setdefault(key(value), value)
    return list(tmp.values())
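A self-contained comparison of the keep-last and keep-first versions on the question's input may make the difference concrete. The two functions are the `unique` variants above, renamed `unique_last` and `unique_first` here (hypothetical names) only so both can coexist; `drop_index_2` is an illustrative key equivalent to the lambda used above.

```python
def unique_last(values, key):
    # Later values overwrite earlier ones stored under the same key.
    return list({key(value): value for value in values}.values())


def unique_first(values, key):
    tmp = {}
    for value in values:
        # setdefault only stores the value if the key is not present yet.
        tmp.setdefault(key(value), value)
    return list(tmp.values())


def drop_index_2(a):
    # Illustrative key: the list without its element at index 2, as a tuple.
    return tuple(a[:2] + a[3:])


my_list_of_lists = [['b','c','1','d'],['b','c','3','d'],['b','c','2','e']]
print(unique_last(my_list_of_lists, drop_index_2))
# [['b', 'c', '3', 'd'], ['b', 'c', '2', 'e']]
print(unique_first(my_list_of_lists, drop_index_2))
# [['b', 'c', '1', 'd'], ['b', 'c', '2', 'e']]
```

In the keep-last result, `['b', 'c', '3', 'd']` still sits in the first slot, because overwriting an existing dict key changes its value but not its position.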
Kelly Bundy

The following approach

  1. Creates a dict
    1. where the values are the lists in the list-of-lists
    2. and the corresponding keys are those lists without the indices you want to ignore, converted to tuples (since lists cannot be used as dict keys but tuples can)
  2. Gets the dict's values, which are in insertion order (guaranteed for dicts since Python 3.7)
  3. Converts that to a list, and returns it

This keeps the later of any 'identical' lists, since each one overwrites the value that an earlier list stored under the same key.

def filter_unique(list_of_lists, indices_to_ignore):
    return list({
        tuple(elem for idx, elem in enumerate(lst) if idx not in indices_to_ignore) : lst
        for lst in list_of_lists
    }.values())

mlol = [['b','c','1','d'],['b','c','3','d'],['b','c','2','e']]
print(filter_unique(mlol, [2]))
# [['b', 'c', '3', 'd'], ['b', 'c', '2', 'e']]
print(filter_unique(mlol, [3]))
# [['b', 'c', '1', 'd'], ['b', 'c', '3', 'd'], ['b', 'c', '2', 'e']]

That's a one-liner, abusing a dict comprehension. A multi-line version might look like this:

def filter_unique(list_of_lists, indices_to_ignore):
    dct = {}
    for lst in list_of_lists:
        key = []
        for idx, elem in enumerate(lst):
            if idx not in indices_to_ignore:
                key.append(elem)
        dct[tuple(key)] = lst
    return list(dct.values())
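Steps 1 and 2 rely on two properties of Python dicts, which this small check illustrates: a list cannot be used as a key because it is unhashable, while a tuple can; and assigning to an existing key replaces its value without moving the key, since dicts keep insertion order (guaranteed from Python 3.7). The keys and values here are illustrative, mirroring the question's data.

```python
d = {}

try:
    d[['b', 'c', 'd']] = ['b', 'c', '1', 'd']   # a list as a key...
except TypeError:
    print("lists are unhashable")               # ...raises TypeError; nothing is stored

d[('b', 'c', 'd')] = ['b', 'c', '1', 'd']   # tuples are hashable
d[('b', 'c', 'e')] = ['b', 'c', '2', 'e']
d[('b', 'c', 'd')] = ['b', 'c', '3', 'd']   # overwrite: new value, same position

print(list(d.values()))
# [['b', 'c', '3', 'd'], ['b', 'c', '2', 'e']]
```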
iz_
Green Cloak Guy