-2

How do we deduplicate elements within nested lists based on the elements within another nested list? Or, does it make more sense to iterate through a column and drop duplicates based on a list of elements in another column?

Column 1
R1 = [foo, bar, baz, qux]
R2 = [Cat, Dog, Frog, Bird]
R3 = [Salad, Potato, Pizza, Soda]

Column 2
R1 = [bar, quuz, quux, qux]
R2 = [Fish, Dog, Cow]
R3 = [Potato, Milk, Apple, Pizza]

I only want to keep the elements that are unique to Column 2 (listB); order does not matter.

Final Column
R1 = [quuz, quux]
R2 = [Fish, Cow]
R3 = [Milk, Apple]

An actual list looks like this and includes the following characters: \ , ()

[Youth Counselor / Worker, Nutrition / Dietetic Technician,Mathematician,Tailor / Seamstress,Librarian]

I must keep each element intact as a string, so a flat list won't work in this case.

NJT

4 Answers

3

If order doesn't matter:

  • Use set operations on each pair of nested lists. Subtracting two sets produces a new set with only the elements from the first that don't appear in the second.
  • Use the zip() function to pair up the nested lists from your two input lists.
  • If your output must consist of nested lists again, convert the result of the set operation back to a list, with list()
  • Use a list comprehension to process each pair of nested lists, creating a new list with the results.

This can be expressed as a one-liner with:

[list(set(b) - set(a)) for a, b in zip(listA, listB)]

You can drop the list(...) call if nested sets in your output are acceptable:

[set(b) - set(a) for a, b in zip(listA, listB)]

Demo:

>>> listA = [['A', 'B', 'C', 'D', 'E'], [1, 2, 3, 4, 5], ['!', '@', '#', '$', '%']]
>>> listB = [['E', 'A', 'T', 'F', 'W'], [5, 6, 8, 2, 9], ['@', '^', '&', '#', '*']]
>>> [list(set(b) - set(a)) for a, b in zip(listA, listB)]
[['W', 'F', 'T'], [8, 9, 6], ['^', '&', '*']]
>>> [set(b) - set(a) for a, b in zip(listA, listB)]  # without list(...)
[{'W', 'F', 'T'}, {8, 9, 6}, {'^', '&', '*'}]

If you change your mind and decide that order does matter for the output, then:

  • Convert each nested list in listA to a set only once, for faster containment testing. value in listobject has to iterate through listobject each time, whereas value in setobject uses hashing to test for containment in O(1) (constant) time on average.
  • Loop over the values in a given nested list in listB and test the value against the matching set from listA, keeping only the values that don't appear in the corresponding set. Use a list comprehension for this.
  • You could use map() to handle converting the listA nested lists to sets as you pair up the lists. This then helps avoid creating a new set each time you test a value from the nested list from listB.

So a one-liner that preserves input ordering is:

[[v for v in nested_b if v not in set_a] for set_a, nested_b in zip(map(set, listA), listB)]

The zip() function pairs up the sets produced from listA (via map(set, listA)) and the nested lists from listB so we can use them together each iteration of the outermost list comprehension. The nested list comprehension then filters the values for each nested list:

>>> [[v for v in nested_b if v not in set_a] for set_a, nested_b in zip(map(set, listA), listB)]
[['T', 'F', 'W'], [6, 8, 9], ['^', '&', '*']]
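
Written out as a plain function, the order-preserving version might look like the sketch below. The name unique_to_b is made up for illustration and is not part of any library; the sample data is the first two rows from the question, quoted as strings.

```python
def unique_to_b(list_a, list_b):
    """Per pair of nested lists, keep the values of list_b not found in list_a."""
    result = []
    for nested_a, nested_b in zip(list_a, list_b):
        seen = set(nested_a)  # build the lookup set once per pair
        # filter nested_b in its original order
        result.append([v for v in nested_b if v not in seen])
    return result


listA = [['foo', 'bar', 'baz', 'qux'], ['Cat', 'Dog', 'Frog', 'Bird']]
listB = [['bar', 'quuz', 'quux', 'qux'], ['Fish', 'Dog', 'Cow']]

print(unique_to_b(listA, listB))
# [['quuz', 'quux'], ['Fish', 'Cow']]
```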
Martijn Pieters
  • Apologies, but I may have confused the question by the format of my example lists. `[Youth Counselor / Worker, Nutrition / Dietetic Technician,Mathematician,Tailor / Seamstress,Librarian]` This is a more accurate list of what I'm working with. I tried this method and it separates my elements into single characters and then drops duplicates. `[['D','b', 'A','h','i',',','E','v',`]] – NJT Aug 07 '19 at 17:34
  • @NathanTriepke: That's not a Python list. If you are working with a Pandas dataframe and series, post a new question with a [clear example dataframe in the question itself](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). – Martijn Pieters Aug 10 '19 at 09:13
  • @NathanTriepke: the output you see suggests you *don't have nested lists*. – Martijn Pieters Aug 10 '19 at 09:14
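
For what it's worth, single-character output like the one in the comment above is what set() produces when it is handed a string rather than a list. A minimal sketch, assuming each cell is one comma-separated string and that commas only occur as separators (the helper name split_cell and the sample values are illustrative, not the asker's real data):

```python
def split_cell(cell):
    """Turn '[a, b / c,d]' into ['a', 'b / c', 'd'] (assumes ',' is only a separator)."""
    inner = cell.strip().strip('[]')  # drop optional surrounding brackets
    return [part.strip() for part in inner.split(',') if part.strip()]


col1 = '[Youth Counselor / Worker, Mathematician,Librarian]'
col2 = '[Mathematician,Tailor / Seamstress,Librarian]'

# set() on a raw string iterates over its characters
chars = set(col2)
print(all(len(c) == 1 for c in chars))  # True

# splitting first keeps each element intact as a string
seen = set(split_cell(col1))
only_in_b = [v for v in split_cell(col2) if v not in seen]
print(only_in_b)
# ['Tailor / Seamstress']
```

If the real elements can themselves contain commas, this splitting rule would not be enough and the data would need a different delimiter or quoting.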
0

Using sets, you can do:

listA = [["A","B","C","D","E"],[1,2,3,4,5],["!","@","#","$","%"]]
listB = [["E","A","T","F","W"],[5,6,8,2,9],["@","^","&","#","*"]]


print([list(set(listB[i]).difference(set(listA[i]))) for i in range(len(listB))])

Gives me:

[['F', 'W', 'T'], [8, 9, 6], ['^', '&', '*']]

Note: this does not preserve the order of the elements in each list.

Edit:

Or, as @user3483203 suggested, a more reliable solution would be:

[list(b - a) for a, b in zip(map(set, listA), map(set, listB))]
R4444
0

Assuming that you want to compare the nested lists by index:

result_list = []
for listA_nest, listB_nest in zip(listA, listB):
    result_list.append(list(filter(lambda listB_el: listB_el not in listA_nest, listB_nest)))

Something like that should work; there is probably a better solution.

One line solution:

result_list = [list(filter(lambda listB_el: listB_el not in set(listA_nest) ,listB_nest)) for listA_nest,listB_nest in zip(listA,listB)]
0

Assuming you are checking corresponding lists in A and B, you'll probably want zip to keep them together, and you can use the in operator to check membership:

listFinal = []
for l1, l2 in zip(listA, listB):
    l = [x for x in l2 if x not in l1]
    listFinal.append(l)

[['T', 'F', 'W'], [6, 8, 9], ['^', '&', '*']]

Though the faster way would be to use a set, which allows you to quickly deduplicate collections and test for membership in O(1), rather than O(N):

listFinal = []

for l1, l2 in zip(listA, listB):
    # set subtraction here will remove all elements present in l1 from l2
    l = set(l2) - set(l1)
    listFinal.append(list(l))

[['T', 'F', 'W'], [8, 9, 6], ['*', '&', '^']]

Or, in one line if you prefer:

listFinal = [list(set(l2) - set(l1)) for l1, l2 in zip(listA, listB)]

To show how zip works:

a = [1, 2, 3, 4]
b = [5, 6, 7, 8]

for x, y in zip(a, b):
    print(x, y)

1 5
2 6
3 7
4 8

It will generate groups of corresponding elements for each iterable passed to it.
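
One caveat worth knowing: zip() stops at the shortest iterable, so if listA and listB have a different number of nested lists, the extra rows are silently dropped. A small sketch of that behaviour, with itertools.zip_longest as one possible workaround (the sample data and the choice of an empty list as filler are assumptions made for illustration):

```python
from itertools import zip_longest

listA = [['a', 'b'], ['c']]
listB = [['b', 'x'], ['c', 'y'], ['z']]  # one more row than listA

# zip() stops at the shorter input, so the third row of listB is lost
short = [list(set(b) - set(a)) for a, b in zip(listA, listB)]
print(short)
# [['x'], ['y']]

# zip_longest() pads the shorter input; an empty list means "nothing to remove"
full = [list(set(b) - set(a))
        for a, b in zip_longest(listA, listB, fillvalue=[])]
print(full)
# [['x'], ['y'], ['z']]
```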

Set subtraction:

a = {'a', 'b', 'c'}
b = {'b', 'c', 'd'}

a - b

{'a'}
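
One difference between the two approaches that may matter for deduplication: set subtraction also collapses repeated values within the second list, while the list-comprehension filter keeps them. A short sketch with made-up data:

```python
a = ['b', 'c']
b = ['d', 'b', 'd', 'e']

# set difference: repeated values in b collapse to one, order is not kept
by_set = set(b) - set(a)

# membership filter: repeated values in b survive, order is kept
seen = set(a)
by_filter = [v for v in b if v not in seen]

print(sorted(by_set))  # ['d', 'e']
print(by_filter)       # ['d', 'd', 'e']

# dict.fromkeys() removes repeats while keeping first-seen order (Python 3.7+)
deduped = list(dict.fromkeys(by_filter))
print(deduped)         # ['d', 'e']
```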

C.Nivs