1

I want to extract the unique items from a nested list, shown below. I implemented this in two ways. The first one works, but the second one fails. Is `new_data` empty while the list comprehension is being evaluated? And how do I fix it?

 data = [                                                                                                                                      
     ['a', 'b'],                                                                                                                               
     ['a', 'c'],                                                                                                                               
     ['a', 'b'],                                                                                                                               
     ['b', 'a']                                                                                                                                
 ]                                                                                                                                             

 # working                                                                                                                                          
 new_data = []                                                                                                                                 
 for d in data:                                                                                                                                
     if d not in new_data:                                                                                                                     
         new_data.append(d)                                                                                                                    
 print(new_data)                                                                                                                               
 # [['a', 'b'], ['a','c'], ['b','a']]                                                                                                          

 # Failed to extract unique list                                                                                                                                 
 new_data = []                                                                                                                                 
 new_data = [d for d in data if d not in new_data]                                                                                             
 print(new_data)                                                                                                                               
 # [['a', 'b'], ['a', 'c'], ['a', 'b'], ['b', 'a']] 
jef
  • Yes, `new_data` is empty during the execution of the list comprehension, the results of which are assigned to `new_data` after execution. – Brenden Petersen Oct 11 '17 at 17:43
  • Just use your original version... that's how you fix it. – juanpa.arrivillaga Oct 11 '17 at 17:44
  • There is nothing in `new_data` when you test `if d not in new_data`. List comprehensions run to completion before the assignment saves the result back to `new_data`. – Aaron Oct 11 '17 at 17:44
  • [This](https://stackoverflow.com/questions/10549345/how-to-remove-duplicate-items-from-a-list-using-list-comprehension) Q&A explains why the list comprehension doesn't work. – Wondercricket Oct 11 '17 at 17:44
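
The behaviour described in these comments can be checked with a small sketch. The helper `not_seen` and the list `seen_lengths` are hypothetical names introduced only for this illustration; they record how long `new_data` is each time the filter condition runs.

```python
data = [['a', 'b'], ['a', 'c'], ['a', 'b'], ['b', 'a']]

new_data = []
seen_lengths = []  # hypothetical: length of new_data at each filter check


def not_seen(d):
    # new_data still refers to the empty list while the comprehension runs,
    # because the name is only rebound after the whole list has been built.
    seen_lengths.append(len(new_data))
    return d not in new_data


new_data = [d for d in data if not_seen(d)]

print(seen_lengths)  # [0, 0, 0, 0]
print(new_data)      # [['a', 'b'], ['a', 'c'], ['a', 'b'], ['b', 'a']]
```

Every check sees an empty list, so nothing is filtered out and the duplicates survive.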

2 Answers

5

Just try:

new_data = [list(y) for y in set([tuple(x) for x in data])]

You cannot call set() directly on a list of lists because lists are not hashable. Instead, convert each inner list to a tuple, apply set() to remove the duplicates, then convert the deduplicated tuples back into lists. Note that a set does not preserve the original order of the items, so the result may come back in a different order than the input.
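
If the order of first appearance matters, one possible variant (not part of the original answer) keeps the same tuple conversion but swaps `set()` for `dict.fromkeys()`. This is a minimal sketch that assumes Python 3.7 or later, where dictionaries preserve insertion order; the names `unordered` and `ordered` are chosen here for illustration.

```python
data = [['a', 'b'], ['a', 'c'], ['a', 'b'], ['b', 'a']]

# Set-based version: duplicates are removed, but the order is arbitrary.
unordered = [list(t) for t in set(tuple(x) for x in data)]

# dict.fromkeys keeps one key per distinct tuple, in first-seen order
# (assumes Python 3.7+).
ordered = [list(t) for t in dict.fromkeys(tuple(x) for x in data)]

print(sorted(unordered))  # [['a', 'b'], ['a', 'c'], ['b', 'a']]
print(ordered)            # [['a', 'b'], ['a', 'c'], ['b', 'a']]
```

Both versions require the inner items to be hashable once converted to tuples.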

Spencer Bard
0

You could use enumerate to check that no copy of the current value appears earlier in the list, so that only the first occurrence of each item is kept:

new_data = [item for index, item in enumerate(data) if item not in data[:index]]
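
A short trace of the same comprehension on the question's data shows which slice each item is compared against. The `trace` list is a hypothetical addition used only to make the steps visible.

```python
data = [['a', 'b'], ['a', 'c'], ['a', 'b'], ['b', 'a']]

# For each item, record whether it is absent from everything before it.
trace = [(index, item, item not in data[:index])
         for index, item in enumerate(data)]
for index, item, is_first in trace:
    print(index, item, data[:index], is_first)

new_data = [item for index, item in enumerate(data)
            if item not in data[:index]]
print(new_data)  # [['a', 'b'], ['a', 'c'], ['b', 'a']]
```

This keeps the original order and works on unhashable items, but each check scans and copies a slice of `data`, so the cost grows quadratically with the length of the list.
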
Aaron