2 dimensional list sorting python 3.6.1 anaconda

Question

lijst = [[], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [],
         [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [],
         [], [], [], [], [], [], [], [], [], [], [], [],
         ['/vacatures/oracle-plsql-ontwikkelaar-1/'], [], [], [], [],
         ['/vacatures/oracle-plsql-ontwikkelaar-1/'],
         ['/vacatures/business-intelligence-developer-1/'], [], [], [], [], [],
         ['/vacatures/business-intelligence-developer-1/'],
         ['/vacatures/oracle-dba/'], [], [], ['/vacatures/oracle-dba/'],
         ['/vacatures/database-beheerder/'], [], [], [],
         ['/vacatures/database-beheerder/'],
         ['/vacatures/sql-server-dba-powershell/'], [], [], [],
         ['/vacatures/sql-server-dba-powershell/'],
         ['/vacatures/junior-msbi-consultant/'], [], [], [], [], [],
         ['/vacatures/junior-msbi-consultant/'],
         ['/vacatures/senior-msbi-consultant/'], [], [], [], [], [],
         ['/vacatures/senior-msbi-consultant/'],
         ['/vacatures/medior-msbi-consultant/'], [], [], [], [],
         ['/vacatures/medior-msbi-consultant/'],
         ['/vacatures/zos-mainframe-specialist/'], [], [],
         ['/vacatures/zos-mainframe-specialist/'],
         ['/vacatures/junior-business-analyst/'], [], [], [], [],
         ['/vacatures/junior-business-analyst/'], [], [], [], [], [], [], [],
         [], ['/vacatures/oracle-plsql-ontwikkelaar-1/'], [], [],
         ['/vacatures/oracle-dba/'], [], [],
         ['/vacatures/business-intelligence-developer-1/'], [], [],
         ['/vacatures/database-beheerder/'], [], [],
         ['/vacatures/sql-server-dba-powershell/'], [], [], [], [], [], [], [],
         [], [], []]

How can I filter out the empty lists and remove the duplicate items in this 2-dimensional list?

What should be the output? — Eric Duminil, Sep 19 '17 at 17:51
@EricDuminil In case the OP wants to make it flat? — keepAlive, Sep 19 '17 at 17:58

keepAlive · Answer 1 · 2017-09-20T05:30:54.010

It is as simple as doing

new_list0 = list(filter(len, lijst))

Then, to remove duplicates, you can turn new_list0 into a set and cast it back to a list. Lists are not hashable, so each inner list is converted to a tuple first:

new_list1 = list(set(tuple(x) for x in new_list0))

And if you want to cast the elements of new_list1 (which are now tuples) back to lists, you can do

new_list2 = list(map(list, new_list1))
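As a quick check, the three steps above can be run on a small made-up sample. The name `sample` and its contents are assumptions for illustration, not the question's full list. Because a set has no defined order, the sketch sorts the result before printing.

```python
# Hypothetical small sample standing in for the question's list.
sample = [[], ['/vacatures/oracle-dba/'], [], ['/vacatures/oracle-dba/'],
          ['/vacatures/database-beheerder/'], []]

non_empty = list(filter(len, sample))            # drop the empty inner lists
unique = list(set(tuple(x) for x in non_empty))  # lists are unhashable, tuples are not
as_lists = list(map(list, unique))               # convert the tuples back to lists

# Sort only to get a stable display; the set itself is unordered.
print(sorted(as_lists))
# [['/vacatures/database-beheerder/'], ['/vacatures/oracle-dba/']]
```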

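But given the number of conversions performed above (generator to set, set to list, tuples back to lists), something that is probably better for short inputs, and that also preserves the original order, is a single loop. Note that the in test scans new_list on every iteration, so the loop is O(n**2) in the worst case: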

new_list = []
for el in lijst:
    # keep non-empty inner lists that have not been seen yet
    if el and el not in new_list:
        new_list.append(el)
print(new_list)
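The loop above rescans new_list on every iteration. Here is a minimal sketch of a linear-time variant that keeps both the first-seen order and the nested shape. It assumes the inner elements are hashable, which holds for the strings here. The function name `unique_non_empty` and the sample data are made up for illustration.

```python
def unique_non_empty(rows):
    """Return the non-empty inner lists of rows, deduplicated, in first-seen order."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row)  # hashable stand-in for the (unhashable) list
        if row and key not in seen:
            seen.add(key)
            result.append(row)
    return result


# Hypothetical sample data.
sample = [[], ['b'], [], ['a'], ['b'], ['a'], []]
print(unique_non_empty(sample))  # [['b'], ['a']]
```

The membership test now runs against a set instead of a list, so each lookup takes constant time on average.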

Finally, note that new_list is still 2-dimensional, like the original. If you want a 1-dimensional list, you can flatten it as follows

import itertools
new_list = list(itertools.chain.from_iterable(new_list))

or build it directly as a 1-dimensional list, which reduces the time complexity to O(n) (instead of O(n**2)) by avoiding the in operator on a list

new_set = set()
for el in lijst:
    if el:
        # add every string of the non-empty inner list to the set
        new_set.update(el)
new_list = list(new_set)  # note: set order is arbitrary

Answer tested and functional.

Looks good. If necessary you can cast the inner tuples back to lists. — andrewlamb, Sep 19 '17 at 17:08
Also, your unique examples are `O(n**2)`, even though they could be `O(n)`. — Eric Duminil, Sep 19 '17 at 18:11

Eric Duminil · Answer 2 · 2017-09-19T18:09:37.863

Your list isn't really 2-dimensional: every inner list has either 0 or 1 element.

In that case, you could just extract the strings and put them into a set:

print({l[0] for l in lijst if l})

It outputs (shown in Python 3 set notation; the order of a set is arbitrary and may differ between runs):

{'/vacatures/junior-msbi-consultant/', '/vacatures/junior-business-analyst/', '/vacatures/business-intelligence-developer-1/', '/vacatures/zos-mainframe-specialist/', '/vacatures/sql-server-dba-powershell/', '/vacatures/database-beheerder/', '/vacatures/medior-msbi-consultant/', '/vacatures/oracle-dba/', '/vacatures/oracle-plsql-ontwikkelaar-1/', '/vacatures/senior-msbi-consultant/'}

It's concise and fast (O(n)).
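If a reproducible order is wanted, the set can be passed through sorted(). The sketch below uses a small made-up sample (the names `sample`, `unique_sorted` and `nested` are assumptions for illustration) and also shows how to wrap each string again in case the nested shape has to be kept.

```python
# Hypothetical small sample standing in for the question's list.
sample = [[], ['/vacatures/oracle-dba/'], [], ['/vacatures/database-beheerder/'],
          ['/vacatures/oracle-dba/']]

# Set comprehension removes empties and duplicates; sorted() fixes the order.
unique_sorted = sorted({row[0] for row in sample if row})
print(unique_sorted)
# ['/vacatures/database-beheerder/', '/vacatures/oracle-dba/']

# Wrap each string in a list again if the nested shape must be kept.
nested = [[s] for s in unique_sorted]
print(nested)
# [['/vacatures/database-beheerder/'], ['/vacatures/oracle-dba/']]
```

Sorting adds an O(n log n) step on top of the O(n) pass, which is negligible for a list of this size.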

answered Sep 19 '17 at 18:04, edited Sep 19 '17 at 18:09


Clearly better than my answer, except that it does not work if the OP wants to keep the final output as a (not-really) 2D list, which is unlikely. — keepAlive, Sep 19 '17 at 19:40
