3

Say I had a nested list like so:

nested_list=[[123,'Aaron','CA'],[124,'Bob','WY'],[125,'John','TX']]
insert_me=[122,'George','AL']

The list is currently sorted (In alphabetical order) by the middle value of each sublist, I want to add the value insert_me in its correct place in the nested list. In order to maintain alphabetical order it needs to be added between the lists with 'Bob' and 'John' in them. I know bisect would normally be used for a task like this with lists but don't understand how I could use bisect for a nested list like this.

mechanical_meat
  • 163,903
  • 24
  • 228
  • 223
  • 1
    Ultimately, a tree is probably a better data structure for this if you're going to be performing a whole bunch of insertions. – mgilson Mar 22 '13 at 19:52

3 Answers3

5

See the example in the Python documentation for bisect:

Unlike the sorted() function, it does not make sense for the bisect() functions to have key or reversed arguments because that would lead to an inefficient design (successive calls to bisect functions would not “remember” all of the previous key lookups).

Instead, it is better to search a list of precomputed keys to find the index of the record in question:

>>> data = [('red', 5), ('blue', 1), ('yellow', 8), ('black', 0)]
>>> data.sort(key=lambda r: r[1])
>>> keys = [r[1] for r in data]         # precomputed list of keys
>>> data[bisect_left(keys, 0)]
('black', 0)
>>> data[bisect_left(keys, 1)]
('blue', 1)
>>> data[bisect_left(keys, 5)]
('red', 5)
>>> data[bisect_left(keys, 8)]
('yellow', 8)

So in your case:

nested_list = [[123,'Aaron','CA'],[124,'Bob','WY'],[125,'John','TX']]
insert_me = [122,'George','AL']                                
keys = [r[1] for r in nested_list]
nested_list.insert(bisect.bisect_left(keys,insert_me[1]),insert_me)
[[123, 'Aaron', 'CA'],
 [124, 'Bob', 'WY'],
 [122, 'George', 'AL'],
 [125, 'John', 'TX']]

To avoid rebuilding the keys everytime, insert new values into keys as well:

keys.insert(bisect_left(keys,insert_me[1]),insert_me[1])

Update:

Did some performance comparison between insert/bisect, append/sorted and heapq solutions:

# elements  heapq   insert/bisect  append/sorted
10,000      0.01s   0.08s           2.43s         
20,000      0.03s   0.28s          10.06s
30,000      0.04s   0.60s          22.81s
isedev
  • 18,848
  • 3
  • 60
  • 59
  • The problem with this is that you'd need to re-build your keys every time you *insert* something which will destroy your O(logn) efficiency. (of course, `insert` is already O(n) so ... that's already worse than you probably want ...) – mgilson Mar 22 '13 at 19:49
  • But couldn't the list of keys instead of being re-built every time just be cached in order to preserve the O(nlogn) efficiency? – user1789376 Mar 22 '13 at 19:55
  • You can subsequently insert into keys using bisect_left as well... so 2O(n). But I agree with mgilson - a tree structure is probably better suited if there are going to be many insertions. – isedev Mar 22 '13 at 19:55
  • @isedev -- good point about updating the list of keys also. for some reason I hadn't thought about that. I still hold that a list isn't the proper data structure for this if the eventual list will be really big. Some form of AVL tree is probably better. – mgilson Mar 22 '13 at 20:00
4

I would use a specialization of a heap for your problem. Take the heap class from this answer and your code will be:

import heapq

class MyHeap(object):
    def __init__(self, initial=None, key=lambda x:x):
        self.key = key
        if initial:
            self._data = [(key(item), item) for item in initial]
            heapq.heapify(self._data)
        else:
            self._data = []

    def push(self, item):
        heapq.heappush(self._data, (self.key(item), item))

    def pop(self):
        return heapq.heappop(self._data)[1]

h = MyHeap([[123,'Aaron','CA'],[124,'Bob','WY'],[125,'John','TX']], key=lambda x:x[1])
h.push([122,'George','AL'])
for _ in xrange(4):
    print h.pop()

Every list that you add with push will be in order with respect to the second element (which we control with the key=lambda x:x[1] argument in the constructor). You get the elements in order, one by one by calling pop.

Community
  • 1
  • 1
halex
  • 16,253
  • 5
  • 58
  • 67
2

You could alphabetize the list using sorted().

nested_list=[[123,'Aaron','CA'],[124,'Bob','WY'],[125,'John','TX']]
insert_me=[122,'George','AL']

nested_list.append(insert_me)
nested_list=sorted(nested_list, key=lambda x:x[1])

Sorted()

Jroosterman
  • 408
  • 4
  • 11
  • This would be very inefficient - you are sorting the list after every insert... Also, using `operator.getitem(1)` as opposed to a lambda is cleaner (IMO). – isedev Mar 22 '13 at 20:00
  • True, and I did consider this. However the intent is to repeatedly insert new sublists into the nested list, and by having to sort the list after every insertion it would drastically impact efficiency. – user1789376 Mar 22 '13 at 20:01
  • Yes it can get a bit cumbersome. It would be better if this were done only when you needed to view the contents of the list. – Jroosterman Mar 22 '13 at 20:02