0

The code below generates random key, value pairs in a dictionary and then sorts the dictionary. I am wondering how to insert 100 random key, value pairs into sorted dictionary and keep it sorted.

from random import randrange
mydict = {}
for i in range(10):
   mydict['key'+str(i)] = randrange(10)
sort_mydic=sorted(mydict.items(), key=lambda x: x[1])
  • Python's dict is a hash table. It is not an array or tree. Sorted keys doesn't make a hash table faster. Why do you want a sorted dict? What operations do you need to optimize? How fast is fast enough? Please tell us the hard numbers: number of keys, sorts, insertions, and retrievals per seconds. Is this done on hard disk or in memory? What is the read/write pattern of this dict? Otherwise, we can't choose the balance point between performance and convenience for you. We don't need the fastest for everything. We just need to be fast enough. – Crawl Cycle Nov 05 '20 at 01:11

3 Answers3

1

Based on the description above, I would skip the step of creating the dictionary. If you start from non-empty dictionary, then run the code above. Once it is set up, use bisect and insert as below.

new_item = randrange(10)
new_key = 'key' + str(new_item)
# find the index where to insert
idx = bisect.bisect(sort_mydic, (new_key, -1)) # the second argument should be smaller than any value
# check if the key is already used or not
if idx == len(sort_mydic) or sort_mydic[idx][0] != new_key:
    sort_mydic.insert(idx, (new_key, new_item))
else:
    # update the value -- as in dictionary
    sort_mydic[idx] = (new_key, new_item)

In case you need to retrieve an item.

def get_item(key):
    idx = bisect.bisect(sort_mydic, (key, -1))
    if idx == len(sort_mydic) or sort_mydic[idx][0] != key:
        return None # there is no such item
  
    else:
        return sort_mydic[idx][1]
Kate Melnykova
  • 1,863
  • 1
  • 5
  • 17
  • it raised with <' not supported between instances of 'int' and 'str for line idx –  Nov 04 '20 at 18:13
  • Fixed the typo, sorry – Kate Melnykova Nov 04 '20 at 18:51
  • If `idx==len(sort_mydic)`, then I use the insert method will act as an append. No error. – Kate Melnykova Nov 04 '20 at 21:27
  • The idea of sorting keys requires sorting them very often. You may store two data structured: dictionary and the sorted list of keys, but... what is the benefit of using the sorted list of keys in the first place? Simpler code should be preferred at all times. – Kate Melnykova Nov 04 '20 at 21:29
  • I think: 1. Sorting array of keys is faster than sorting dict. 2. Sorted keys don't make dict faster. 3. Keep keys sort. Sort often. -> More speed up for sorted keys + unsorted dict over sorted dict. – Crawl Cycle Nov 04 '20 at 23:04
  • Java has both hash map and tree map. That makes sense: https://stackoverflow.com/questions/2444359/what-is-the-difference-between-a-hashmap-and-a-treemap – Crawl Cycle Nov 04 '20 at 23:55
  • Your code inset just single random pair. I need to insert multiple random numbers e.g., 100 or 1000 numbers. –  Nov 05 '20 at 00:15
  • Bob, you can write the loop. – Crawl Cycle Nov 05 '20 at 01:06
  • I add a loop and it just adds the numbers to sort_mydic and the final dict is not sorted. –  Nov 05 '20 at 01:27
1

Is there any reason you can't use OrderedDict for this purpose? That's what it is meant for. Maybe something like this (haven't checked that it compiles yet):

from random import randrange
from collections import OrderedDict

mydict = {}
for i in range(10):
   mydict['key'+str(i)] = randrange(10)
sort_mydic = OrderedDict(sorted(mydict.items(), key=lambda x: x[1]))

OrderedDict behaves like any other dictionary except that it is guaranteed to be in insertion order and can be rearranged as necessary. In fact, there's probably a "better" way than the above to do what you want that does not involve an intermediate dict.

deriamis
  • 121
  • 3
  • I tried that before and then I was thinking how to insert multiple random key, value pairs into it. –  Nov 04 '20 at 18:16
  • The other half of your question is the same as for `dict` - use the `update` method. `OrderedDict` is just a `dict` with particular behavior. What's happening on the last line is that a temporary list of tuples is being created, which when passed to the `OrderedDict` constructor creates one from it. You can do the same with an ordinary `dict` as well. – deriamis Nov 05 '20 at 20:32
0

Order of keys in a dictionary is insertion order for python 3.6+.

Sorting a dict in older versions of python is impossible.

Solution:

  • Insert new key-value pairs
  • Sort
from random import randrange


def make_rand_dict(prefix, n_items=2):
    keys = [prefix + str(i) for i in range(n_items)]
    return {k: randrange(10) for k in keys}


def sort_dict(d):
    return {k: v for k, v in sorted(d.items())}


def main():
    # Make dict with sorted and random 
    # key-value pair
    original = make_rand_dict("key", n_items=2)
    original = sort_dict(original)

    # New random key-value pairs
    # Set n_items=100 for more pairs
    update = make_rand_dict("aye", n_items=2)

    # Insert the random key-value pair from 
    # the update dict into the original dict
    original.update(update)
    print("before sort:", original)

    # Sort the dict
    sorted_combined = sort_dict(original)
    print("after sort:", sorted_combined)


if __name__ == '__main__':
    main()

Result:

before sort: {'key0': 9, 'key1': 7, 'aye0': 2, 'aye1': 4}
after sort: {'aye0': 2, 'aye1': 4, 'key0': 9, 'key1': 7}
Crawl Cycle
  • 257
  • 2
  • 8
  • Thanks, and then how to insert multiple random key, value pairs into sorted one as I described in question. –  Nov 04 '20 at 18:24
  • the `update` dict has multiple random key-value pairs. `original.update(update)` inserts the update dict's random key-value pair into the original dict. I rename `make_dict(...)` to `make_rand_dict(...)` to reflect this. – Crawl Cycle Nov 04 '20 at 18:24
  • I want to see how much time takes to insert 10 or 100 key, value pairs into a sorted dictionary, I am not sure how to insert different number of key, value pairs into your code –  Nov 04 '20 at 18:40
  • 1
    If you want performance, then I would just sort a list of keys instead of the key-value pair. The advantage of dict is fast insertion, deletion and look up. It is not the most memory efficient structure. Sorting key is enough. No need to sort key-value pair. – Crawl Cycle Nov 04 '20 at 18:48
  • it raised with' make_rand_dict() got an unexpected keyword argument 'n_items'' –  Nov 04 '20 at 18:49
  • You are updating dictionary and then sorting again. This cannot give us the performance of insertion in dictionary. It's more sorting than insertion. I need to insert into sorted dictionary by finding right index for each key. –  Nov 05 '20 at 00:34
  • I know Python's dict is a hash table. I want to measure run time of insertion of n = 10, n = 100, n = 1000 key, value pairs into a dictionary and compare with other structures. –  Nov 05 '20 at 01:15