Creating a dictionary with lists as values, and sorting the elements inside of the lists

Question

I'm building a function that takes a list which contains three different types of elements: integers, floats, and strings. The function converts the list to a dictionary with keys for each of these three different categories. Then each element in that original list is placed into the appropriate key-value pair (e.g. all string elements in the list get assigned to the "string" key). I'm able to get this working correctly, however I'm unable to sort the values inside the dictionary values (which are lists). Here's what I have:

def list_dict_sort(input_list): 
    mixed_list =['asdf', 33, 'qwerty', 123.45, 890, 3.0, 'hi', 0, 'yes', 98765., '', 25]
    sorted_dict = {}                 
    sorted_dict['integer'] = []       
    sorted_dict['float'] = []
    sorted_dict['string'] = []
    for x in mixed_list:           
        if "int" in str(type(x)):  
            sorted_dict['integer'].append(x) 
        elif "float" in str(type(x)):
            sorted_dict['float'].append(x)
        elif "str" in str(type(x)):
            sorted_dict['string'].append(x)
    sorted_dict.sort
    return(sorted_dict)

list_dict_sort(mixed_list)

this returns:

{'float': [123.45, 3.0, 98765.0],
 'integer': [33, 890, 0, 25],
 'string': ['asdf', 'qwerty', 'hi', 'yes', '']}

So structurally the function gets me what I want, except that the value-lists are not sorted. The exact output that I'd like is:

{'float': [3.0, 123.45, 98765.0],
 'integer': [0, 25, 33, 890],
 'string': ['asdf', 'hi', 'qwerty',  'yes', '']}

Ideally I want to avoid doing an import here (e.g. operator), I'm just looking for a simple/basic way of sorting the value-lists. I tried using sort and sorted() but couldn't figure out how to build them in to what I already have. Is there a clean way of doing this or is there a more efficient way?

You could use `bisect.insort` when inserting each element. However, you will have to do `import bisect`. See [here](https://stackoverflow.com/a/18001030/1586200) for more details. — Autonomous, Feb 06 '18 at 21:50
is there a way to add sorted or sort() into my current script? — David, Feb 06 '18 at 22:41
See if this works (not tested). Replace `sorted_dict['integer'].append(x) ` by `bisect.insort(sorted_dict['integer'], x)`. Same for floats and strings. Then remove the line `sorted_dict.sort`. — Autonomous, Feb 06 '18 at 23:58
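
A minimal sketch of what the comment above describes, assuming the asker's original type checks are kept and only the `append` calls are swapped for `bisect.insort`. Using the `input_list` parameter instead of the hard-coded list is an assumption on my part, not something the comment asked for.

```python
import bisect

def list_dict_sort(input_list):
    # Same three buckets as in the question.
    sorted_dict = {'integer': [], 'float': [], 'string': []}
    for x in input_list:
        # insort places x at the position that keeps the list sorted,
        # so no separate sorting step is needed afterwards.
        if "int" in str(type(x)):
            bisect.insort(sorted_dict['integer'], x)
        elif "float" in str(type(x)):
            bisect.insort(sorted_dict['float'], x)
        elif "str" in str(type(x)):
            bisect.insort(sorted_dict['string'], x)
    return sorted_dict

mixed_list = ['asdf', 33, 'qwerty', 123.45, 890, 3.0, 'hi', 0, 'yes', 98765., '', 25]
result = list_dict_sort(mixed_list)
print(result)
```

Note that the empty string ends up first in the 'string' list, because default string ordering puts '' before any other string.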

score 2 · Answer 1 · answered Feb 06 '18 at 21:45

You could just go over the values and sort them:

for v in sorted_dict.values():
    v.sort()
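
For context, here is one way this loop might be placed inside the asker's function. This is only a sketch: the use of `isinstance` and of the `input_list` parameter are my assumptions, not part of this answer. The point it illustrates is that the sorting loop runs after the lists have been filled and before the `return`.

```python
def list_dict_sort(input_list):
    sorted_dict = {'integer': [], 'float': [], 'string': []}
    for x in input_list:
        if isinstance(x, int):
            sorted_dict['integer'].append(x)
        elif isinstance(x, float):
            sorted_dict['float'].append(x)
        elif isinstance(x, str):
            sorted_dict['string'].append(x)
    # Sort each value-list in place once everything has been appended.
    for v in sorted_dict.values():
        v.sort()
    return sorted_dict

mixed_list = ['asdf', 33, 'qwerty', 123.45, 890, 3.0, 'hi', 0, 'yes', 98765., '', 25]
grouped = list_dict_sort(mixed_list)
print(grouped)
```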

Mureinik

I tried this and it returns {'float': [], 'integer': [], 'string': []} – David Feb 06 '18 at 22:53
@David this definitely does work. Please edit your question to show how you integrated this snippet into your code and I'll try to see what went wrong. – Mureinik Feb 07 '18 at 06:19

score 1 · Answer 2 · answered Feb 06 '18 at 21:49

Note that you could also use the type of the mixed elements as the dictionary key, so it can be calculated directly from the elements as they are inserted, and so that when retrieving later, you don't need to know a special string (e.g. "wait, did I use 'integer' or 'int' for the key?")...

In [4]: from collections import defaultdict

In [5]: d = defaultdict(list)

In [6]: mixed_list = ['asdf', 33, 'qwerty', 123.45, 890, 3.0, 'hi', 0, 'yes', 98765., '', 25]

In [7]: for value in mixed_list:
   ...:     d[type(value)].append(value)
   ...:     

In [8]: d
Out[8]: 
defaultdict(list,
            {str: ['asdf', 'qwerty', 'hi', 'yes', ''],
             int: [33, 890, 0, 25],
             float: [123.45, 3.0, 98765.0]})

In [9]: for k, v in d.items():
   ...:     v.sort()
   ...:     

In [10]: d
Out[10]: 
defaultdict(list,
            {str: ['', 'asdf', 'hi', 'qwerty', 'yes'],
             int: [0, 25, 33, 890],
             float: [3.0, 123.45, 98765.0]})

In the last result, note that default string sorting is going to put '' at the front. You'd need to write your own string comparator that would evaluate any string as less than the empty string if you need it to be sorted to the final position.
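
As a rough illustration of the point above, and swapping in a sort key rather than a full comparator, the empty string can be pushed to the end with a tuple key. The variable name `strings` is just for this example.

```python
strings = ['asdf', 'qwerty', 'hi', 'yes', '']

# The key is a tuple: the first element is False for non-empty strings and
# True for '', so '' sorts after everything else; ties fall back to the
# normal string ordering in the second element.
strings.sort(key=lambda s: (s == '', s))

print(strings)  # ['asdf', 'hi', 'qwerty', 'yes', '']
```

This matches the ordering requested in the question for the 'string' list.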

ely

In my opinion, using `type(value)` as a key is not a good idea. Consider `numpy.float64(5.1321)`, the type will come out as `numpy.float64`. Use `isinstance` always and maintain an explicit mapping. – jpp Feb 06 '18 at 23:47
@jp_data_analysis But what if you want a key for `np.float64` that is a different key than `float`? And if you don't want these to be two different keys, then the burden is on you to first do type conversions on the elements on the mixed list. I agree it could be application-specific as to which way is better. But there's nothing inherently better about resolving the type value with `isinstance` ... especially given that instance and subclass checking are also overrideable with metaclasses. So if someone is relying on keys that came from isinstance, the actual element might fail duck typing. – ely Feb 06 '18 at 23:54
My point is that you should be explicit. I'm not making assumptions about what the user wants. There are many answers, e.g. [here](https://stackoverflow.com/a/1549854/9209546), which explain why it is a good assumption to incorporate inheritance in type checking. `isinstance` isn't perfect, nothing is, but it is considered better than `type()`. Is your point that a subclass of `float` or `int` should not be considered `float` or `int`? – jpp Feb 07 '18 at 00:00
@jp_data_analysis My view is that using `isinstance` as the means to check if a type is `int` is *less explicit*, because a custom object could have overridden instance checking in a way that breaks things. For example, if you plan to process the `int` elements later on, but one of them is a custom class that doesn't support all `int` operations (but was customized to return `True` for `isinstance(..., int)`), then it all becomes a huge black box problem. But if the entries are stored according to `type` only, then it is explicit, because a user must convert types explicitly ahead of time. – ely Feb 07 '18 at 13:13
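
To make the disagreement in these comments concrete, here is a small sketch of how the two grouping strategies differ for subclasses. `Celsius` is a hypothetical `float` subclass invented for this illustration; neither commenter used it.

```python
class Celsius(float):
    """Hypothetical float subclass, used only for illustration."""

values = [1.5, Celsius(20.0), True, 7]

by_type = {}
by_isinstance = {'integer': [], 'float': []}
for v in values:
    # Keyed on the exact type: subclasses get their own bucket.
    by_type.setdefault(type(v), []).append(v)
    # Keyed via isinstance: subclasses land in their parent's bucket.
    if isinstance(v, int):
        by_isinstance['integer'].append(v)
    elif isinstance(v, float):
        by_isinstance['float'].append(v)

print(by_type)        # four keys: float, Celsius, bool, int
print(by_isinstance)  # True joins the integers, Celsius joins the floats
```

Since `bool` is a subclass of `int`, `True` is grouped with the integers under `isinstance` but gets its own key under `type`. Which behaviour is preferable depends on the application, as both commenters acknowledge.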

jpp · Answer 3 · 2018-02-06T23:42:29.703

Here is a minimalist solution via collections.defaultdict and sortedcontainers.SortedList. With SortedList, your list is guaranteed to be always sorted.

Note I have also replaced your type-checking with isinstance and added a dictionary mapping types to keys. The purpose of this is to separate logic from configuration / variables.

from collections import defaultdict
from sortedcontainers import SortedList

mixed_list = ['asdf', 33, 'qwerty', 123.45, 890, 3.0, 'hi', 0, 'yes', 98765., '', 25]

def list_dict_sort(input_list): 

    sorted_dict = defaultdict(SortedList)

    mapper = {int: 'integer',
              float: 'float',
              str: 'string'}

    for x in input_list:
        for k, v in mapper.items():
            if isinstance(x, k):
                sorted_dict[v].add(x)
                break

    return sorted_dict

list_dict_sort(mixed_list)

# defaultdict(sortedcontainers.sortedlist.SortedList,
#             {'float': SortedList([3.0, 123.45, 98765.0], load=1000),
#              'integer': SortedList([0, 25, 33, 890], load=1000),
#              'string': SortedList(['', 'asdf', 'hi', 'qwerty', 'yes'], load=1000)})
