
I have this data:

list_of_dicts_of_lists = [
    {'a': [1,2], 'b': [3,4], 'c': [3,2], 'd': [2,5]},
    {'a': [2,2], 'b': [2,2], 'c': [1,6], 'd': [4,7]},
    {'a': [2,2], 'b': [5,2], 'c': [3,2], 'd': [2,2]},
    {'a': [1,2], 'b': [3,4], 'c': [1,6], 'd': [5,5]}
    ]

I need this result:

median_dict_of_lists = (
    {'a': [1.5,2], 'b': [3,3], 'c': [2,4], 'd': [3,5]}
    )

...where each value is the median of the respective column above.

I need the mode dictionary where one is available, and the median dictionary when no mode exists. I was able to do a quick and dirty statistics.mode() by converting each dict to a string, getting the mode of the list of strings, then using ast.literal_eval(most_common_string) to turn it back into a dict, but I need a column-wise median in cases where there is no mode.

I know how to use statistics.median(); however, the nested notation to apply it to this case, column-wise, is frazzling me.

The data is all floats; I wrote it as ints just to make it easier to read.

mpour
litepresence

3 Answers

You can use statistics.median with itertools.groupby:

import statistics
import itertools
list_of_dicts_of_lists = [
  {'a': [1,2], 'b': [3,4], 'c': [3,2], 'd': [2,5]},
  {'a': [2,2], 'b': [2,2], 'c': [1,6], 'd': [4,7]},
  {'a': [2,2], 'b': [5,2], 'c': [3,2], 'd': [2,2]},
  {'a': [1,2], 'b': [3,4], 'c': [1,6], 'd': [5,5]} 
]
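# flatten every dict's (key, list) pairs, sort them by key, and group the pairs belonging to each key
new_listing = [(a, list(b)) for a, b in itertools.groupby(sorted(itertools.chain(*map(lambda x:x.items(), list_of_dicts_of_lists)), key=lambda x:x[0]), key=lambda x:x[0])]
# transpose each key's lists so that values sharing a list position sit together
d = {a:zip(*map(lambda x:x[-1], b)) for a, b in new_listing}
# take the median of each position (assumes two values per list); the result is wrapped in a one-element tuple
last_data = ({a:[statistics.median(b), statistics.median(c)] for a, [b, c] in d.items()},)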

Output:

({'a': [1.5, 2.0], 'b': [3.0, 3.0], 'c': [2.0, 4.0], 'd': [3.0, 5.0]},)
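
For comparison, the same column-wise median can be sketched with only statistics.median and zip, without the grouping step. This is a minimal sketch, not the answer's own method: the helper name columnwise_median is hypothetical, and it assumes every dict has the same keys and lists of equal length.

```python
import statistics

list_of_dicts_of_lists = [
    {'a': [1, 2], 'b': [3, 4], 'c': [3, 2], 'd': [2, 5]},
    {'a': [2, 2], 'b': [2, 2], 'c': [1, 6], 'd': [4, 7]},
    {'a': [2, 2], 'b': [5, 2], 'c': [3, 2], 'd': [2, 2]},
    {'a': [1, 2], 'b': [3, 4], 'c': [1, 6], 'd': [5, 5]},
]


def columnwise_median(dicts):
    """Median of each list position, per key, across all dicts.

    Hypothetical helper: assumes all dicts share the keys of the first one.
    """
    result = {}
    for key in dicts[0]:
        # Collect this key's list from every dict, then transpose so each
        # tuple holds the values that share one list position.
        columns = zip(*(d[key] for d in dicts))
        result[key] = [statistics.median(col) for col in columns]
    return result


median_dict_of_lists = columnwise_median(list_of_dicts_of_lists)
print(median_dict_of_lists)
# {'a': [1.5, 2.0], 'b': [3.0, 3.0], 'c': [2.0, 4.0], 'd': [3.0, 5.0]}
```

Unlike the version above, this handles lists of any length, since it does not unpack exactly two columns.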
Ajax1234

You can use the following dictionary comprehension with numpy:

import numpy as np
median_dict_of_lists = {i : list(np.median([x[i] for x in list_of_dicts_of_lists], axis=0)) 
                    for i in 'abcd'}

Which returns the same:

{'a': [1.5, 2.0], 'c': [2.0, 4.0], 'b': [3.0, 3.0], 'd': [3.0, 5.0]}

To explain: for each key i in 'abcd', the list comprehension [x[i] for x in list_of_dicts_of_lists] collects that key's list from every dict in your original list, and np.median(..., axis=0) takes the median down each column of the resulting 4×2 array. The dictionary comprehension then assigns that median to the same key in the new dictionary.

There is a good explanation of the dictionary comprehension syntax here, and the documentation for np.median explains the function itself quite well.
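
To make the role of axis=0 concrete, here is a small sketch that looks at a single key. The variable names stacked, col_median, and row_median are illustrative only; the data is the question's.

```python
import numpy as np

list_of_dicts_of_lists = [
    {'a': [1, 2], 'b': [3, 4], 'c': [3, 2], 'd': [2, 5]},
    {'a': [2, 2], 'b': [2, 2], 'c': [1, 6], 'd': [4, 7]},
    {'a': [2, 2], 'b': [5, 2], 'c': [3, 2], 'd': [2, 2]},
    {'a': [1, 2], 'b': [3, 4], 'c': [1, 6], 'd': [5, 5]},
]

# Stack the 'a' lists: one row per dict, one column per list position.
stacked = np.array([d['a'] for d in list_of_dicts_of_lists])
print(stacked.shape)          # (4, 2)

# axis=0 collapses the rows (the dicts), leaving one median per column.
col_median = np.median(stacked, axis=0)
print(col_median.tolist())    # [1.5, 2.0]

# axis=1 would instead take the median within each dict's own pair.
row_median = np.median(stacked, axis=1)
print(row_median.tolist())    # [1.5, 2.0, 2.0, 1.5]
```

Only the axis=0 result matches the 'a' entry in the desired output.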

sacuL
  • Less cryptic than the other solution IMO, and obviously more concise. Note that the `list` conversions aren't necessary... – Julien Mar 19 '18 at 00:03
  • True, at least for the `list('abcd')` (edited). You could also get rid of the `list` conversion of the `np.array` which is returned by `np.median`, but I was trying to get the output exactly as shown by the OP, in which it looks like a list and not an `np.array`. In the end, though, it wouldn't make a difference... – sacuL Mar 19 '18 at 00:06
  • And the explicit `np.array` conversion is unnecessary too. Maybe a quick explanation of how that works will please the OP :) – Julien Mar 19 '18 at 00:10
  • True! Don't know why I was doing that! – sacuL Mar 19 '18 at 00:13

You could also break it down in small steps with meaningful names to make the solution more maintainable. For example:

import numpy as np

# combine the dictionaries' lists into a 3d matrix (dicts x keys x list positions), removing dictionary keys
# (assumes every dictionary lists its keys in the same order)
valueMatrix3D = [ list(row.values()) for row in list_of_dicts_of_lists ]

# compute the median across the dictionaries (axis 0), giving one array per key
medianArrays  = np.median(valueMatrix3D,axis=0)

# reassemble into a dictionary with original keys
medianDict = { key:list(array) for key,array in zip(list_of_dicts_of_lists[0] ,medianArrays) }
Alain T.