Python - Mean of each value across keys in dict

Question

I am having trouble iterating across an entire dictionary to do simple summary statistics (an average) for each element of a value across keys.

My dictionary consists of keys and values that are lists of numbers:

test_dict={'NJ':[20,50,70,90,100],'NY':[10,3,0,99,57],'CT':[90,1000,2,3.4,5]}

I know that I can get the first element of each key's list, for instance, with the code below, but I am having trouble with the obvious next step of adding another for loop to iterate across all elements of the values.

location1=[element[0] for element in test_dict.values()] 
location1_avg=sum(location1)/len(location1)

My ultimate goal is to have a dictionary with labels as keys (Location 1...i) and the average value across states for that location. So the first key-value would be Location1: 40, and so on.

I have the attempt below, but it fails with 'list index out of range', and I do not know how to iterate properly in this case.

for element in test_dict.values():
    avg=list()
    for nums in element[i]:
        avg[i]=sum(element[i][nums])/len(element[i][nums])

Edit: adding the desired output, as requested:

soln_dict={'Location1':40,'Location2':351,'Location3':24,'Location4':43.24,'Location5':54}

Thank you for your help!

Can you show what exactly you expect to be the result given the `test_dict`? — mkrieger1, Sep 23 '17 at 17:45
@Jean-FrançoisFabre I suspect 40 is supposed to be the average of 20, 10, and 90, so the desired result might be a list of 5 numbers, not a dictionary with 3 keys. — mkrieger1, Sep 23 '17 at 17:55
You are right - the desired output is a dictionary with five key-value pairs. The first one would be Location 1: 40 — Z_D, Sep 23 '17 at 18:19

Jean-François Fabre · Answer 1 · 2017-09-23T20:39:49.587

Not sure where your error lies, but the i is a dead giveaway for "using indices where it's not useful / harmful".
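For comparison, here is a minimal sketch of the index-based loop the question seems to be aiming for, written so that it runs. The outer loop supplies i, and results are added with append: assigning to avg[i] on an empty list, as the attempt does, would raise IndexError: list index out of range. The sketch assumes every list has the same length as the first one.

```python
test_dict = {'NJ': [20, 50, 70, 90, 100], 'NY': [10, 3, 0, 99, 57], 'CT': [90, 1000, 2, 3.4, 5]}

# Assumption: all lists have the same length, so the first one gives the count
n = len(next(iter(test_dict.values())))

avg = []  # one average per position
for i in range(n):
    # the i-th element of every state's list
    column = [values[i] for values in test_dict.values()]
    # append grows the list; avg[i] = ... on an empty list raises IndexError
    avg.append(sum(column) / len(column))

print(avg)
```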

Your problem has a straight input/output data stream and is a perfect match for a dictionary comprehension: iterate over the key/value pairs and rebuild the dict with the mean as the value:

test_dict={'NJ':[20,50,70,90,100],'NY':[10,3,0,99,57],'CT':[90,1000,2,3.4,5]}

result = {k:sum(x)/len(x) for k,x in test_dict.items()}

print(result)

gives:

{'CT': 220.08, 'NJ': 66.0, 'NY': 33.8}

EDIT: you seem to want a "transposed" version with anonymized keys. In that case, just zip the values together:

result = {"location{}".format(i):sum(v)/len(v) for i,v in enumerate(zip(*test_dict.values()),1)}

gives:

{'location3': 24.0, 'location5': 54.0, 'location1': 40.0, 'location2': 351.0, 'location4': 64.13333333333334}

Thank you - that is a nice way to do it. However, my desired output is to get the averages across keys for each element of the value. Please see edit for desired output. — Z_D, Sep 23 '17 at 18:27

DjaouadNM · Answer 2 · 2017-09-23T19:09:41.163


You can do this:

test_dict={'NJ':[20,50,70,90,100],'NY':[10,3,0,99,57],'CT':[90,1000,2,3.4,5]}
avg=[sum(element) / len(element) for element in test_dict.values()]
print(avg) # => [66.0, 33.8, 220.08]

And for a dictionary:

test_dict={'NJ':[20,50,70,90,100],'NY':[10,3,0,99,57],'CT':[90,1000,2,3.4,5]}
avg={ k:sum(test_dict[k]) / len(test_dict[k]) for k in test_dict}
print(avg) # => {'NJ': 66.0, 'NY': 33.8, 'CT': 220.08}

Answer to the edited question:

If the lists always have a length of 5, use this:

test_dict={'NJ':[20,50,70,90,100],'NY':[10,3,0,99,57],'CT':[90,1000,2,3.4,5]}
avg={}
for i in range(5):
  avg['Location'+str(i+1)] = sum(test_dict[k][i] for k in test_dict)/len(test_dict)
print(avg)

Output:

{'Location1': 40.0, 'Location2': 351.0, 'Location3': 24.0, 'Location4': 64.13333333333334, 'Location5': 54.0}
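If the lists are not guaranteed to have length 5, a variant of the same loop can take the length from the data instead of hard-coding it. This is a sketch under the assumption that all lists are meant to be the same length; it raises a ValueError (message wording is illustrative) rather than silently averaging ragged data.

```python
test_dict = {'NJ': [20, 50, 70, 90, 100], 'NY': [10, 3, 0, 99, 57], 'CT': [90, 1000, 2, 3.4, 5]}

# Take the length from the data instead of hard-coding 5
lengths = {len(v) for v in test_dict.values()}
if len(lengths) != 1:
    raise ValueError('all lists must have the same length, got %s' % sorted(lengths))
n = lengths.pop()

avg = {}
for i in range(n):
    # average the i-th element across all keys
    avg['Location' + str(i + 1)] = sum(test_dict[k][i] for k in test_dict) / len(test_dict)
print(avg)
```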

edited Sep 23 '17 at 19:09

answered Sep 23 '17 at 17:46


Thanks, appreciate your help. Desired output is actually different - I explained it better in original post now. – Z_D Sep 23 '17 at 18:29
@Tony Do the arrays always have a fixed length of 5? – DjaouadNM Sep 23 '17 at 18:33
In this case, yes. – Z_D Sep 23 '17 at 18:36
@Tony See if my edited answer helps. – DjaouadNM Sep 23 '17 at 18:38
Thank you for your help. – Z_D Sep 24 '17 at 18:15

Kaushik NP · Accepted Answer · 2017-09-25T04:07:09.283


Just do:

from functools import reduce  # reduce is not a builtin in Python 3

# loop through the dictionary
for key, value in test_dict.items():
    # use reduce to add up the values, then divide by their count
    print(key, reduce(lambda x, y: x + y, value) / len(value))

This will print:

NJ 66.0
NY 33.8
CT 220.08

Edit: as per the change in the OP's requirements:

l = list(test_dict.values())  # convert values to a list of lists
print(l)
#[[20, 50, 70, 90, 100], [10, 3, 0, 99, 57], [90, 1000, 2, 3.4, 5]]
d = {}  # final dictionary
for i in range(len(l[0])):
    row_list = [row[i] for row in l]  # get values column-wise
    d['location' + str(i + 1)] = sum(row_list) / len(row_list)  # calculate avg

print(d)
#{'location1': 40.0, 'location2': 351.0, 'location3': 24.0, 'location4': 64.13333333333334, 'location5': 54.0}

Note: the average you gave in the question for location4 is wrong. The fourth elements are 90, 99 and 3.4, so their mean is 192.4 / 3 ≈ 64.13, not 43.24.

edited Sep 25 '17 at 04:07

answered Sep 23 '17 at 17:50


Thank you - my true desire, as explained better above now, is to get an average for each of the first elements in each value. Please see desired output - appreciate your help. – Z_D Sep 23 '17 at 18:23

@Tony , check the edit – Kaushik NP Sep 23 '17 at 18:46
My reservation about this solution is that it uses an integer index to loop through the data, which is not as Pythonic as it could be. But it works, and it effectively answers the question. – fralau Sep 29 '17 at 20:43

fralau · Answer 4 · 2017-09-24T07:07:34.453

To keep it as simple as possible, I would suggest:

from statistics import mean

test_dict={'NJ':[20,50,70,90,100],'NY':[10,3,0,99,57],'CT':[90,1000,2,3.4,5]}

# put the data in a list of lists
# (throw away the state names)
l = [seq for seq in test_dict.values()]


# put together 1st values, 2nd values, etc.
r = [mean(i) for i in zip(*l)]
print(r)

Which gives:

[40, 351, 24, 64.13333333333334, 54]
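The mixed output (40 but 64.13333333333334) comes from statistics.mean keeping the input's numeric type where it can: an all-int column whose mean is a whole number comes back as an int, while the column containing 3.4 comes back as a float. If you would rather always get floats, statistics.fmean (added in Python 3.8) always returns a float. A sketch comparing the two, assuming Python 3.8 or later:

```python
from statistics import fmean, mean  # fmean needs Python 3.8+

test_dict = {'NJ': [20, 50, 70, 90, 100], 'NY': [10, 3, 0, 99, 57], 'CT': [90, 1000, 2, 3.4, 5]}

# one tuple per position, e.g. (20, 10, 90) for the first
columns = list(zip(*test_dict.values()))

means = [mean(c) for c in columns]    # int for all-int columns with a whole-number mean
fmeans = [fmean(c) for c in columns]  # always floats
print(means)
print(fmeans)
```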

I divided to conquer: I turned the dictionary into a list of lists, and then used zip to put the "columns" together. Since zip expects each sequence as a separate argument rather than a single list, I used the star operator (*) to unpack the list into separate arguments.
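To make the star step concrete, here is a small sketch that prints the intermediate result: zip(*l) is the same call as zip(l[0], l[1], l[2]), and it yields one tuple per position. The order inside each tuple follows the dictionary's insertion order (guaranteed from Python 3.7), which does not affect the means.

```python
test_dict = {'NJ': [20, 50, 70, 90, 100], 'NY': [10, 3, 0, 99, 57], 'CT': [90, 1000, 2, 3.4, 5]}

# list of lists; the state names are dropped
l = list(test_dict.values())

# zip(*l) unpacks the outer list, so it is the same call as zip(l[0], l[1], l[2])
columns = list(zip(*l))
print(columns)
# [(20, 10, 90), (50, 3, 1000), (70, 0, 2), (90, 99, 3.4), (100, 57, 5)]
```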

I am not sure where the list of places should come from. Is it just "Location" plus the index number? (If so, why not leave it as a list?)
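If the labels really are just "Location" plus a 1-based index, as the desired output in the question suggests, a sketch of turning the list of means back into that dictionary with enumerate(..., start=1). The label format is an assumption taken from the question, not something the data provides.

```python
from statistics import mean

test_dict = {'NJ': [20, 50, 70, 90, 100], 'NY': [10, 3, 0, 99, 57], 'CT': [90, 1000, 2, 3.4, 5]}

r = [mean(col) for col in zip(*test_dict.values())]

# label each mean with its 1-based position
soln_dict = {'Location' + str(i): m for i, m in enumerate(r, start=1)}
print(soln_dict)
```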

For the mean function, see the statistics module (Python 3.4 and later). Otherwise you can write your own:

from functools import reduce  # needed in Python 3
mean = lambda l: reduce(lambda x, y: x + y, l) / len(l)

I took inspiration from Finding the average of a list. That is perhaps a little cryptic, and it might have been clearer to write a function without reduce, but a one-liner makes it much easier to copy and paste.

reduce is a builtin in Python 2, but in Python 3 it has to be imported from functools, as above.
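Since a function without reduce might be clearer, here is a minimal sketch of one. The name average is a hypothetical choice that avoids shadowing statistics.mean. It assumes Python 3: in Python 2, / between ints truncates unless you add from __future__ import division.

```python
def average(values):
    """Arithmetic mean of a non-empty sequence of numbers, written without reduce."""
    values = list(values)  # accept any iterable, e.g. the tuples produced by zip
    if not values:
        raise ValueError('average() of an empty sequence')
    return sum(values) / len(values)


test_dict = {'NJ': [20, 50, 70, 90, 100], 'NY': [10, 3, 0, 99, 57], 'CT': [90, 1000, 2, 3.4, 5]}

r = [average(col) for col in zip(*test_dict.values())]
print(r)
```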

Thank you - my true desire, as explained better above now, is to get an average for each of the first elements in each value. Please see desired output - appreciate your help. — Z_D, Sep 23 '17 at 18:23
Well actually, it was not *that* straightforward (I thought you wanted only the mean of the first column). — fralau, Sep 23 '17 at 19:04
