I am having trouble iterating across an entire dictionary to do simple summary statistics (an average) for each element of a value across keys.

My dictionary consists of keys and values that are lists of numbers:

test_dict={'NJ':[20,50,70,90,100],'NY':[10,3,0,99,57],'CT':[90,1000,2,3.4,5]}

I know that I can access the first value of each key, for instance, by doing the below, but I am having trouble with the obvious next step of adding another for loop to iterate across all elements in the values.

location1=[element[0] for element in test_dict.values()] 
location1_avg=sum(location1)/len(location1)

My ultimate goal is to have a dictionary with labels as keys (Location 1...i) and the average value across states for that location. So the first key-value would be Location1: 40, and so on.

I have the attempt below, but the error message is 'list index out of range' and I do not know how to iterate properly in this case.

for element in test_dict.values():
    avg=list()
    for nums in element[i]:
        avg[i]=sum(element[i][nums])/len(element[i][nums])

Adding desired output per requests

soln_dict={'Location1':40,'Location2':351,'Location3':24,'Location4':43.24,'Location5':54}

Thank you for your help!

Z_D
  • Can you show what exactly you expect to be the result given the `test_dict`? – mkrieger1 Sep 23 '17 at 17:45
  • @Jean-FrançoisFabre I suspect 40 is supposed to be the average of 20, 10, and 90, so the desired result might be a list of 5 numbers, not a dictionary with 3 keys. – mkrieger1 Sep 23 '17 at 17:55
  • You are right - the desired output is a dictionary with five key-value pairs. The first one would be Location 1: 40 – Z_D Sep 23 '17 at 18:19

4 Answers

1

Not sure where your error lies, but the i is a dead giveaway that indices are being used where they are not useful, or even harmful.

Your problem is a straightforward transformation of input to output, which makes it a perfect match for a dictionary comprehension: iterate over the key/value pairs and rebuild the dict with the mean as the value:

test_dict={'NJ':[20,50,70,90,100],'NY':[10,3,0,99,57],'CT':[90,1000,2,3.4,5]}

result = {k:sum(x)/len(x) for k,x in test_dict.items()}

print(result)

gives:

{'CT': 220.08, 'NJ': 66.0, 'NY': 33.8}

EDIT: you seem to want a "transposed" version with anonymized keys. In that case, just use the zipped version of the values:

result = {"location{}".format(i):sum(v)/len(v) for i,v in enumerate(zip(*test_dict.values()),1)}

gives:

{'location3': 24.0, 'location5': 54.0, 'location1': 40.0, 'location2': 351.0, 'location4': 64.13333333333334}
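
To see why this works, it may help to look at the intermediate step. The sketch below is only an illustration (the names `columns` and `result` are mine, not part of the answer above). It materializes what `zip(*test_dict.values())` yields before averaging, and it assumes every list has the same length, since `zip` silently stops at the shortest one.

```python
test_dict = {'NJ': [20, 50, 70, 90, 100],
             'NY': [10, 3, 0, 99, 57],
             'CT': [90, 1000, 2, 3.4, 5]}

# zip(*values) groups the i-th element of every list ("transposes" the data)
columns = list(zip(*test_dict.values()))
print(columns[0])  # the first element of each state's list

# average each column and label it location1, location2, ...
result = {"location{}".format(i): sum(col) / len(col)
          for i, col in enumerate(columns, 1)}
print(result["location1"])
```

The order of values inside each tuple follows the dictionary's iteration order, which does not matter for an average.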
Jean-François Fabre
1

You can do this:

test_dict={'NJ':[20,50,70,90,100],'NY':[10,3,0,99,57],'CT':[90,1000,2,3.4,5]}
avg=[sum(element) / len(element) for element in test_dict.values()]
print(avg) # => [66.0, 33.8, 220.08]

And for a dictionary:

test_dict={'NJ':[20,50,70,90,100],'NY':[10,3,0,99,57],'CT':[90,1000,2,3.4,5]}
avg={ k:sum(test_dict[k]) / len(test_dict[k]) for k in test_dict}
print(avg) # => {'NJ': 66.0, 'NY': 33.8, 'CT': 220.08}

Answer to the edited question:

If the lists always have a length of 5, use this:

test_dict={'NJ':[20,50,70,90,100],'NY':[10,3,0,99,57],'CT':[90,1000,2,3.4,5]}
avg={}
for i in range(5):
  avg['Location'+str(i+1)] = sum(test_dict[k][i] for k in test_dict)/len(test_dict)
print(avg)

Output:

{'Location1': 40.0, 'Location2': 351.0, 'Location3': 24.0, 'Location4': 64.13333333333334, 'Location5': 54.0}
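
If the length is not known in advance, one possible variation (a sketch, not part of the answer above) derives the count from the data instead of hard-coding 5. It assumes that positions missing from shorter lists should simply be ignored, so it uses the length of the shortest list; the name `n` is mine.

```python
test_dict = {'NJ': [20, 50, 70, 90, 100],
             'NY': [10, 3, 0, 99, 57],
             'CT': [90, 1000, 2, 3.4, 5]}

# number of positions present in every list (assumption: ignore any extras)
n = min(len(values) for values in test_dict.values())

avg = {}
for i in range(n):
    # average the i-th element across all keys
    avg['Location' + str(i + 1)] = sum(test_dict[k][i] for k in test_dict) / len(test_dict)
print(avg)
```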
DjaouadNM
1

Just do :

from functools import reduce   #reduce is not a builtin in Python 3

#loop through the dictionary
for key,value in test_dict.items():

   #use reduce to sum the values, then divide by the count to get the avg
   print(key, reduce(lambda x, y: x + y, value) / len(value))

This will print :

NJ 66.0
NY 33.8
CT 220.08

Edit : As per change in OP requirements :

l = list(test_dict.values())                            #convert values to a list of lists
print(l)
#[[20, 50, 70, 90, 100], [10, 3, 0, 99, 57], [90, 1000, 2, 3.4, 5]]
d={}                                                    #final dictionary
for i in range(len(l[0])):
   row_list = [row[i] for row in l]                     #get values column-wise
   d['location'+str(i+1)] = sum(row_list)/len(row_list) #calculate avg

print(d)
#{'location1': 40.0, 'location2': 351.0, 'location3': 24.0, 'location4': 64.13333333333334, 'location5': 54.0}

Note: the average you have put in the question for Location4 is wrong; it should be about 64.13, not 43.24.

Kaushik NP
  • Thank you - my true desire, as explained better above now, is to get an average for each of the first elements in each value. Please see desired output - appreciate your help. – Z_D Sep 23 '17 at 18:23
  • 1
    @Tony , check the edit – Kaushik NP Sep 23 '17 at 18:46
  • My reservation about this solution is that it uses an integer index to loop through the data, and this is not as Pythonic as it could be. But it works and it effectively answers the question. – fralau Sep 29 '17 at 20:43
1

To keep it as simple as possible, I would suggest:

from statistics import mean

test_dict={'NJ':[20,50,70,90,100],'NY':[10,3,0,99,57],'CT':[90,1000,2,3.4,5]}

# put the data in a list of lists
# (throw away the state names)
l = list(test_dict.values())


# put together 1st values, 2nd values, etc.
r = [mean(i) for i in zip(*l)]
print(r)

Which gives:

[40, 351, 24, 64.13333333333334, 54]

I divided to conquer: I turned the dictionary into a list of lists, and then used zip to put the "columns" together. Since zip expects each sequence as a separate argument rather than a single list of sequences, I used the star operator (*) to unpack the list into arguments.
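
A small illustration of what the star operator does here, using a made-up two-row list (`rows` is a hypothetical name, not from the code above):

```python
rows = [[1, 2, 3], [4, 5, 6]]

# zip(rows) receives ONE argument (the outer list), so each tuple holds one row
print(list(zip(rows)))    # [([1, 2, 3],), ([4, 5, 6],)]

# zip(*rows) is the same as zip([1, 2, 3], [4, 5, 6]): it groups the columns
print(list(zip(*rows)))   # [(1, 4), (2, 5), (3, 6)]
```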

I am not sure where the list of places should come from. Is it just Location_ plus the index number? (If so, why not leave the results in a list?)

For the mean function, see the statistics module (Python 3.4 and later). Otherwise you can write your own:

mean = lambda l: reduce(lambda x, y: x+y, l) / len(l)

I took inspiration from Finding the average of a list. That is perhaps a little cryptic, and it might have been clearer to write a function without reduce, but a one-liner is much easier to copy and paste.

If you are in Python 3, import reduce from functools.
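
For comparison, a plain function without reduce might look like the sketch below. It is only one way to write it: the name `mean` would shadow the one imported from statistics if both are in scope, and raising `ValueError` on empty input is my own choice rather than something stated above.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    values = list(values)  # accept tuples or iterators, such as zip output
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)

print(mean((20, 10, 90)))  # 40.0 on Python 3
```

It drops into the list comprehension above unchanged, for example `[mean(i) for i in zip(*l)]`.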

fralau
  • Thank you - my true desire, as explained better above now, is to get an average for each of the first elements in each value. Please see desired output - appreciate your help. – Z_D Sep 23 '17 at 18:23
  • Ah OK. That's even easier. I'll modify my answer. – fralau Sep 23 '17 at 18:25
  • Well actually, it was not *that* straightforward (I thought you wanted only the mean of the first column). – fralau Sep 23 '17 at 19:04