
I have a long list of values like the one below and am thinking of converting it to a dictionary. I would like to create a dictionary in Python where, for each key 0, 1, 2, ..., n, the value is the sum of all values that belong to that key. For example,

0 -29.8568331501
1 -27.4866699437
2 -27.1228643508
0 -10.8685684486
1 -9.41353774283
2 -10.3218886291
...

Then,

dict = {0: SUM(-29.8568331501 + -10.8685684486 + ..), 1: SUM(-27.4866699437 + -9.41353774283 + ..), 2: SUM(-27.1228643508 + -10.3218886291 + ..)}

I am a Python newbie and would appreciate any guidance on how to go about this.

Mazdak
Abi

3 Answers


Assuming you already have the data read into a list of (key, value) pairs (or are using a generator that yields them), you can do this very simply:

from collections import defaultdict

sums = defaultdict(float)
for key, val in data:
    sums[key] += val

I advise you to familiarize yourself with the collections module, because it has the right tools for many jobs. In this case, defaultdict behaves like a normal Python dictionary but supplies a default value for keys that are not yet in it (here the default is the value of float(), which is 0.0). Thanks to this, you do not need to check whether the key already exists in the dict.
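
As a minimal sketch of the loop above applied to the sample values from the question: it assumes the data arrives as whitespace-separated text lines, so the `lines` list and the parsing step are assumptions, not something the question specifies.

```python
from collections import defaultdict

# Sample lines copied from the question; the text format is an assumption.
lines = [
    "0 -29.8568331501",
    "1 -27.4866699437",
    "2 -27.1228643508",
    "0 -10.8685684486",
    "1 -9.41353774283",
    "2 -10.3218886291",
]

# Parse each line into an (int key, float value) pair.
data = [(int(k), float(v)) for k, v in (line.split() for line in lines)]

sums = defaultdict(float)  # missing keys start at float() == 0.0
for key, val in data:
    sums[key] += val

print(dict(sums))  # convert to a plain dict for display
```

Because `data` is only iterated once, the list comprehension could be swapped for a generator expression if the input is too large to hold in memory.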

jepio

Ok, two things:

  1. If your keys are actually 1...N, then don't use a dictionary; use a simple list.
  2. Otherwise, here is a solution with a dict, which I have called dictionary for clarity. The 1...N are the keys, and your floating-point numbers are the values:

    if key in dictionary:
        dictionary[key] += value
    else:
        dictionary[key] = value

You need to apply that to each key, value pair.
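
Both options can be sketched on the sample values from the question. This is only an illustration: the `data` list of pairs, the `totals` and `n_keys` names, and the assumption that keys are consecutive integers starting at 0 (as in the question's sample) are mine.

```python
# Sample (key, value) pairs taken from the question.
data = [
    (0, -29.8568331501), (1, -27.4866699437), (2, -27.1228643508),
    (0, -10.8685684486), (1, -9.41353774283), (2, -10.3218886291),
]

# Option 1: a list indexed by key.
# Assumes keys are consecutive integers starting at 0.
n_keys = max(key for key, _ in data) + 1
totals = [0.0] * n_keys
for key, value in data:
    totals[key] += value

# Option 2: a plain dict, checking for the key before adding to it.
dictionary = {}
for key, value in data:
    if key in dictionary:
        dictionary[key] += value
    else:
        dictionary[key] = value

print(totals)
print(dictionary)
```

The list version only works when the keys are small non-negative integers without large gaps; the dict version accepts any hashable key.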

Marcus Müller

You can use a dictionary comprehension as outlined in:

Mapping Over Values

tmpDict = {k: f(k,v) for k, v in tmp.items()}

Replace f(k,v) with a function that performs the sums.

Example:

Generate data

tmp = {}

for i in range(10):
    tmp[i] = [i] * 10

{0: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 1: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 2: [2, 2, 2, 2, 2, 2, 2, 2, 2, 2], 3: [3, 3, 3, 3, 3, 3, 3, 3, 3, 3], 4: [4, 4, 4, 4, 4, 4, 4, 4, 4, 4], 5: [5, 5, 5, 5, 5, 5, 5, 5, 5, 5], 6: [6, 6, 6, 6, 6, 6, 6, 6, 6, 6], 7: [7, 7, 7, 7, 7, 7, 7, 7, 7, 7], 8: [8, 8, 8, 8, 8, 8, 8, 8, 8, 8], 9: [9, 9, 9, 9, 9, 9, 9, 9, 9, 9]}

Perform bucketed sum(s)

tmpDict = {k: sum(v) for k, v in tmp.items()}

Result Set:

{0: 0, 1: 10, 2: 20, 3: 30, 4: 40, 5: 50, 6: 60, 7: 70, 8: 80, 9: 90}
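
The example above starts from a dict of lists, whereas the question has a flat sequence of key/value pairs. One possible way to bridge the two is sketched here with the question's sample values; the `data` list and the grouping step are assumptions, not part of this answer.

```python
from collections import defaultdict

# Sample (key, value) pairs taken from the question.
data = [
    (0, -29.8568331501), (1, -27.4866699437), (2, -27.1228643508),
    (0, -10.8685684486), (1, -9.41353774283), (2, -10.3218886291),
]

# Step 1: bucket the values by key, giving a dict of lists like `tmp`.
tmp = defaultdict(list)
for key, value in data:
    tmp[key].append(value)

# Step 2: the same dict comprehension as above, summing each bucket.
tmpDict = {k: sum(v) for k, v in tmp.items()}

print(tmpDict)
```

This keeps every value in memory before summing, so it costs more than accumulating the sums directly; it is mainly useful when the grouped lists are needed for something else as well.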

Community
dasm80x86
  • How does this help? `tmp` would have to be a dict (of what?) and the function `f` would need to have global state to keep track of the sums, which effectively removes the need for the dictionary comprehension. It seems like 1 problem has been replaced by 3. – jepio Feb 04 '15 at 19:37
  • @jepio the function `f` was intended to be a placeholder for the `sum` operation. – dasm80x86 Feb 13 '15 at 19:36
  • I agree that if you have the data in the same format as your `tmp` dictionary, then a dict comprehension with sum is the easiest solution. But first you need to translate the data to this format. For the data in the question I came up with the following: `data.sort(key=operator.itemgetter(0)); b = {k: sum(v for _, v in vals) for k, vals in itertools.groupby(data, operator.itemgetter(0))}` I would not say this is a particularly easy solution... – jepio Feb 13 '15 at 19:51
  • That's a fair assessment; if the data is stored as a dictionary, the dict comprehension is the way to go. I don't see how else you would store the aforementioned data. (e.g.) `{1:1,1:2,1:3}` yields `{1:3}` -- I'll ask the OP. – dasm80x86 Feb 13 '15 at 21:20