I've stumbled on a simple problem that I cannot figure out.
I have a list of objects saved in a json file. Once converted into Python they give me something like this:

data = [
    {"id": 1, "value": 3},
    {"id": 2, "value": 1},
    {"id": 3, "value": 5},
    {"id": 1, "value": 1},
    {"id": 1, "value": 2},
    {"id": 3, "value": 2},
    {"id": 1, "value": 3}
]

I'm trying to have only one object / dictionary per unique 'id'. In other words I'm trying to obtain the following result:

[{"id": 1, "value": 9}, {"id": 2, "value": 1}, {"id": 3, "value": 7}]

Obviously, I could do it the long way around:

foo = list(set([item["id"] for item in data]))

newList = []
for i in foo:
    bar = {"id": i, "value": 0}
    for x in data:
        if x['id'] == i:
            bar['value'] += x['value']

    newList.append(bar)

However, I'm worried about this nested loop, which might slow down the process considerably with large datasets.

I'm familiar with sum(d.values()) and I've seen the question on collections.Counter, but as far as I can tell these do not work in my case.

Any ideas on how the desired result can be obtained?

Nootaku

1 Answer

Pandas to the rescue!

First, create a dataframe from your data

import pandas as pd
df = pd.DataFrame(data)
print(df)
Output:
   id  value
0   1      3
1   2      1
2   3      5
3   1      1
4   1      2
5   3      2
6   1      3

Then, group by id and sum

result = df.groupby("id").sum()
print(result)
Output:
    value
id       
1       9
2       1
3       7

And then convert this to the list like you want:

new_list = [{"id": elemid, "value": int(val["value"])} for elemid, val in result.iterrows()]  # int() turns the NumPy integer into a plain Python int
print(new_list)
# Output: [{'id': 1, 'value': 9}, {'id': 2, 'value': 1}, {'id': 3, 'value': 7}]

Alternatively, you could improve your existing approach by making a single pass over the data and accumulating the sums in a dictionary keyed by id:

data = [
    {"id": 1, "value": 3},
    {"id": 2, "value": 1},
    {"id": 3, "value": 5},
    {"id": 1, "value": 1},
    {"id": 1, "value": 2},
    {"id": 3, "value": 2},
    {"id": 1, "value": 3}
]

new_data = {}
for elem in data:
    elemid = elem["id"]
    value = elem["value"]
    new_data[elemid] = new_data.get(elemid, 0) + value

print(new_data)
# Output: {1: 9, 2: 1, 3: 7}

Then, convert this to the list like you want:

new_list = [{"id": key, "value": value} for key, value in new_data.items()]
print(new_list)
# Output: [{'id': 1, 'value': 9}, {'id': 2, 'value': 1}, {'id': 3, 'value': 7}]
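
The same single-pass idea can also be written with `collections.defaultdict`, which avoids the `.get(elemid, 0)` call. This is only a sketch: the function name `sum_values_by_id` is made up for illustration, and it assumes every record has both an "id" and a "value" key.

```python
from collections import defaultdict


def sum_values_by_id(records):
    """Sum the "value" of each record per "id" in one pass over the list."""
    totals = defaultdict(int)  # missing ids start at 0
    for record in records:
        totals[record["id"]] += record["value"]
    # dicts keep insertion order, so ids appear in order of first occurrence
    return [{"id": key, "value": total} for key, total in totals.items()]


data = [
    {"id": 1, "value": 3},
    {"id": 2, "value": 1},
    {"id": 3, "value": 5},
    {"id": 1, "value": 1},
    {"id": 1, "value": 2},
    {"id": 3, "value": 2},
    {"id": 1, "value": 3},
]

print(sum_values_by_id(data))
# Output: [{'id': 1, 'value': 9}, {'id': 2, 'value': 1}, {'id': 3, 'value': 7}]
```

Each record is touched exactly once, so the running time grows linearly with the length of the list.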
Pranav Hosangadi
  • Pity it's closed for answering. Here's a one-liner: `sums = [{"id": i, "value": sum([d['value'] for d in data if d['id']==1])} for i in set([d['id'] for d in data])]` – Peter M. Sep 30 '20 at 16:59
  • @PeterM. you probably mean `d['id'] == i` instead of `d['id'] == 1`, but that's a worse solution than mine anyway because it's O(n^2). It's essentially the same as OP's – Pranav Hosangadi Sep 30 '20 at 17:02
  • Yes Pranav. 1 should be i. – Peter M. Sep 30 '20 at 17:41
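
To make the complexity point from the comments concrete, here is a rough sketch that puts the corrected one-liner (with `d['id'] == i`) next to the single-pass dictionary approach and checks that they agree. The function names `sum_by_rescanning` and `sum_in_one_pass` are made up for illustration, and the results are sorted by id only so that the two lists can be compared directly.

```python
def sum_by_rescanning(records):
    """Corrected one-liner: rescans the whole list once per unique id."""
    unique_ids = {d["id"] for d in records}
    return [
        {"id": i, "value": sum(d["value"] for d in records if d["id"] == i)}
        for i in sorted(unique_ids)
    ]


def sum_in_one_pass(records):
    """Dictionary approach: visits each record exactly once."""
    totals = {}
    for d in records:
        totals[d["id"]] = totals.get(d["id"], 0) + d["value"]
    return [{"id": key, "value": total} for key, total in sorted(totals.items())]


data = [
    {"id": 1, "value": 3},
    {"id": 2, "value": 1},
    {"id": 3, "value": 5},
    {"id": 1, "value": 1},
    {"id": 1, "value": 2},
    {"id": 3, "value": 2},
    {"id": 1, "value": 3},
]

# Both give the same sums; only the amount of work differs.
assert sum_by_rescanning(data) == sum_in_one_pass(data)
```

With n records and k unique ids, the rescanning version looks at roughly n * k records, which approaches n^2 when most ids are distinct, while the one-pass version looks at n.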