I'm currently dealing with a list of dictionaries that starts with a dozen items and grows to tens of millions of items after a few iterations. Fundamentally, each item is defined by several IDs, a value, and some characteristics. I build this structure from JSON data I fetch from a SQL server.
The operations I perform are, for example:
- get the SQL results as JSON
- find items whose 'id1' and/or 'id2' are identical
- merge all items sharing the same 'id1' by summing their 'value' fields as floats (see the sketch after the examples below)
Here is an example of what my data looks like:
[
    {'id1': '01234-01234-01234',
     'value': '10',
     'category': 'K'},
    ...
    {'id1': '01234-01234-01234',
     'value': '5',
     'category': 'K'},
    ...
]
What I would like to get looks like:
[
    ...
    {'id1': '01234-01234-01234',
     'value': '15',
     'category': 'K'},
    ...
]
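For reference, a minimal sketch of the merge I have in mind, using collections.defaultdict keyed on 'id1' (the key names match the examples above; the function name and the str() round-trip are just illustrative):

from collections import defaultdict

def merge_by_id1(items):
    # Accumulate float totals per 'id1' and remember the first
    # category seen for each id.
    totals = defaultdict(float)
    categories = {}
    for item in items:
        key = item['id1']
        totals[key] += float(item['value'])
        categories.setdefault(key, item['category'])
    # Rebuild the list-of-dicts shape; note str(15.0) gives '15.0',
    # so the string formatting may differ from the example output.
    return [{'id1': k, 'value': str(v), 'category': categories[k]}
            for k, v in totals.items()]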
I could use a dict of dicts instead:
{
    '01234-01234-01234': {'value': '10',
                          'category': 'K'},
    ...
    '01234-01234-01234': {'value': '5',
                          'category': 'K'},
    ...
}
and get:
{
    '01234-01234-01234': {'value': '15',
                          'category': 'K'},
    ...
}
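A rough sketch of building that keyed form directly, summing in place as items arrive, so duplicates never pile up in memory (same illustrative key names as above):

def merge_into_keyed(items):
    # One dict keyed by 'id1'; values are summed on the fly.
    merged = {}
    for item in items:
        key = item['id1']
        if key in merged:
            merged[key]['value'] = str(float(merged[key]['value'])
                                       + float(item['value']))
        else:
            merged[key] = {'value': item['value'],
                           'category': item['category']}
    return merged

Keeping the values as floats inside the dict and converting to strings only at the end would also avoid the repeated float()/str() round-trips.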
I have 4 GB of dedicated RAM and millions of dicts in one container, on a 64-bit architecture. I would like to optimise my code and these operations in both time and RAM. Are there tricks, or containers better suited than a dictionary of dictionaries, for this kind of operation? Is it better to create a new object that replaces the previous one at each iteration, or to modify the existing object in place?
I'm using Python 3.4.
EDIT: simplified the question to a single question about the value. The question is similar to How to sum dict elements or Fastest way to merge n-dictionaries and add values on 2.6, but in my case the values in my dicts are strings.
EDIT2: for the moment, the best performance I have obtained is with this method:
from copy import deepcopy

def merge_similar_dict(input_list):
    i = 0
    # Sort a copy of the list of exchanges by id so duplicates
    # end up adjacent.
    try:
        merge_list = sorted(deepcopy(input_list), key=lambda k: k['id'])
        while (i + 1) <= (len(merge_list) - 1):
            # Fold every following item with the same id into item i.
            while merge_list[i]['id'] == merge_list[i + 1]['id']:
                merge_list[i]['amount'] = str(float(merge_list[i]['amount'])
                                              + float(merge_list[i + 1]['amount']))
                del merge_list[i + 1]
                if i + 1 >= len(merge_list):
                    break
            i += 1
    except Exception as error:
        print('The merge of similar dicts has failed')
        print(error)
        raise
    return merge_list
Once the list holds tens of thousands of dicts, this starts to take a very long time (several minutes).
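The quadratic cost probably comes from the removal inside the loop, since each deletion shifts the whole tail of the list. A single-pass variant with itertools.groupby over the sorted copy might avoid that; a sketch, assuming the same 'id' and 'amount' keys as my method above:

from copy import deepcopy
from itertools import groupby
from operator import itemgetter

def merge_similar_dict_grouped(input_list):
    merged = []
    # Sort once, then fold each run of identical ids in one pass,
    # never mutating the list while iterating over it.
    ordered = sorted(deepcopy(input_list), key=itemgetter('id'))
    for key, group in groupby(ordered, key=itemgetter('id')):
        items = list(group)
        first = items[0]
        first['amount'] = str(sum(float(d['amount']) for d in items))
        merged.append(first)
    return merged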