I have a (normal, unordered) dictionary that holds my data, and I extract some of that data into a numpy array to do some linear algebra. Once that's done, I want to put the resulting ordered numpy vector data back into the dictionary with the rest of the data. What's the best, most Pythonic way to do this?
In his answer to "Writing to numpy array from dictionary", Joe Kington suggests two solutions:
- Using Ordered Dictionaries
- Storing the sorting order in another data structure, such as a dictionary
Here are some (possibly useful) details:
My data is in nested dictionaries. The outer one is for groups: {groupKey: groupDict}, where group keys start at 0 and count up in order to the total number of groups. groupDict contains information about items: {itemKey: itemDict}. Item keys typically start at 0 but can skip numbers, as not all "item locations" are populated. itemDict holds the actual data, with keys such as 'name', 'description', 'x', 'y', ...
Getting to the data is easy; dictionaries are great: data[groupKey][itemKey]['x'] = 0.12
Then I put data such as x and y into numpy vectors and arrays, something like this:
import numpy

xVector = numpy.empty(xLength)
vectorIndex = 0
# Walk every group, then every item in it, copying x into the vector
for groupKey, groupDict in dataDict.items():
    for itemKey, itemDict in groupDict.items():
        xVector[vectorIndex] = itemDict['x']
        vectorIndex += 1
Then I go off and do my linear algebra and calculate a z vector that I want to add back into dataDict. The issue is that dataDict is unordered, so I don't have any way of getting the proper index.
The Ordered Dict method would allow me to know the order and then index through the dataDict structure and put the data back in.
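Here is a minimal sketch of what I have in mind for the ordered dictionary route. The sample data and the doubling step that stands in for the linear algebra are made up for illustration, and it assumes nothing is added to or removed from dataDict between extracting x and writing z back:

```python
from collections import OrderedDict

import numpy

# Hypothetical sample data, built as nested OrderedDicts so that
# iteration order is the insertion order.
dataDict = OrderedDict()
dataDict[0] = OrderedDict()
dataDict[0][0] = {'name': 'a', 'x': 0.12}
dataDict[0][2] = {'name': 'b', 'x': 0.34}  # item key 1 is not populated
dataDict[1] = OrderedDict()
dataDict[1][0] = {'name': 'c', 'x': 0.56}

# Extract x in iteration order.
xVector = numpy.array([itemDict['x']
                       for groupDict in dataDict.values()
                       for itemDict in groupDict.values()])

# Stand-in for the real linear algebra step (assumption).
zVector = 2.0 * xVector

# Write z back by walking the structure in the same order.
vectorIndex = 0
for groupDict in dataDict.values():
    for itemDict in groupDict.values():
        itemDict['z'] = float(zVector[vectorIndex])
        vectorIndex += 1
```

The write-back loop has to mirror the extraction loop exactly, since the shared iteration order is the only thing tying a vector index to an item.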
Alternatively, I could create another dictionary while inside the inner for loop above that stores the relationship between vectorIndex, groupKey and itemKey:
sortingDict[vectorIndex] = {'groupKey': groupKey, 'itemKey': itemKey}
Later, when it's time to put the data back, I could just loop through the vectors and add the data:
vectorIndex = 0
for z in numpy.nditer(zVector):
    dataDict[sortingDict[vectorIndex]['groupKey']][sortingDict[vectorIndex]['itemKey']]['z'] = z
    vectorIndex += 1
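Put together, the sortingDict route might look roughly like the sketch below. The sample data and the "add 1.0" step standing in for the linear algebra are invented for illustration, and I use enumerate instead of a manual counter when writing back:

```python
import numpy

# Hypothetical sample data in the nested layout described above.
dataDict = {
    0: {0: {'name': 'a', 'x': 0.12}, 2: {'name': 'b', 'x': 0.34}},
    1: {0: {'name': 'c', 'x': 0.56}},
}

xLength = sum(len(groupDict) for groupDict in dataDict.values())
xVector = numpy.empty(xLength)
sortingDict = {}

# Fill the vector and record which (groupKey, itemKey) each index came from.
vectorIndex = 0
for groupKey, groupDict in dataDict.items():
    for itemKey, itemDict in groupDict.items():
        xVector[vectorIndex] = itemDict['x']
        sortingDict[vectorIndex] = {'groupKey': groupKey, 'itemKey': itemKey}
        vectorIndex += 1

# Stand-in for the real linear algebra step (assumption).
zVector = xVector + 1.0

# Put the results back using the recorded keys; order of dataDict no longer matters.
for vectorIndex, z in enumerate(zVector):
    keys = sortingDict[vectorIndex]
    dataDict[keys['groupKey']][keys['itemKey']]['z'] = float(z)
```

Since the keys are plain consecutive integers, a list of (groupKey, itemKey) tuples would probably serve just as well as sortingDict here.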
Both methods seem equally straightforward to me. I'm not sure if changing dataDict to an ordered dictionary will have any other effects elsewhere in my code, but probably not. Adding the sorting dictionary also seems pretty easy, as it will get created at the same time as the numpy arrays and vectors. Left on my own, I think I would go with the sortingDict method.
Is one of these methods better than the other? Is there a better way I'm not thinking of? My data structure works well for me, but if there's a way to change it that improves everything else, I'm open to it.