I have hit a performance wall in my code and have decided to rewrite it, so I need some advice on how to tackle this issue. I have a huge list of optical flow data. Each item in it is a list of points, and each point is a list holding a frame number and X and Y coordinates, like so:
[[[frame,x,y],[frame,x,y]],[[frame,x,y],[frame,x,y]]...]
I have uploaded a sample here: http://pastebin.com/ANUr8bwc
I need to find a way to manage this data so that I can do quick lookups and see what lists contain certain frames.
So far I have looped through all of the data to see which lists contain, say, frames 34 and 35, and then stored the matching lists in a new list for reference:
thisFrame = 34
nextFrame = thisFrame + 1
for item in data:  # "data" stands for the full list of point lists
    # Check if the item contains both this frame and the next one
    if any(e[0] == thisFrame for e in item) and any(e[0] == nextFrame for e in item):
        DoStuff()
Doing this a few thousand times for a list of 10,000+ points quickly turns into a bottleneck, because every lookup scans every point in every list. So my idea was to build a dict keyed by frame, so that I can easily find which items are available on a certain frame:
{frame: [list1, list2, list3]}
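A minimal sketch of that idea, assuming the data has the shape shown above. The names `tracks`, `build_frame_index` and `tracks_with_frames` are hypothetical and only there for illustration, and the sample values are made up. It stores list indices in sets rather than the lists themselves, so that finding the items present on two frames becomes a set intersection:

```python
from collections import defaultdict

# Hypothetical sample in the same shape as the real data:
# a list of items, each item a list of [frame, x, y] points.
tracks = [
    [[33, 1.0, 2.0], [34, 1.5, 2.5], [35, 2.0, 3.0]],
    [[34, 5.0, 5.0], [36, 6.0, 6.0]],
    [[35, 7.0, 7.0], [36, 7.5, 7.5]],
]


def build_frame_index(tracks):
    """Map each frame number to the set of item indices that contain it."""
    index = defaultdict(set)
    for track_id, track in enumerate(tracks):
        for frame, _x, _y in track:
            index[frame].add(track_id)
    return index


def tracks_with_frames(index, *frames):
    """Return the sorted indices of items that contain every given frame."""
    if not frames:
        return []
    # .get() avoids creating empty entries for frames that never occur.
    candidates = [index.get(frame, set()) for frame in frames]
    return sorted(set.intersection(*candidates))


# Build the index once, then reuse it for every lookup.
frame_index = build_frame_index(tracks)
both = tracks_with_frames(frame_index, 34, 35)
```

The index is built in one pass over the data, and each later lookup only touches the sets for the requested frames instead of looping through every point again.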
But I think I had better ask this time. Is there a good go-to method for storing big datasets and doing lookups in them, so that I can avoid looping through all of the data every time I need to find something?