I am doing speed optimization on my own code. Pseudocode is something like this:
1. myStructure = constructStructure(param1)
2. processStructure(myStructure)
3. myStructure = constructStructure(param2)
4. processStructure(myStructure)
I was focusing on optimizing the function constructStructure(param), since the two calls to it take up about 75% of the running time. I did not touch the function processStructure(structure).
I did get a nice speedup (the whole thing now takes about 75% of the original time), but when I measured the respective times of operations 1.-4. I got unexpected results:
        before      after
1.      75.10 ms    56.88 ms
2.      23.12 ms    19.32 ms
3.      70.72 ms    53.22 ms
4.      20.81 ms    14.45 ms
I got a (small, but) significant speedup in parts 2. and 4., which were not changed! These times are averages over 1000 runs. (I did not calculate the standard deviation, but I did run and display individual times about 20 times for each variant, and they seem consistent with the averages.)
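For reference, the measurement setup can be sketched as below; averageMs is a hypothetical stand-in for my timing harness, not the actual benchmark code:

```cpp
#include <chrono>

// Average wall-clock time of `op` over `runs` repetitions, in milliseconds.
// A minimal sketch of the measurement setup; the real benchmark would call
// constructStructure / processStructure inside `op`.
template <typename Op>
double averageMs(Op op, int runs) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < runs; ++i) op();
    const std::chrono::duration<double, std::milli> total = clock::now() - start;
    return total.count() / runs;
}
```

I use std::chrono::steady_clock so clock adjustments cannot skew the numbers; note that averaging like this hides variance, which is why the missing standard deviation is a caveat.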
The structures produced before and after optimization are identical as far as I can tell, and the final output of the program is the same in both cases.
There are no (substantial) memory leaks as far as I can tell -- I was monitoring my system's memory during the test runs and there was no consistent increase in usage. (I know this is not a very good way to test for leaks, and digging into potential leakage is my next stop. Also, I do make a point of freeing/deleting all reserved memory as soon as I no longer need it.)
Not that I'm unhappy about the speedup, but I can't even begin to understand what happened! Since this is code I will be using for quite a while after I finish working on it, having a mystery in it is not an appealing option.
Does anybody have any ideas as to what happened, and how I can verify whether the suggested explanation is really the case? PS. I'm working in C++.
As the actual code is too big to post here (100 lines for producing the structure + 300 lines for processing it), I can describe it a little, as well as describing the change:
constructStructure
It is a function building a tree structure (not a binary tree), based on grayscale image pixels, where every node is assigned an int attribute. The parameter of the constructStructure function is a Comparator indicating the ordering of pixel intensities (the first time less<int>(), the second time greater<int>()).
The only change in the optimized version is using std::priority_queue instead of std::multimap to implement a heap structure (as described in this question of mine -- except that it is used not with std::pair<int,int> but with std::pair<int,myData>).
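Assuming the heap holds (intensity, payload) pairs, the swap can be sketched as follows; MyData and the two drain functions are hypothetical names, and only the comparison on the int key matters:

```cpp
#include <functional>
#include <map>
#include <queue>
#include <utility>
#include <vector>

struct MyData { int pixel; };  // hypothetical payload, stands in for myData

// Heap via std::multimap: with std::less the first element is the smallest
// key, so popping means erasing begin().
std::vector<int> drainMultimap(std::multimap<int, MyData, std::less<int>> heap) {
    std::vector<int> order;
    while (!heap.empty()) {
        order.push_back(heap.begin()->first);
        heap.erase(heap.begin());
    }
    return order;
}

// The same ordering via std::priority_queue. Its comparator sense is
// inverted: std::greater-style comparison on the key yields a min-heap.
std::vector<int> drainPriorityQueue(std::vector<std::pair<int, MyData>> items) {
    auto cmp = [](const std::pair<int, MyData>& a,
                  const std::pair<int, MyData>& b) {
        return a.first > b.first;  // min-heap on intensity
    };
    std::priority_queue<std::pair<int, MyData>,
                        std::vector<std::pair<int, MyData>>,
                        decltype(cmp)> heap(cmp, std::move(items));
    std::vector<int> order;
    while (!heap.empty()) {
        order.push_back(heap.top().first);
        heap.pop();
    }
    return order;
}
```

One relevant difference: for equal keys, std::multimap (since C++11) keeps elements in insertion order, while std::priority_queue gives no ordering guarantee among equal elements -- consistent with the different child order I describe below.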
I have verified that the produced myStructure is an equivalent tree (by examining the produced trees for 10 different images), in the following sense:
- the produced tree has the same number of nodes
- the data contained in the nodes is the same
- but the order of children within a node differs between the std::multimap and std::priority_queue versions (the children are again nodes holding the same data)

Conclusion: the trees are equivalent in the data they hold and in their structure, up to the order of the child nodes within any single parent node.
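That equivalence check can be sketched as below; the Node shape (an int attribute plus a list of children) is my guess at the described structure, not the actual code:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical node: one int attribute plus an ordered list of children.
struct Node {
    int attribute;
    std::vector<Node> children;
};

// Canonical string form of a subtree with the children's forms sorted,
// so two trees differing only in child order map to the same string.
// (String order among siblings is arbitrary but consistent, which is
// all that equality needs.)
std::string canonical(const Node& n) {
    std::vector<std::string> kids;
    for (const Node& c : n.children) kids.push_back(canonical(c));
    std::sort(kids.begin(), kids.end());
    std::string s = "(" + std::to_string(n.attribute);
    for (const std::string& k : kids) s += " " + k;
    return s + ")";
}

bool equivalentUpToChildOrder(const Node& a, const Node& b) {
    return canonical(a) == canonical(b);
}
```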
processStructure
It is a function that examines the constructed tree in a DFS (bottom-up) fashion. Its complexity depends only on the number of nodes in the tree, and it examines only the attribute assigned to each node (not the data contained in the node, which is available for further processing but not used here).
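The access pattern can be sketched like this (again with a hypothetical Node; the real processStructure does more than sum, but it touches the same single attribute per node):

```cpp
#include <vector>

// Hypothetical node shape matching the description above.
struct Node {
    int attribute;
    std::vector<Node> children;
};

// Bottom-up (post-order) DFS that reads only each node's int attribute --
// here it just sums them. Children are fully processed before their parent,
// mirroring the traversal order of processStructure.
long sumAttributes(const Node& n) {
    long total = 0;
    for (const Node& c : n.children) total += sumAttributes(c);
    return total + n.attribute;
}
```

Even though only the attribute is read, the order in which nodes were created (which differs between the two heap implementations) determines their layout in memory, so the traversal's cache behaviour could differ -- that is the hypothesis behind the question below, not a confirmed cause.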
Finally, after further testing, and given the difference in node order I pointed out, the question becomes: is it possible that the order of the nodes in the tree produces this significant change in performance, even though (when traversing the tree with a DFS approach) the data within the nodes is not examined -- only a single integer attribute per node?