I'm working on a massive number-crunching project. I've been optimising everything since the start, as I knew it would be important. Performance analysis shows that my code spends almost 40% of its life in one function - the binary tree iterator.
public ScTreeNode GetNodeForState(int rootIndex, float[] inputs)
{
 0.2%       ScTreeNode node = RootNodes[rootIndex].TreeNode;
24.6%       while (node.BranchData != null)
            {
 0.2%           BranchNodeData b = node.BranchData;
 0.5%           node = b.Child2;
12.8%           if (inputs[b.SplitInputIndex] <= b.SplitValue)
 0.8%               node = b.Child1;
            }
 0.4%       return node;
}
Do any C# optimisation experts have any tips for optimising this further? All comparisons are floats. I know that in theory it shouldn't matter, but I'm using fields rather than properties to ensure optimisation. A small saving here could shave days off the process.
Please no replies saying "These optimisations don't matter in the real world" - because in this instance they do. :-)
Edit: I've updated the code to what I've got now following the comments below, and added in the performance analysis output for each line of code. As you can see, the main killer is the null check - why? I tried using a boolean IsLeaf flag on the node instead of the null check, but it was an equal performance hit on that line.
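For reference, the IsLeaf variant looked roughly like this (reconstructed from the code above - the only change is the loop condition, with IsLeaf being a plain bool field on ScTreeNode):

public ScTreeNode GetNodeForState(int rootIndex, float[] inputs)
{
    ScTreeNode node = RootNodes[rootIndex].TreeNode;

    // Same traversal as above, but checking a bool flag instead of BranchData != null.
    while (!node.IsLeaf)
    {
        BranchNodeData b = node.BranchData;
        node = b.Child2;

        if (inputs[b.SplitInputIndex] <= b.SplitValue)
            node = b.Child1;
    }

    return node;
}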
The code for the branch node object is as follows:
public sealed class BranchNodeData
{
    /// <summary>
    /// The index of the data item in the input array on which we need to split
    /// </summary>
    internal int SplitInputIndex = 0;

    /// <summary>
    /// The value that we should split on
    /// </summary>
    internal float SplitValue = 0;

    /// <summary>
    /// The node's children
    /// </summary>
    internal ScTreeNode Child1;
    internal ScTreeNode Child2;
}
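ScTreeNode itself isn't shown above; trimmed down to just the members the iterator touches, it looks roughly like this (the real class has more on it):

public sealed class ScTreeNode
{
    // Null on leaf nodes - this is what the loop condition checks.
    internal BranchNodeData BranchData;

    // The flag I tried in place of the null check.
    internal bool IsLeaf;
}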
Another Edit: Yet more thinking here... I was wondering why the line
BranchNodeData b = node.BranchData;
was registering 0.2% of execution while the null comparison line was registering 17.7%. I'm guessing this is a branch prediction fail? That comparison is hit a huge number of times and almost always evaluates to true, which presumably makes it very hard for the CPU to predict when it will come back false. I'm not very clued up on the low-level workings of a CPU, but is this likely to be the case?
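One thing I might try to narrow it down is restructuring the loop so the null check reads a local reference instead of re-reading node.BranchData every iteration - rough sketch, it should do exactly the same work as the code above:

public ScTreeNode GetNodeForState(int rootIndex, float[] inputs)
{
    ScTreeNode node = RootNodes[rootIndex].TreeNode;
    BranchNodeData b = node.BranchData;

    // Loop on the local so the condition doesn't dereference node.BranchData again.
    while (b != null)
    {
        node = inputs[b.SplitInputIndex] <= b.SplitValue ? b.Child1 : b.Child2;
        b = node.BranchData;
    }

    return node;
}

If the hot percentage just moves from the while line onto the b = node.BranchData; line, that would at least tell me whether it's the load or the comparison itself that's costing me.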