We need to find formula errors in a large workbook of roughly 3 million cells, and performance is critical. We have implemented a wrapper around SpecialCells that catches the exception thrown when no error cells are found on a worksheet, similar to:
Handling "No cells were found." Error in Excel
The problem is that the check becomes painfully slow when the COMException is thrown. More concretely, when there is even one error on a worksheet, we've recorded execution times of 1–4 ms; when the exception is thrown, we're looking at around 80 ms.
Does anyone have any creative fast alternative to checking for the non-existence of errors?
Our code:
try
{
    dynamic cellsWithErrors = cells.SpecialCells(cellType, valueType);
    return cellsWithErrors;
}
catch (COMException)
{
    // SpecialCells throws when no matching cells exist on the sheet.
    return null;
}
Notes:
Finding errors when they exist is fine, so no need to focus on that.
I already found a moderate speedup by using dynamic types instead of Range types.
Our check iterates through every sheet in the workbook, so we'd be happy with any workbook-wide solution too, though we doubt one exists.
We're writing this as part of a VSTO AddIn and using the Interop layer.
Solutions Already Considered
We could read the entire range in as values and iterate through it in memory, but I was hoping for something else, because in the case where errors do exist this approach will be slower than SpecialCells.
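For concreteness, the read-and-iterate alternative might look like the sketch below. `SheetHasError` is a name I'm inventing here, and `sheet` is assumed to be an Interop `Worksheet` (or a dynamic wrapping one); the point is that only one COM call is made, after which everything happens in managed memory:

```csharp
// Sketch: detect errors by pulling the used range into managed memory
// with a single COM call, then scanning it without further round-trips.
static bool SheetHasError(dynamic sheet)
{
    object raw = sheet.UsedRange.Value2;

    // A single-cell used range comes back as a scalar, not object[,].
    if (raw is object[,] values)
    {
        foreach (object value in values)
        {
            // Via Value2, numbers arrive as double and text as string; a
            // boxed Int32 only appears when the cell holds an error value.
            if (value is int)
                return true;
        }
        return false;
    }
    return raw is int;
}
```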
Current Idea
So the conundrum boils down to this. On one hand, SpecialCells is exceptionally fast when it does find results, but the thrown exception can cause a 20–80× slowdown. On the other hand, reading the entire used range of a sheet is pretty consistent in terms of performance: roughly linear in the number of cells, with a small one-off constant cost. Now, by inspecting cell values it may be possible to find errors using Mike Rosenblum's answer here: How to know if a cell has an error in the formula in C#. If so, we'd have an alternative way to check for errors.
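The value test from that answer can be sketched as follows. Through the Interop layer, a cell error surfaces as a boxed Int32 holding a CVErr code (0x800A07D0 plus the corresponding XlCVError enum value), so checking the type and the code identifies errors without touching COM again. The helper name is mine, not from the linked answer:

```csharp
// Sketch: a cell's Value2 is a boxed Int32 only when the cell holds an
// error, and the Int32 is one of the seven CVErr codes below.
static bool IsXlCVErr(object cellValue)
{
    if (!(cellValue is int code))
        return false;

    switch (code)
    {
        case -2146826288: // xlErrNull  (#NULL!)
        case -2146826281: // xlErrDiv0  (#DIV/0!)
        case -2146826273: // xlErrValue (#VALUE!)
        case -2146826265: // xlErrRef   (#REF!)
        case -2146826259: // xlErrName  (#NAME?)
        case -2146826252: // xlErrNum   (#NUM!)
        case -2146826246: // xlErrNA    (#N/A)
            return true;
        default:
            return false;
    }
}
```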
The idea, then, is to write a master function that pivots on the size of the sheet in question. For a very large sheet (we have one with around 2 million cells, I think), we might as well just use SpecialCells and accept the cost of an exception if it occurs. For smaller sheets, where the cost of an exception outweighs the cost of reading in the entire range and iterating through it in memory, we should do the latter.
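A sketch of that dispatch, under stated assumptions: `CrossoverCellCount` is a placeholder to be calibrated by benchmarking (100,000 is purely illustrative, not a measured crossover), and `SheetHasError` is a hypothetical helper that scans the sheet's used-range values in memory for CVErr codes:

```csharp
using System.Runtime.InteropServices;

const long CrossoverCellCount = 100_000; // hypothetical; tune by measurement

static dynamic FindErrorCells(dynamic sheet, dynamic cellType, dynamic valueType)
{
    dynamic used = sheet.UsedRange;
    long cellCount = (long)used.Rows.Count * used.Columns.Count;

    if (cellCount >= CrossoverCellCount)
    {
        // Large sheet: SpecialCells wins even if an exception occasionally
        // costs us the ~80 ms penalty.
        try { return used.SpecialCells(cellType, valueType); }
        catch (COMException) { return null; }
    }

    // Small sheet: confirm in memory that an error exists first, so the
    // SpecialCells call below can never throw.
    return SheetHasError(sheet) ? used.SpecialCells(cellType, valueType) : null;
}
```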
The key thing we'd need to determine for that approach is the crossover point: at what sheet size does SpecialCells begin to outperform read-and-iterate? We may also want to use our own knowledge of the workbook to heuristically guess which sheets are likely to contain errors and which aren't. I'll post an answer if we develop one and I have time.