I have an application that calls a DLL, which in turn may make calls to another DLL.
My problem is performance degradation when these binaries are 64-bit vs. 32-bit.
I have been profiling with AQtime v8.24 using the Elapsed Time and CPU Cache Misses counters, but I cannot interpret the results well enough to know what to change.
So I wrote a test .exe that calls a test DLL to simplify the code. Initially, the test binaries showed the same degradation (the 64-bit versions were four times slower than the 32-bit versions), and the CPU Cache Misses profile pointed to this routine:
    const char* TSimple::get_schema_name( const int schema_number )
    {
        char* t_ptr = 0;
        int t_idx;
        for (t_idx = 0; t_idx < 153; t_idx++)
        {
            // THIS ASSIGNMENT IS WHAT WAS SHOWN TO BE A PROBLEM
            bogus_SchemaDef t_def = schema_list[t_idx];
            if (t_def.SchemaNumber == schema_number)
            {
                return (const char*)schema_list[t_idx].SchemaName;
                break;
            }
        }
        return t_ptr;
    }

    // THIS IS THE bogus_SchemaDef struct:
    typedef struct
    {
        int SchemaNumber;
        char SchemaName[100];
        char SiteList[100];
    } bogus_SchemaDef;

    // THIS IS THE schema_list ARRAY (portion):
    static bogus_SchemaDef schema_list[] = {
        { 1, "LipUpper", "C000;C003" },
        { 153, "IllDefinedOther", "C420-C424;C760-C765;C767-C768;C770-C775;C778-C779;C809" }
    };
So I changed the code to this (eliminated the assignment to an instance of the struct):
    const char* TSimple::get_schema_name( const int schema_number )
    {
        char* t_ptr = 0;
        int t_idx;
        for (t_idx = 0; t_idx < 153; t_idx++)
        {
            //bogus_SchemaDef t_def = schema_list[t_idx];
            //if (t_def.SchemaNumber == schema_number)
            if (schema_list[t_idx].SchemaNumber == schema_number)
            {
                return (const char*)schema_list[t_idx].SchemaName;
                break;
            }
        }
        return t_ptr;
    }
I re-ran the tests, and this time the 64-bit version was 36% faster than the 32-bit version. Great, although I don't understand why this change made such a difference.
But according to AQtime, the 64-bit version still performs worse than the 32-bit version:
Counter                        32-bit    64-bit
CPU Cache Misses (% Misses)    25.79%    83.34%
Elapsed Time (% Time)          10.99%    33.95%
I really need to understand what AQtime is telling me, because when I plug this revised test DLL into the real environment (my app calls my DLL, which then calls this DLL), overall performance degrades by 30-40%.
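The AQtime figures are percentages, and I am assuming (without being certain) that they are relative to each run's own total, so they may not be directly comparable between the 32-bit and 64-bit builds. One way to cross-check them is an absolute wall-clock measurement taken without the profiler attached. The sketch below is only that: `call_under_test` is a hypothetical stand-in for the real call into the second DLL, and `time_calls_ms` is a helper name invented for this example.

```cpp
#include <chrono>
#include <cstddef>

// volatile so the compiler cannot optimize the measured loop away.
static volatile int g_sink = 0;

// Hypothetical stand-in for the call into the second DLL; replace the body
// with the real call (for example, get_schema_name) when measuring.
void call_under_test()
{
    g_sink = g_sink + 1;
}

// Runs fn `iterations` times and returns the total elapsed wall-clock time
// in milliseconds, measured with a monotonic clock.
template <typename Fn>
double time_calls_ms(Fn fn, std::size_t iterations)
{
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i)
        fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Building this harness once as 32-bit and once as 64-bit, and calling `time_calls_ms(call_under_test, N)` with the same `N` in both, gives two absolute numbers that can be compared directly, independent of how the profiler normalizes its counters.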
I should note that when I test my app and DLL without the call into the second DLL, the 64-bit builds run as fast as or faster than the 32-bit builds. Everything points to the call into a second DLL, whichever DLL that is.
I am overwhelmed by chasing through documentation, confusing myself, and ultimately guessing at code changes that may or may not make any difference.
Hoping for guidance.