I'm very confused by a strange issue I'm having.
Consider the following code:
testTimer.Start();
u64 startPositionInBytes = startPosition * sizeof(DBTransaction);
u64 chunkSizeInBytes = chunkSize * sizeof(DBTransaction);
u64 viewOffset = (startPositionInBytes / allocGranularity) * allocGranularity;
u64 mapViewSize = (startPositionInBytes % allocGranularity) + chunkSizeInBytes;
if (mapViewSize > (queryThread->FileSize - viewOffset))
mapViewSize = (queryThread->FileSize - viewOffset);
u64 dataReadOffset = startPositionInBytes - viewOffset;
DWORD high = (DWORD)(viewOffset >> 32);
DWORD low = (DWORD)(viewOffset & 0xFFFFFFFF);
u8* fileData = (u8*)MapViewOfFile(queryThread->FileMap, FILE_MAP_READ, high, low, mapViewSize);
if (fileData == nullptr)
{
RALogError("Unable to create a file view into mapped file at position %llu (chunk size %llu). Error was: %s. Application can not proceed", startPosition, chunkSize, GetLastSystemError().c_str());
}
elapsed = testTimer.Measure();
queryThread->MapGlobalTime += elapsed;
DBTransaction* transaction = (DBTransaction*)(fileData + dataReadOffset);
for (u32 i = 0; i < chunkSize; ++i)
{
testTimer.Start();
//DBTransaction* transaction = (DBTransaction*)pBytes;
queryThread->TotalRead++;
elapsed = testTimer.Measure();
queryThread->CopyTime += elapsed;
testTimer.Start();
bool allConditionsPassed = true;
for (u32 j = 0; j < QueryFilters.size(); ++j)
{
RAQueryFilter* filter = &QueryFilters[j];
bool atLeastOneTrue = false;
for (u32 k = 0; k < filter->Conditions.size(); ++k)
{
QueryCondition* condition = &filter->Conditions[k];
if (condition->Compare(transaction, condition))
atLeastOneTrue = true;
if (atLeastOneTrue)
break;
}
if (atLeastOneTrue == false)
{
allConditionsPassed = false;
break;
}
}
elapsed = testTimer.Measure();
queryThread->MatchGlobalTime += elapsed;
testTimer.Start();
if (allConditionsPassed)
{
queryPointer->AddToResult(queryThread, queryPointer, transaction);
queryThread->TotalMatches++;
}
elapsed = testTimer.Measure();
queryThread->ComputeGlobalTime += elapsed;
transaction ++;
}
This is a thread that reads a chunk of file mapped on memory. There are 10 of them reading from different chunks of the same file. If the file is small (2 GB) this code can read the whole thing and find all matches in almost no time.
When the size of the file increases the time required increases as well (which is expected), but it increases exponentially and not linearly as I would expect. Working with a 12GB file (20+ million transactions) the whole process takes 26 seconds which is insane.
I decided to profile the code in the thread and here is my confusion:
MapViewOfFile
takes a bunch of milliseconds so it´s not the bottleneck. The time is almost entirely spent in the function Compare
, but if I comment out this function (all transactions match), the time is exactly the same but it´s spent in AddToResult
.
If I also comment out this function and copy the transaction from the buffer into a local variable (and then I add a random counter used elsewhere to avoid the compiler optimizing out the whole code), the time is yet again the same but is spent in the copy.
My guess is that the operation taking the time is actually the dereferencing of the pointer to DBtransaction
, whenever it happens first.
I'm not sure about what is going on. Can anyone explain, and provide some suggestions for how to fix the issue?
Here is the code that spawn the threads:
u32 requiredThreads = AvailableThreadsCount;
u64 transactionsPerThread = (u64)ceil((f32)dataToRead / (f32)requiredThreads);
// Make sure to not make thread load too small. If the number of transactions is less than 500K, reduce the number of used threads
if (transactionsPerThread < RA_MINIMUM_THREAD_TRANSACTION_COUNT)
{
requiredThreads = (u32)ceil((f32)dataToRead / (f32)RA_MINIMUM_THREAD_TRANSACTION_COUNT);
transactionsPerThread = RA_MINIMUM_THREAD_TRANSACTION_COUNT;
}
// Spawn the threads
u32 i = 0;
while (dataToRead > 0)
{
u64 chunkSize = dataToRead;
if (chunkSize > transactionsPerThread)
chunkSize = transactionsPerThread;
RA_ASSERT(i < requiredThreads, "Too many threads created!");
sQueryThread* queryThread = &QueryThread[i];
#ifdef USE_FILE_MAPPING
queryThread->Set(startPosition, chunkSize, testReadChunk, TransactionsHistory->FileSizeInBytes, TransactionsHistory->ReadFileMapHandle);
#else
queryThread->Set(startPosition, chunkSize, 0, NULL);
#endif
//RALog("Spawning thread %i", i);
queryThread->Handle = CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)DoLinearSearchQueryThread, queryThread, 0, &queryThread->ThreadID);
i++;
startPosition += chunkSize;
dataToRead -= chunkSize;
}
I basically split the file in chunks and assign one of them to each thread