I have a `HashSet` containing custom objects generated from reading a binary file. I also have a dictionary generated from reading each row of a DBF file. Both have an index property that lines them up with each other: for example, the 10th item in my dictionary corresponds to the 10th item in my `HashSet`.
I am comparing LARGE amounts of data against each other: anywhere from 10,000 to 500,000 records. The application checks the other two files (one binary, the other a DBF) for differences. It checks the hash code of each object (generated from certain properties), which makes that comparison fast and easy.
Here is how I build each individual dictionary (there is a similar one for the modified file as well):
```csharp
int d = 0; // running row index, lines up with the HashSet's index property
foreach (DataRow row in origDbfFile.datatable.Rows)
{
    string str = "";
    foreach (string columnName in columnNames)
    {
        str += "~" + row.Field<object>(columnName);
    }
    origDRdict.Add(d, str);
    d++;
}
```
The columns between the two files will always be the same; however, different pairs of files may have different columns. I essentially flatten each row's data into a string for dictionary lookup. I only want to hit the DBF file again if the data is different.
Here is my code for the DB lookup. It finds the differences, but it's really slow when it runs the else branch of my `if (!foundIt)` block. If I remove that block, listing all not-found items takes only one minute.
```csharp
int i = 0; // parallel index into origDRdict, since HashSets can't be indexed
foreach (CustomClass customclass in origCustomClassList)
{
    bool foundIt = modCustomClassList.Contains(customclass);

    if (!foundIt)
    {
        // at this point, the element has not been found
        notFoundRecords.Add(customclass);
    }
    // If I remove this entire else block, the code runs fast.
    else // at this point, the element has been found
    {
        // check the 'modified' dictionary
        if (!modDRdict.ContainsValue(origDRdict[i]))
        {
            // at this point the coordinates are the same,
            // however there are DB changes;
            // this is where I would do a full check based on
            // indexes to show changes
        }
    }
    i++;
}
```
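For context on why that else branch is so slow: `Dictionary<TKey, TValue>.ContainsValue` is a linear scan over every entry, so calling it once per record makes the whole comparison O(n²) — roughly 440,000 × 440,000 string comparisons in the worst case. A minimal sketch (with hypothetical stand-in data, not the real classes) of copying the values into a `HashSet<string>` once, so each lookup becomes O(1) on average:

```csharp
using System;
using System.Collections.Generic;

class LookupDemo
{
    static void Main()
    {
        // Hypothetical stand-in for modDRdict; the real code would reuse
        // the dictionary built from the modified DBF file.
        var modDRdict = new Dictionary<int, string>
        {
            { 0, "~Smith~42~NY" },
            { 1, "~Jones~17~CA" }
        };

        // Built once, before the outer foreach loop.
        var modValues = new HashSet<string>(modDRdict.Values);

        // Inside the loop, modDRdict.ContainsValue(origDRdict[i])
        // would become modValues.Contains(origDRdict[i]).
        Console.WriteLine(modValues.Contains("~Smith~42~NY")); // True
        Console.WriteLine(modValues.Contains("~Doe~99~TX"));   // False
    }
}
```

This trades one up-front O(n) pass (plus the memory for the set) for constant-time membership checks inside the loop.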
What I've tried / Other Thoughts
- Generating a `HashSet` of custom objects, each custom object having an integer index and a string made up of the columns and values.
- Removing the `if (!(modDRdict.ContainsValue(origDRdict[i])))` block makes the code significantly quicker. The time to iterate the removed records between two 440,000-record files is only one minute. The dictionary lookup is taking forever!
- I don't think the `foreach` loop within the `foreach` loop is causing too much overhead. If I keep it in the code but don't do the lookup, it still runs quickly.