My task is to load a new set of data (stored in an XML file) and compare it to the 'old' set (also XML). All the changes are written to another file.
My program loads the new and old files into two DataSets, then goes row by row, matching rows by primary key. When I find the corresponding old row, I compare all its fields; if any differ, I write the row to a third DataSet, which is finally written out to a file.
Right now I use:
DataSet newDS = new DataSet();
DataSet oldDS = new DataSet();
newDS.ReadXml("data.xml");
oldDS.ReadXml("old.xml");
and then I just find rows with matching primary keys and compare the other fields. This works quite well for small files.
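
Roughly, the comparison itself looks like this (a simplified sketch assuming a single table; "Id" and changesTable stand in for the user-defined key column and my third set):

DataTable newTable = newDS.Tables[0];
DataTable oldTable = oldDS.Tables[0];
DataTable changesTable = newTable.Clone(); // same schema, no rows; this is the third set
oldTable.PrimaryKey = new[] { oldTable.Columns["Id"] }; // "Id" is a placeholder for the user-defined key

foreach (DataRow newRow in newTable.Rows)
{
    DataRow oldRow = oldTable.Rows.Find(newRow["Id"]);
    if (oldRow == null)
        continue; // no counterpart in the old data

    foreach (DataColumn col in newTable.Columns)
    {
        if (!Equals(newRow[col.ColumnName], oldRow[col.ColumnName]))
        {
            changesTable.ImportRow(newRow); // keep the changed row for the output file
            break;
        }
    }
}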
The problem is that my files can be up to about 4 GB each. If both the new and the old data are that big, loading 8 GB of data into memory is a real problem.
I would like to load my data in parts, but to compare I need the whole old data set (or is there a way to fetch a specific row with a matching primary key directly from an XML file?).
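
The closest thing I have found to loading in parts is combining XmlReader with LINQ to XML, so that only one record element is materialized at a time. A sketch of what I mean (the record element name would have to come from the user, since I don't know the structure in advance):

using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;

static IEnumerable<XElement> StreamRecords(string path, string recordName)
{
    using (XmlReader reader = XmlReader.Create(path))
    {
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == recordName)
            {
                // ReadFrom materializes just this one element and advances the reader past it
                yield return (XElement)XNode.ReadFrom(reader);
            }
            else
            {
                reader.Read();
            }
        }
    }
}

That keeps memory flat for the file I iterate over, but it still doesn't tell me how to jump to one specific primary key in the other 4 GB file without scanning it.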
Another problem is that I don't know the structure of the XML file in advance; it is defined by the user.
What is the best way to work with such big files? I thought about using LINQ to XML, but I don't know whether it has anything that can help with my problem. Maybe it would be better to leave XML behind and use something different?
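
For example, using the StreamRecords helper above, I imagine keeping only key → hash pairs for the old file in memory and then streaming the new file past them (again a sketch; "row" and "Id" are placeholders for the user-defined structure, and string.GetHashCode would probably need replacing with a proper hash to avoid collisions):

// Pass 1: index the old file, keeping only key -> content hash instead of whole rows.
var oldIndex = new Dictionary<string, int>();
foreach (XElement record in StreamRecords("old.xml", "row"))
{
    string key = (string)record.Element("Id");
    oldIndex[key] = record.ToString(SaveOptions.DisableFormatting).GetHashCode();
}

// Pass 2: stream the new file and pick out records whose content changed.
foreach (XElement record in StreamRecords("data.xml", "row"))
{
    string key = (string)record.Element("Id");
    int newHash = record.ToString(SaveOptions.DisableFormatting).GetHashCode();
    if (oldIndex.TryGetValue(key, out int oldHash) && oldHash != newHash)
    {
        // Changed record: write it to the output file here.
    }
}

Even then the dictionary has to hold every key from the old file, so I'm not sure it scales to the worst case — which is part of why I'm asking whether a database would be the better tool here.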