I have two large XML files (3GB, 80000 records). One is updated version of another. I want to identify which records changed (were added/updated/deleted). There are some timestamps in the files, but I am not sure they can be trusted. Same with order of records within the files.
The files are too large to load into memory as XML (even one, never mind both).
The way I was thinking about it is to do some sort of parsing/indexing of content offset within the first file on record-level with in-memory map of IDs, then stream the second file and use random-access to compare those records that exist in both. This would probably take 2 or 3 passes but that's fine. But I cannot find easy library/approach that would let me do it. vtd-xml with VTDNavHuge looks interesting, but I cannot understand (from documentation) whether it supports random-access revisiting and loading of records based on pre-saved locations.
Java library/solution is preferred, but C# is acceptable too.