I have a program that creates multiple text files of RDF triples. I need to compare the triples across the files quickly. What is the best way to do this? I thought of loading the triples into an array and comparing them, but there could be hundreds of thousands of triples per file, and that would take forever. I need it to be as close to real time as possible, since triples will be generated constantly among the files. Any help would be great. The files are also stored in AllegroGraph repositories, if it's easier to compare them there somehow.
A thought: if I stored the triples in Excel (one triple per row, one sheet per repository):
A: How could I find the duplicates among the sheets? B: Would it be fast? C: How could I automate that from Java?
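
For reference, here is a minimal sketch of the in-memory comparison I have in mind, using a hash set instead of a plain array so that each lookup is constant time on average rather than a scan. It assumes each triple is a single line of text (for example in N-Triples form) and that two triples count as duplicates only when the lines match exactly after trimming. The class and method names (TripleDuplicates, findDuplicates) are made up for illustration, and it does not touch AllegroGraph at all.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TripleDuplicates {

    // Returns the triples that occur in both inputs.
    // Assumption: each string is one whole triple, e.g. "<s> <p> <o> ."
    static Set<String> findDuplicates(List<String> triplesA, List<String> triplesB) {
        // Index the first file once; HashSet lookups are O(1) on average.
        Set<String> seen = new HashSet<>();
        for (String triple : triplesA) {
            seen.add(triple.trim());
        }

        // Stream through the second file and keep only lines already seen.
        Set<String> duplicates = new HashSet<>();
        for (String triple : triplesB) {
            String trimmed = triple.trim();
            if (seen.contains(trimmed)) {
                duplicates.add(trimmed);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for two generated files.
        List<String> fileA = List.of(
                "<http://example.org/a> <http://example.org/p> <http://example.org/b> .",
                "<http://example.org/a> <http://example.org/p> <http://example.org/c> .");
        List<String> fileB = List.of(
                "<http://example.org/a> <http://example.org/p> <http://example.org/c> .",
                "<http://example.org/x> <http://example.org/p> <http://example.org/y> .");

        for (String dup : findDuplicates(fileA, fileB)) {
            System.out.println(dup);
        }
    }
}
```

In practice the lists would be filled by reading each file line by line. Would something like this hold up at hundreds of thousands of triples per file, or is comparing inside the repositories the better route?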