In Java, I have a method that reads two files in which each line is a GUID. The lines are unordered. The output is two new files, each containing the lines that appear in only one of the input files.
Example files:
| Input_1 | Input_2 |   | Output_1 | Output_2 |
| ------- | ------- | - | -------- | -------- |
| abcdef  | uvwxyz  | > | mnopqr   | uvwxyz   |
| ghijkl  | ghijkl  |   |          |          |
| mnopqr  | abcdef  |   |          |          |
I managed to do it fine with one Collection<String> for each file and some addAll() + removeAll() shenanigans; however, the files are growing in size and this whole thing is taking some time now. Each file has about 600k lines.
Is there a quick way to speed this code up just by using a different type of collection, or do I need to rework my approach?
Code in question:
//Read two files
Collection<String> guidFile1 = readFileGuid(pathFile1);
Collection<String> guidFile2 = readFileGuid(pathFile2);
//Add file1 and remove file2
Collection<String> leftFromFile1 = new ArrayList<String>();
leftFromFile1.addAll(guidFile1);
leftFromFile1.removeAll(guidFile2);
//Add file2 and remove file1
Collection<String> leftFromFile2 = new ArrayList<String>();
leftFromFile2.addAll(guidFile2);
leftFromFile2.removeAll(guidFile1);
//Outputs
System.out.println("Leftover from file1: " + leftFromFile1.size());
System.out.println("Leftover from file2: " + leftFromFile2.size());
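For comparison, here is a minimal sketch of the "different collection" idea, not a definitive fix. It assumes that readFileGuid returns a list (as the ArrayList usage above suggests), that duplicate GUIDs within a file do not matter (a Set collapses them), and that Java 9+ is available for List.of. ArrayList.removeAll calls contains on its argument for every element, and on a list that is a linear scan, so 600k x 600k lines adds up quickly; a HashSet lookup is expected constant time. The class GuidDiff and the helper onlyIn are hypothetical names, and the sample data is taken from the example table.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class GuidDiff {

    // Hypothetical helper (not part of the original code): returns the
    // elements of 'source' that do not occur in 'other'.
    static Set<String> onlyIn(Collection<String> source, Collection<String> other) {
        // Copy 'other' into a HashSet so each membership check is a hash lookup
        // instead of a scan over the whole list.
        Set<String> otherSet = new HashSet<>(other);
        Set<String> result = new HashSet<>();
        for (String guid : source) {
            if (!otherSet.contains(guid)) {
                result.add(guid);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-ins for readFileGuid(pathFile1) and readFileGuid(pathFile2).
        List<String> guidFile1 = List.of("abcdef", "ghijkl", "mnopqr");
        List<String> guidFile2 = List.of("uvwxyz", "ghijkl", "abcdef");

        Set<String> leftFromFile1 = onlyIn(guidFile1, guidFile2);
        Set<String> leftFromFile2 = onlyIn(guidFile2, guidFile1);

        System.out.println("Leftover from file1: " + leftFromFile1.size());
        System.out.println("Leftover from file2: " + leftFromFile2.size());
    }
}
```

If the order of the output lines matters, a LinkedHashSet could be used for the result instead; that is a variation on this sketch, not something the code above does.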