Here is some code that could work. It relies on the first line of each file containing column headers.
It's a bit more than a tweak, though. It's an "old dog" approach.
The original code in the question has these lines:
Set<String> source = new HashSet<>(org.apache.commons.io.FileUtils.readLines(new File(sourceFile)));
Set<String> target = new HashSet<>(org.apache.commons.io.FileUtils.readLines(new File(targetFile)));
With this solution, the data coming in needs more processing before it will be ready to be put into a Set. Those two lines get changed as follows:
List<String> source = org.apache.commons.io.FileUtils.readLines(new File(sourceFile));
List<String> target = org.apache.commons.io.FileUtils.readLines(new File(targetFile));
This approach compares the column headers in the target file with those in the source file, and uses the result to build an int [] that indicates the difference in column order: element i holds the position in the source header of target column i.
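As a minimal sketch of that header comparison, assuming the same comma-only splitting, the order array can be computed by a pure function. The name columnOrder is hypothetical; unlike the answer's code, it returns the array instead of storing it in a static field, and it rejects a target header that names a column the source lacks.

```java
import java.util.Arrays;
import java.util.List;

public class HeaderOrderDemo {
    // Hypothetical helper: result[i] is the position in the source header
    // of the column found at position i in the target header.
    static int[] columnOrder(String sourceHeader, String targetHeader) {
        List<String> sourceColumns = Arrays.asList(sourceHeader.split(","));
        String[] targetColumns = targetHeader.split(",");
        int[] order = new int[targetColumns.length];
        for (int i = 0; i < targetColumns.length; ++i) {
            int j = sourceColumns.indexOf(targetColumns[i]);
            if (j < 0) {
                // indexOf returns -1 when the column is missing from the source
                throw new IllegalArgumentException("Unknown column: " + targetColumns[i]);
            }
            order[i] = j;
        }
        return order;
    }

    public static void main(String[] args) {
        // Target column 0 is "c", which is column 2 in the source, and so on.
        System.out.println(Arrays.toString(columnOrder("a,b,c,d,e", "c,b,e,d,a")));
        // prints [2, 1, 4, 3, 0]
    }
}
```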
After the order difference array is filled, the data in the files will be put into a pair of Set<List<String>>. Each List<String> will represent one line from the source and target data files. Each String in the List will be data from one column.
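This works because List.equals and List.hashCode compare elements in order, so two lines with the same values in the same column order are equal, while a HashSet ignores the order in which lines were added. A small sketch of that behavior (the class and helper names are illustrative only):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ListSetDemo {
    // Build a set of rows: one List<String> per line, one String per column.
    static Set<List<String>> rows(String... lines) {
        Set<List<String>> set = new HashSet<>();
        for (String line : lines) {
            set.add(Arrays.asList(line.split(",")));
        }
        return set;
    }

    public static void main(String[] args) {
        Set<List<String>> data = rows("1,2,3", "4,5,6");
        // Same values in the same column order: found, even though it is a new List.
        System.out.println(data.contains(Arrays.asList("4", "5", "6"))); // true
        // Same values in a different column order: not found.
        System.out.println(data.contains(Arrays.asList("6", "5", "4"))); // false
        // Line order does not matter for set equality.
        System.out.println(data.equals(rows("4,5,6", "1,2,3")));        // true
    }
}
```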
In the following code, main is the test driver. Only for my testing purposes, the data files have been replaced by a pair of String [] and reading the file with org.apache.commons.io.FileUtils.readLines has been replaced with Arrays.asList.
package comparecsv;

import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CompareCSV {

    // columnReorder[i] holds the source position of target column i
    private static int [] columnReorder;

    // Compare the header lines and record where each target column
    // belongs in source column order.
    private static void headersOrder
            (String sourceHeader, String targetHeader) {
        // limit -1 keeps trailing empty fields, e.g. "a,b," has 3 columns
        String [] columnHeader = sourceHeader.split (",", -1);
        List<String> sourceColumn = Arrays.asList (columnHeader);
        String [] targetColumn = targetHeader.split (",", -1);
        if (targetColumn.length != columnHeader.length) {
            throw new IllegalArgumentException ("Header column counts differ");
        }
        columnReorder = new int [targetColumn.length];
        for (int i = 0; i < targetColumn.length; ++i) {
            int j = sourceColumn.indexOf(targetColumn[i]);
            if (j < 0) {
                throw new IllegalArgumentException
                        ("Column not in source header: " + targetColumn[i]);
            }
            columnReorder [i] = j;
        }
    }

    // Split each line into columns and add it to a Set. If reorder is
    // true, each column is first moved to its position in the source file.
    private static Set<List<String>> toSet
            (List<String> data, boolean reorder) {
        Set<List<String>> dataSet = new HashSet<> ();
        for (String s: data) {
            String [] byColumn = s.split (",", -1);
            if (reorder) {
                if (byColumn.length != columnReorder.length) {
                    throw new IllegalArgumentException ("Wrong column count: " + s);
                }
                String [] reordered = new String [byColumn.length];
                for (int i = 0; i < byColumn.length; ++i) {
                    reordered[columnReorder[i]] = byColumn [i];
                }
                dataSet.add (Arrays.asList (reordered));
            } else {
                dataSet.add (Arrays.asList(byColumn));
            }
        }
        return dataSet;
    }

    public static void main(String[] args) {
        String [] sourceData = {"a,b,c,d,e", "1,2,3,4,5", "6,7,8,9,10"
                ,"11,12,13,14,15", "16,17,18,19,20"};
        String [] targetData = {"c,b,e,d,a", "3,2,5,4,1", "8,7,10,9,6"
                ,"13,12,15,14,11", "18,17,20,19,16"};
        // Test stand-ins for FileUtils.readLines(new File(...))
        List<String> source = Arrays.asList(sourceData);
        List<String> target = Arrays.asList (targetData);
        headersOrder (source.get(0), target.get(0));
        Set<List<String>> sourceSet = toSet (source, false);
        Set<List<String>> targetSet = toSet (target, true);
        // prints "true true true": each set contains the other
        System.out.println ( sourceSet.containsAll (targetSet)
                + " " + targetSet.containsAll (sourceSet) + " " +
                ( sourceSet.containsAll (targetSet)
                && targetSet.containsAll (sourceSet)));
    }
}
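To run the comparison against real files without Commons IO, one option (a swap, not what the question used) is java.nio.file.Files.readAllLines, which reads every line of a file into a List<String>, decoding it as UTF-8. A sketch, assuming the file names come from the command line; readCsv is a hypothetical helper:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class ReadCsvFiles {
    // Hypothetical helper: read all lines of a CSV file, header line first.
    // Files.readAllLines(Path) decodes the file as UTF-8.
    static List<String> readCsv(String fileName) throws IOException {
        return Files.readAllLines(Paths.get(fileName));
    }

    public static void main(String[] args) throws IOException {
        // Usage (assumed): java ReadCsvFiles source.csv target.csv
        List<String> source = readCsv(args[0]);
        List<String> target = readCsv(args[1]);
        // These lists could replace the Arrays.asList(...) calls in main,
        // followed by headersOrder(source.get(0), target.get(0)).
        System.out.println(source.size() + " source lines, " + target.size() + " target lines");
    }
}
```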
Method headersOrder compares the headers, column by column, and populates the columnReorder array. Method toSet creates the Set<List<String>>, either reordering the columns or not, according to the value of the boolean argument.
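The two containsAll checks in main together amount to sourceSet.equals(targetSet). When the files do not match, a set difference shows which rows appear on only one side. A minimal sketch, where onlyIn is a hypothetical name: it copies the first set and removes everything found in the second.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SetDiffDemo {
    // Rows present in 'a' but not in 'b'; neither input set is modified.
    static Set<List<String>> onlyIn(Set<List<String>> a, Set<List<String>> b) {
        Set<List<String>> diff = new HashSet<>(a);
        diff.removeAll(b);
        return diff;
    }

    public static void main(String[] args) {
        Set<List<String>> source = new HashSet<>(Arrays.asList(
                Arrays.asList("1", "2", "3"), Arrays.asList("4", "5", "6")));
        Set<List<String>> target = new HashSet<>(Arrays.asList(
                Arrays.asList("1", "2", "3"), Arrays.asList("4", "5", "7")));
        // Mutual containsAll is the same test as equals for sets.
        System.out.println(source.equals(target));   // false
        System.out.println(onlyIn(source, target));  // [[4, 5, 6]]
        System.out.println(onlyIn(target, source));  // [[4, 5, 7]]
    }
}
```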
For the sake of simplification, this assumes lines can be split on every comma. Data such as dog, "Reginald, III", 3 will cause a failure, because the comma inside the quotes produces an extra column.
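If quoted fields do occur, a proper CSV parser is the safer route, but a simple quote-aware splitter shows the idea. This sketch is an assumption, not part of the answer above: it ignores commas inside double quotes, drops the quote characters, treats a doubled quote ("") inside quotes as a literal quote, keeps surrounding spaces as-is, and does not handle a field that spans several lines. It could be used in place of the split calls in headersOrder and toSet.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteAwareSplit {
    // Split one CSV line on commas that are not inside double quotes.
    static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); ++i) {
            char c = line.charAt(i);
            if (c == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    field.append('"'); // "" inside quotes is a literal quote
                    ++i;
                } else {
                    inQuotes = !inQuotes; // opening or closing quote
                }
            } else if (c == ',' && !inQuotes) {
                fields.add(field.toString()); // end of the current field
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString()); // last field, possibly empty
        return fields;
    }

    public static void main(String[] args) {
        List<String> fields = splitLine("dog,\"Reginald, III\",3");
        System.out.println(fields.size()); // 3
        System.out.println(fields.get(1)); // Reginald, III
    }
}
```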
In testing this, I found that lines in one file are matched with their counterparts in the other file regardless of the order of the lines. Here is an example:
Source:
a,b,c
1,2,3
4,5,6
7,8,9
Target:
a,b,c
4,5,6
7,8,9
1,2,3
The result is that the contents match.
I believe this matches the result the OP's code in the question would give. However, for this solution to work, the first line in each file must contain column headers.
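One consequence of using a Set is that duplicate lines collapse: a file containing 1,2,3 twice would also be reported as matching a file containing it once. If duplicates matter, counting how often each row occurs is one alternative (a swapped-in technique, not the answer's approach). A sketch with a hypothetical rowCounts helper, assuming the columns are already in the same order:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RowCountCompare {
    // Hypothetical helper: map each row (as a list of columns) to the number
    // of times it occurs in the file.
    static Map<List<String>, Integer> rowCounts(List<String> lines) {
        Map<List<String>, Integer> counts = new HashMap<>();
        for (String line : lines) {
            counts.merge(Arrays.asList(line.split(",")), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("a,b,c", "1,2,3", "4,5,6", "7,8,9");
        List<String> reordered = Arrays.asList("a,b,c", "4,5,6", "7,8,9", "1,2,3");
        List<String> duplicated = Arrays.asList("a,b,c", "1,2,3", "1,2,3", "4,5,6", "7,8,9");
        // Line order is still ignored...
        System.out.println(rowCounts(source).equals(rowCounts(reordered)));  // true
        // ...but a repeated line now counts as a difference.
        System.out.println(rowCounts(source).equals(rowCounts(duplicated))); // false
    }
}
```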