I have two large text files (200,000+ lines each) in CSV format. I need to compare them line by line, but the fields may be switched within each line.
Example file A.csv:
AAA,BBB,,DDD
EEE,,GGG,HHH
III,JJJ,KKK,LLL
Example file B.csv:
AAA,,BBB,DDD
EEE,,GGG,HHH
LLL,KKK,JJJ,III
So for my purposes, A.csv and B.csv should be "identical" even though the fields are switched in the first and last lines. Since the fields in each file might be in a different order, the usual options like grep or diff won't work.
Basically, I think I need to write something that reads a line of A.csv and a line of B.csv, and checks whether all fields are present in both lines, independent of the order. Alternatively, something that sorts the fields after reading the lines, so that the sorted lines can be compared directly.
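To make the second idea concrete, here is a rough sketch in Python of what I have in mind. It assumes the two files are meant to line up line for line, that plain comma-separated parsing with the standard `csv` module is enough, and that empty and repeated fields should count. The names `normalized` and `differing_lines` are made up for illustration.

```python
import csv
from itertools import zip_longest


def normalized(row):
    # Sort the fields so their order within a line no longer matters.
    # Empty and repeated fields are kept, so they still count in the comparison.
    return sorted(row)


def differing_lines(path_a, path_b):
    """Yield (line_number, row_a, row_b) for each pair of lines whose fields differ.

    If one file is shorter, its missing rows are reported as None.
    """
    with open(path_a, newline="") as fa, open(path_b, newline="") as fb:
        pairs = zip_longest(csv.reader(fa), csv.reader(fb))
        for number, (row_a, row_b) in enumerate(pairs, start=1):
            if row_a is None or row_b is None:
                yield number, row_a, row_b
            elif normalized(row_a) != normalized(row_b):
                yield number, row_a, row_b
```

Calling `list(differing_lines("A.csv", "B.csv"))` on the two example files should give an empty list, since every line has the same fields once sorted. Both files are read one line at a time, so 200,000+ lines should not be a memory problem.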