CSV file comparison using Java efficiently

Question

Can someone suggest an approach for the scenario below? I have to compare two CSV files, which are essentially the results of select queries (returning selected fields only, i.e. not a select * from ...) over two or more tables.

select student_id,student_name,age,address,dept id,class_id 
from student
select deptid,dept_name,head_of_dept,number_of_staffs 
from department

I have to compare these two files and display the names and values of the fields that differ between them. The files do not have any unique key values. Because each file is the result of a select query over multiple tables, a row can contain duplicate values; for example, if it is the result of the student and department tables, the dept id can appear twice in a row of the file. How do I compare the files in this case using Java?
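One possible starting point for the keyless case, sketched under assumptions: treat every line as an opaque row and compare how often each distinct row occurs in the two files. The class name `RowMultisetDiff`, the method `diff`, and the sample rows are made up for illustration; the sketch also assumes both exports format values identically and that the distinct rows fit in memory, which may not hold for very large files.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Keyless comparison sketch: each CSV line is an opaque row; only row counts are compared. */
class RowMultisetDiff {

    /**
     * Returns, for every distinct row, (occurrences in left) minus (occurrences in right),
     * leaving out rows whose counts match. Positive means surplus in left, negative in right.
     */
    static Map<String, Integer> diff(Iterable<String> left, Iterable<String> right) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String row : left) {
            counts.merge(row, 1, Integer::sum);   // count rows seen in the left file
        }
        for (String row : right) {
            counts.merge(row, -1, Integer::sum);  // cancel them against the right file
        }
        counts.values().removeIf(c -> c == 0);    // rows that match on both sides drop out
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample rows, not taken from the real exports.
        List<String> db2 = List.of("1,Asha,20,D1", "1,Asha,20,D1", "2,Ravi,21,D2");
        List<String> hive = List.of("1,Asha,20,D1", "2,Ravi,22,D2");
        diff(db2, hive).forEach((row, delta) -> System.out.println(delta + "\t" + row));
    }
}
```

Because rows are matched by their full content, duplicates within a file are handled by the counts rather than by a key. For real files the two arguments could be fed from `java.nio.file.Files.lines(path)` (for example via `stream::iterator`) instead of in-memory lists.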

Look at this one: https://stackoverflow.com/questions/10864654/comparing-two-csv-files-in-java By the way, what OS are you using? Why don't you do the diff in your SQL and return the result? — SMA, Nov 12 '17 at 05:56
Yes, I have gone through the link before. In that question there is a primary key, but in my scenario I do not have such a key to compare on. The files I am comparing are too huge, so I cannot use SQL. My OS is Windows 10. — Priyanka GP, Nov 12 '17 at 07:20
I have a table that describes the CSV file's content, i.e. which field position in the CSV file corresponds to which table name and column name. I may have to refer to this table in my code to get the info about the file's structure. So in the CSV file, the first 20 positions could be from one table and the rest could be from the second table. — Priyanka GP, Nov 12 '17 at 07:24
How are you creating the CSV? Is it through SQL? Could you share the SQL and table structure so we know more? — SMA, Nov 12 '17 at 07:42
I have around 20 tables like this. I do not have the query that generates the CSV file; I only have the files to compare. I will give you the context. Our client has decided to move their database from DB2 to Hive. The migration has been done by a different team, and we now have to compare the files (DB2 vs Hive). There is a portal where a user can select a few columns of a table to check whether the values are the same in DB2 and Hive, and we get those output files to compare as a result. — Priyanka GP, Nov 12 '17 at 09:48
They can select multiple tables like that. So each file we get to compare will have data from among these 20 tables. To find out which table and column each value in the file comes from, we have a metadata table to refer to. — Priyanka GP, Nov 12 '17 at 09:53
The metadata table I am speaking about has fields like template id, field_position_number, outpt_custm_fleid_name, which map positions in the CSV file to column names in a table. — Priyanka GP, Nov 12 '17 at 10:00
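A minimal sketch of how the position-to-name mapping described in the comments might be used to report differing fields. It assumes the two rows have already been paired up (for example by sorting both files the same way), that field_position_number is 1-based, and that fields contain no quoted commas. The class `FieldDiff`, the method `diffRow`, and the sample names and values are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Field-level diff sketch: names come from a position-to-name map loaded from the metadata table. */
class FieldDiff {

    /** Lists "fieldName: 'left' vs 'right'" for each position where two paired rows differ. */
    static List<String> diffRow(String leftRow, String rightRow, Map<Integer, String> fieldNames) {
        // Naive split: assumes no quoted fields containing commas; -1 keeps trailing empty fields.
        String[] left = leftRow.split(",", -1);
        String[] right = rightRow.split(",", -1);
        List<String> differences = new ArrayList<>();
        int width = Math.max(left.length, right.length);
        for (int i = 0; i < width; i++) {
            String l = i < left.length ? left[i] : "<missing>";
            String r = i < right.length ? right[i] : "<missing>";
            if (!l.equals(r)) {
                // Assumption: metadata positions are 1-based; unknown positions get a fallback name.
                String name = fieldNames.getOrDefault(i + 1, "field#" + (i + 1));
                differences.add(name + ": '" + l + "' vs '" + r + "'");
            }
        }
        return differences;
    }

    public static void main(String[] args) {
        // Hypothetical mapping; in practice it would be read from the metadata table per template id.
        Map<Integer, String> names = Map.of(
                1, "student.student_id", 2, "student.age", 3, "department.deptid");
        diffRow("1,20,D1", "1,21,D1", names).forEach(System.out::println);
    }
}
```

Keying the names by position rather than by column name means a column that appears twice in a row (such as the dept id from both tables) is still reported unambiguously, especially if the table name is included in the label.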
