I need to implement this task in Java. I have two large files, around 5 GB each, containing text data in multiple rows. Each row is a line of comma-separated fields, for example "name,empId,designation,address,...", with up to 30 fields. I need to read these two files and write the records to another file with an additional field that specifies whether each row is Changed, Not Changed, Added or Deleted. For example:
File1:
Tom,E100,Engineer
Rick,E200,Engineer

File2:
Tom,E100,Manager
Paul,E300,Clerk

ResultFile:
Tom,E100,Manager,Changed
Paul,E300,Clerk,Added
Rick,E200,Engineer,Deleted
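The rules implied by this example can be written as a small helper. This is a sketch under two assumptions not stated above: a row counts as Not Changed only when the whole line is identical in both files, and a Deleted row is written with its file1 values (as Rick's row is in the example). RowStatus and resultLine are hypothetical names.

```java
public class RowStatus {

    // Builds one result line from the file1 row and the file2 row that share an empId.
    // Either argument may be null when the empId appears in only one of the files.
    static String resultLine(String oldRow, String newRow) {
        if (oldRow == null && newRow == null) {
            throw new IllegalArgumentException("at least one row is required");
        }
        if (oldRow == null) {
            return newRow + ",Added";      // empId only in file2
        }
        if (newRow == null) {
            return oldRow + ",Deleted";    // empId only in file1
        }
        // Present in both files: write the file2 version, comparing whole lines (assumption).
        return newRow + (oldRow.equals(newRow) ? ",Not Changed" : ",Changed");
    }

    public static void main(String[] args) {
        // Reproduces the three result lines from the example above.
        System.out.println(resultLine("Tom,E100,Engineer", "Tom,E100,Manager"));
        System.out.println(resultLine(null, "Paul,E300,Clerk"));
        System.out.println(resultLine("Rick,E200,Engineer", null));
    }
}
```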
The approach I used is to build a map from the data in file1, using empId as the key and the entire row as the value (assuming empId is unique), and then read each record from file2 and check it against the map. Only file1 is loaded into memory to build the map; file2 is read one line at a time. I am using BufferedReader/BufferedWriter for reading and writing.
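For reference, the map-based approach described above looks roughly like this. It is a sketch, not the exact code in use: it assumes UTF-8 input, that empId is the second field, and that no field contains a quoted comma. MapDiff, empIdOf and diff are hypothetical names. A LinkedHashMap keeps Deleted rows in file1 order, and removing each matched entry means the leftovers are exactly the rows missing from file2.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

public class MapDiff {

    // Returns the second comma-separated field (empId in the example rows).
    // Assumes no quoted fields that contain commas.
    static String empIdOf(String row) {
        int first = row.indexOf(',');
        if (first < 0) {
            throw new IllegalArgumentException("No empId field in row: " + row);
        }
        int second = row.indexOf(',', first + 1);
        return row.substring(first + 1, second < 0 ? row.length() : second);
    }

    static void diff(Path file1, Path file2, Path out) throws IOException {
        // Load all of file1 into memory as empId -> whole row.
        // This is the step that exhausts the heap for multi-GB inputs.
        Map<String, String> oldRows = new LinkedHashMap<>();
        try (BufferedReader in = Files.newBufferedReader(file1, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (!line.isEmpty()) {
                    oldRows.put(empIdOf(line), line);
                }
            }
        }

        try (BufferedReader in = Files.newBufferedReader(file2, StandardCharsets.UTF_8);
             BufferedWriter w = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            // Stream file2 one line at a time; it is never held in memory as a whole.
            String line;
            while ((line = in.readLine()) != null) {
                if (line.isEmpty()) {
                    continue;
                }
                // remove() looks up the old row and drops it from the map in one call.
                String old = oldRows.remove(empIdOf(line));
                String status = old == null ? "Added"
                        : old.equals(line) ? "Not Changed" : "Changed";
                w.write(line + "," + status);
                w.newLine();
            }
            // Whatever is left in the map has no counterpart in file2.
            for (String old : oldRows.values()) {
                w.write(old + ",Deleted");
                w.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: java MapDiff <file1> <file2> <result>");
            System.exit(1);
        }
        diff(Paths.get(args[0]), Paths.get(args[1]), Paths.get(args[2]));
    }
}
```

Run as `java MapDiff file1.csv file2.csv result.csv`; on the example data it writes the three ResultFile lines in the order shown.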
This approach works fine, but only for small data files. With files that run into gigabytes, my program quickly runs out of memory while building the map.
What would be the right approach for this task, in terms of both memory usage and execution speed?
Thanks, LX