Goal: Compare two excel files, each excel is having only one column but having half million rows. I just want to see the un-common values between two excels.
Column: only 1 Rows: 500,000
Language: JAVA
What I tried so far:
- Apache POI used (please see below program-1)
- ArrayList (please see below program-2)
- I tried some websites where we can upload the multiples Excel to see the differences (https://www.textcompare.org/excel/)
- I increased heap size to 256 to 2048m, but not working for me.
(My programs are working good for small set of data or few records)
above efforts did not work for me.
program-1::
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;
public static ExcelReader excel3 = null;
public static ExcelReader excel2 = null;
excel3 = new ExcelReader("C:\\Users\\DataExcelCompare\\Book3.xlsx");
excel2 = new ExcelReader("C:\\Users\\DataExcelCompare\\Book2.xlsx");
File f3 = new File("C:\\Users\\DataExcelCompare\\Book3.xlsx");
FileInputStream fi3 = new FileInputStream(f3);
Workbook workbook3 = WorkbookFactory.create(fi3);
Sheet sheet3 = workbook3.getSheet("book3");
File f2 = new File("C:\\Users\\DataExcelCompare\\Book2.xlsx");
FileInputStream fi2 = new FileInputStream(f2);
Workbook workbook2 = WorkbookFactory.create(fi2);
Sheet sheet2 = workbook2.getSheet("book2");
int firstRow2 = sheet2.getFirstRowNum();
int lastRow2 = sheet2.getLastRowNum();
int firstRow3 = sheet3.getFirstRowNum();
int lastRow3 = sheet3.getLastRowNum();
for(int i=firstRow2; i <= lastRow2; i++) {
for(int j=firstRow3; j <= lastRow3; j++) {
String ele2 = sheet2.getRow(i).getCell(0).toString().trim();
String ele3 = sheet3.getRow(j).getCell(0).toString().trim() ;
if(ele2.toString() != ele3.toString())
{
// some operation
}
else
{
// some operation
}
}
program-2::
ArrayList<String> listOne = new ArrayList<String>();
ArrayList<String> listTwo =new ArrayList<String>();
for(int i=firstRow2; i <= lastRow2; i++) {
listOne.add(sheet2.getRow(i).getCell(0).toString().trim());
System.out.println("added: " + sheet2.getRow(i).getCell(0).toString().trim());
}
for(int j=firstRow3; j <= lastRow3; j++) {
listTwo.add(sheet3.getRow(j).getCell(0).toString().trim());
System.out.println("added: " + sheet3.getRow(j).getCell(0).toString().trim());
}
listTwo.removeAll(listOne);
System.out.println("list two : " + listTwo);