Java - Comparing two huge text files

Question

I am trying to develop a basic Java program that compares two huge text files and prints the non-matching records, i.e. something similar to the MINUS operator in SQL. However, I am not getting the expected results: all the records are printed even though both files are the same. Please also tell me whether this approach is efficient for comparing two huge text files.

import java.io.*;

public class CompareTwoFiles {
    static int count1 = 0 ;
    static int count2 = 0 ;

    static String arrayLines1[] = new String[countLines("\\Files_Comparison\\File1.txt")];
    static String arrayLines2[] = new String[countLines("\\Files_Comparison\\File2.txt")];

    public static void main(String args[]){  
        findDifference("\\Files_Comparison\\File1.txt","\\Files_Comparison\\File2.txt");
        displayRecords();
    }

    public static int countLines(String File){

        int lineCount = 0;
        // try-with-resources closes the reader even if an exception is thrown
        try (BufferedReader br = new BufferedReader(new FileReader(File))) {
            while (br.readLine() != null) {
                lineCount++;
            }
        } catch (IOException e) { // FileNotFoundException is a subclass of IOException
            e.printStackTrace();
        }
        return lineCount;
    }

    public static void findDifference(String File1, String File2){
        String contents1 = null;
        String contents2 = null;
        // try-with-resources closes both readers when the block exits
        try (BufferedReader buf1 = new BufferedReader(new FileReader(File1));
             BufferedReader buf2 = new BufferedReader(new FileReader(File2))) {

            while ((contents1 = buf1.readLine()) != null) {
                arrayLines1[count1] = contents1;
                count1++;
            }

            while ((contents2 = buf2.readLine()) != null) {
                arrayLines2[count2] = contents2;
                count2++;
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }



    public static void displayRecords() {      
        for (int i = 0 ; i < arrayLines1.length ; i++) {    
            String a = arrayLines1[i];  
            for (int j = 0; j < arrayLines2.length; j++){  
                String b = arrayLines2[j];  
                boolean result = a.contains(b);  
                   if(result == false){  
                       System.out.println(a);  
                   }  
            }

        }
    }
}

Either put in some debug statements or use a debugger, and the code seems to be correct. Probably the data? — Scary Wombat, Nov 10 '16 at 02:51
I have even tried with only 2 records files, it is not working correctly. I think I am missing something very small. — jay, Nov 10 '16 at 02:56
Hm, what if String a is `lala` and String b is `LAla`? You should keep that in mind. If you expect them to be the same, use `toLowerCase()` on both, or something similar. — GOXR3PLUS, Nov 10 '16 at 03:22
I was able to figure out the issue, but I am not able to fix it. The issue is in `boolean result = a.contains(b); if(result == false){ System.out.println(a); }`: since the arrays are not sorted, it prints every record in the file. Please assist. — jay, Nov 10 '16 at 03:48
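
The problem described in the comment above is that the inner loop prints `a` as soon as it meets any single line of file 2 that does not match, instead of deciding only after checking all of file 2. A minimal sketch of the corrected nested-loop logic follows. It assumes exact whole-line matching (`equals`), which is closer to SQL MINUS than the question's `contains`, and the class name `NestedLoopMinus` is hypothetical. It still costs O(n·m) comparisons.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class: the question's nested-loop idea, corrected so that a
// line of file 1 is reported only if it matches NO line of file 2.
class NestedLoopMinus {

    // Returns the lines of lines1 that do not appear (exactly) anywhere in lines2.
    static List<String> minus(String[] lines1, String[] lines2) {
        List<String> result = new ArrayList<>();
        for (String a : lines1) {
            boolean found = false;
            for (String b : lines2) {
                if (a.equals(b)) { // exact match (assumption; the question used contains())
                    found = true;
                    break;         // no need to look any further
                }
            }
            if (!found) {
                result.add(a);     // decided only after scanning all of file 2
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] file1 = {"apple", "banana", "cherry"};
        String[] file2 = {"cherry", "apple"};
        System.out.println(minus(file1, file2)); // prints [banana]
    }
}
```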
"whether this approach is performance efficient": no, it's not. You read both files twice (a waste of time), and if the files are really huge you will also have trouble storing both of them in memory in full. — Henry, Nov 10 '16 at 06:20
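
One common way to address both points in the comment above is to read file 2 once into a hash set and then stream file 1 line by line, printing lines that are not in the set. This is a sketch, not the asker's or commenters' code: the class and method names are hypothetical, and it assumes that file 2 (but not necessarily file 1) fits in memory. Each file is read exactly once, and each lookup is O(1) on average.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

// Hypothetical class: "file1 MINUS file2" using a HashSet of file 2's lines.
class HashSetMinus {

    // Reads every line of the input into a set (only this input is held in memory).
    static Set<String> loadLines(BufferedReader reader) throws IOException {
        Set<String> lines = new HashSet<>();
        String line;
        while ((line = reader.readLine()) != null) {
            lines.add(line);
        }
        return lines;
    }

    // Streams the input and prints each line not contained in 'exclude'.
    // Returns the number of lines printed.
    static int printMinus(BufferedReader reader, Set<String> exclude, PrintStream out) throws IOException {
        int printed = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            if (!exclude.contains(line)) { // average O(1) hash lookup
                out.println(line);
                printed++;
            }
        }
        return printed;
    }

    public static void main(String[] args) throws IOException {
        // Paths taken from the question; adjust them for your system.
        Path file1 = Paths.get("\\Files_Comparison\\File1.txt");
        Path file2 = Paths.get("\\Files_Comparison\\File2.txt");
        Set<String> exclude;
        try (BufferedReader r2 = Files.newBufferedReader(file2)) {
            exclude = loadLines(r2);
        }
        try (BufferedReader r1 = Files.newBufferedReader(file1)) {
            printMinus(r1, exclude, System.out);
        }
    }
}
```

Unlike SQL MINUS, this prints a line from file 1 once for every time it occurs there; keep a second set of already-printed lines if you need distinct results. If even the smaller file does not fit in memory, sorting both files externally and merging them is the usual alternative.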

score 0 · Accepted Answer · answered Nov 10 '16 at 05:32

Based upon your explanation, you do not need nested loops.

Consider:

public static void displayRecords() {

    // compare the two files position by position, up to the shorter length
    for (int i = 0 ; i < arrayLines1.length && i < arrayLines2.length; i++)
    {
        String a = arrayLines1[i];
        String b = arrayLines2[i];

        if (!a.contains(b)) {
            System.out.println(a);
        }
    }
}
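
The answer above compares records by position, which only makes sense if corresponding records sit on the same line number in both files (an assumption). Under that assumption the arrays are unnecessary: a sketch like the one below reads both files in lockstep, one line at a time, so neither file has to be held in memory. It uses exact `equals` matching and, unlike the loop above, also reports trailing lines that exist in only one file. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class: positional (line-by-line) comparison without arrays.
class LockstepCompare {

    // Returns the 1-based line numbers where the inputs differ,
    // including lines present in only one of them.
    static List<Integer> differingLines(BufferedReader r1, BufferedReader r2) throws IOException {
        List<Integer> diffs = new ArrayList<>();
        int lineNo = 0;
        while (true) {
            String a = r1.readLine();
            String b = r2.readLine();
            if (a == null && b == null) {
                break;              // both inputs exhausted
            }
            lineNo++;
            if (a == null || !a.equals(b)) {
                diffs.add(lineNo);  // different text, or one input is longer
            }
        }
        return diffs;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader f1 = new BufferedReader(new StringReader("x\ny\nz\n"));
        BufferedReader f2 = new BufferedReader(new StringReader("x\nY\n"));
        System.out.println(differingLines(f1, f2)); // prints [2, 3]
    }
}
```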

score -1 · Answer 2 · answered Nov 10 '16 at 04:41


Performance-wise, you could first compare the sizes of the files. If the sizes (in bytes) are exactly the same, you might not need to compare them.
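
A size check works as a cheap early exit, but only in one direction: different sizes prove the files differ, while equal sizes prove nothing. A minimal sketch, assuming Java 12 or later for `Files.mismatch`, with a hypothetical class name:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical class: size pre-check followed by a full byte comparison.
class SizeThenMismatch {

    // Returns true if the two files have identical contents.
    static boolean sameContent(Path p1, Path p2) throws IOException {
        // Different sizes: the files certainly differ, stop early.
        if (Files.size(p1) != Files.size(p2)) {
            return false;
        }
        // Equal sizes prove nothing, so the bytes must still be compared.
        // Files.mismatch (Java 12+) returns -1 when the files are identical.
        return Files.mismatch(p1, p2) == -1L;
    }

    public static void main(String[] args) throws IOException {
        Path a = Files.createTempFile("a", ".txt");
        Path b = Files.createTempFile("b", ".txt");
        Files.writeString(a, "abc\n");
        Files.writeString(b, "abd\n");          // same size, different content
        System.out.println(sameContent(a, b)); // prints false
        Files.delete(a);
        Files.delete(b);
    }
}
```

Note that this only answers "identical or not"; it does not list the non-matching records.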


zawhtut


Two files of same size may not have same contents. – bane19 Nov 10 '16 at 04:46
I was actually expecting this kind of response :) Yes, you can compare using a hash: http://stackoverflow.com/questions/15441315/java-and-hash-algorithm-to-compare-files – zawhtut Nov 10 '16 at 09:23
You could have added that link to your answer. Anyway, cool. Thanks :) – bane19 Nov 10 '16 at 09:44
Thanks a lot for your response. Following your suggestion of a hash approach, please let me know whether we can print the non-matching records between the two files. – jay Nov 10 '16 at 12:34
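
A whole-file hash only tells you whether two files are identical; it cannot tell you which records differ, so printing non-matching records still needs a per-line comparison (for example, loading one file's lines into a set and streaming the other). For the identical-or-not question, a sketch that streams each file through SHA-256 might look like this. The choice of SHA-256 and the class name are assumptions; `DigestInputStream` and `MessageDigest.isEqual` are standard JDK APIs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical class: compare two files by their SHA-256 digests.
class FileDigest {

    // Streams the file through SHA-256 without loading it all into memory.
    static byte[] sha256(Path path) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        try (InputStream in = new DigestInputStream(Files.newInputStream(path), md)) {
            byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) {
                // reading is enough: DigestInputStream updates md as bytes pass through
            }
        }
        return md.digest();
    }

    // True if both files produce the same digest (identical with overwhelming probability).
    static boolean sameDigest(Path p1, Path p2) throws IOException, NoSuchAlgorithmException {
        return MessageDigest.isEqual(sha256(p1), sha256(p2));
    }

    public static void main(String[] args) throws Exception {
        Path a = Files.createTempFile("a", ".txt");
        Path b = Files.createTempFile("b", ".txt");
        Files.writeString(a, "line1\nline2\n");
        Files.writeString(b, "line1\nline2\n");
        System.out.println(sameDigest(a, b)); // prints true
        Files.delete(a);
        Files.delete(b);
    }
}
```

Hashing pays off mainly when digests can be computed once and stored, or compared across machines; for a one-off local check, comparing the bytes directly reads the same amount of data.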
