
I have a strange problem with a log file called transactionHandler.log. It is a large file with 17102 lines, which is the count I get when I run the following on the Linux machine:

wc -l transactionHandler.log
17102 transactionHandler.log

But when I run the following Java code and print the number of lines, I get 2040 as the output.

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class Reader {

    public static void main(String[] args) {
        int counter = 0;

        // Location of the file to read
        File file = new File("transactionHandler.log");

        // try-with-resources closes the Scanner even if reading fails
        try (Scanner scanner = new Scanner(file)) {
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine();
                System.out.println(line);
                counter++;
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
        // Number of lines the Scanner returned
        System.out.println(counter);
    }
}

Can you please tell me the reason?

Radu Murzea
Phoenix225
  • Sorry, I missed the counter. I have edited it. In the actual code the counter is very much present. – Phoenix225 May 23 '12 at 07:48
  • Have you compared the output of your program with the original logfile? Do you see any differences (and what, if any)? Have you tried with smaller input files? Do you observe the error with any input file, or only with specific ones? – Péter Török May 23 '12 at 07:52
  • What do you get for `System.out.println(counter);`? – Fahim Parkar May 23 '12 at 07:53
  • Are you running your Java program on the same Linux machine, or did you copy the file to another machine and run the program there? – AlexR May 23 '12 at 07:53
  • Have you compared the output of the program with the actual file? This could be a line delimiter issue. Look at the first couple of lines printed by the program. – ejb_guy May 23 '12 at 07:54
  • I'd recommend running your program with its output redirected to another file, then running the `diff` command to compare the original file with the new one. I believe you will see the difference quickly. – AlexR May 23 '12 at 07:54
  • @Phoenix225 What comes to mind is that `wc -l` counts every occurrence of the `EOL` delimiter. Java's Scanner will probably (this needs testing to confirm) ignore repeated `EOL` delimiters, which would mean it skips empty lines. – MrJames May 23 '12 at 07:58
  • Moreover, the Scanner class has its own limitations when loading large files. Please check this post: http://stackoverflow.com/questions/10336478/does-the-scanner-class-load-the-entire-file-into-memory-at-once – dharam May 23 '12 at 08:04
  • I think there was a problem with the file. Earlier, the file was being downloaded automatically from the remote machine to the local machine. This time I copied the file into the project folder manually, and now it seems to be working fine. All I need to find out now is why the file sometimes gets corrupted. Thanks, all, for your responses. I might get back to you :) After all, the 'file corruption' issue is still lurking dangerously in the corner. – Phoenix225 May 23 '12 at 09:51

1 Answer


From what I know, Scanner uses `\n` as the line delimiter by default, so maybe your file uses `\r\n`. You could change this by calling `scanner.useDelimiter`, or (and this is much better) try this alternative:

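import java.io.FileReader;
import java.io.IOException;
import java.io.LineNumberReader;

public class IOUtilities
{
    public static int getLineCount (String filename) throws IOException
    {
        // try-with-resources closes the reader (and the underlying file) when done
        try (LineNumberReader lnr = new LineNumberReader (new FileReader (filename)))
        {
            // Read to the end of the file; only the line counter matters here
            while (lnr.readLine () != null) {}

            return lnr.getLineNumber ();
        }
    }
}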

According to the documentation of LineNumberReader:

A line is considered to be terminated by any one of a line feed ('\n'), a carriage return ('\r'), or a carriage return followed immediately by a linefeed.

so it adapts well to files that use different line-terminating characters.
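As a quick way to see that behaviour without touching the real log, here is a minimal sketch that runs the same counting loop over in-memory strings. The class name `LineTerminatorDemo`, the helper `countLines` and the sample strings are made up for illustration; only `LineNumberReader` and `StringReader` come from the JDK.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.StringReader;

// Hypothetical demo class, not part of the original answer.
public class LineTerminatorDemo {

    // Counts lines in a string with the same loop that getLineCount uses on a file.
    static int countLines(String text) throws IOException {
        try (LineNumberReader lnr = new LineNumberReader(new StringReader(text))) {
            while (lnr.readLine() != null) {
                // Nothing to do: each successful readLine() advances the line number.
            }
            return lnr.getLineNumber();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countLines("a\nb\nc\n"));       // line feed only
        System.out.println(countLines("a\r\nb\r\nc\r\n")); // carriage return + line feed
        System.out.println(countLines("a\rb\rc\r"));       // carriage return only
        System.out.println(countLines("a\n\nb\n"));        // contains an empty line
    }
}
```

Each of the four sample strings holds three terminated lines, so every call should report 3, including the one with an empty line in the middle.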

Give it a try, see what it does.
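If the count still disagrees with `wc -l`, it may help to check whether the Java program is reading the same bytes at all. The sketch below counts raw `\n` bytes, which is what `wc -l` counts, and prints the file size so it can be compared with the size on the Linux machine. The class name `NewlineByteCounter` and the helper `countNewlineBytes` are hypothetical, and the default file name is only taken from the question.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class for comparing against `wc -l`.
public class NewlineByteCounter {

    // Counts '\n' bytes in the stream, without decoding characters or splitting lines.
    static long countNewlineBytes(InputStream in) throws IOException {
        long count = 0;
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            for (int i = 0; i < read; i++) {
                if (buffer[i] == '\n') {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the file name from the question is used when no argument is given.
        Path path = Paths.get(args.length > 0 ? args[0] : "transactionHandler.log");
        System.out.println("size in bytes: " + Files.size(path));
        try (InputStream in = Files.newInputStream(path)) {
            System.out.println("newline bytes: " + countNewlineBytes(in));
        }
    }
}
```

If the size or the newline count differs from what the Linux machine reports, the local copy of the file is probably not the same file, and the line-reading code is not the place to look.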

Radu Murzea
  • It does not work like this. I tried using your code, but the output is still 2040. – Phoenix225 May 23 '12 at 09:43
  • @Phoenix225 Maybe you should give us a sample of that file, it will probably be easier to figure it out that way. – Radu Murzea May 23 '12 at 09:45
  • Now it is working fine with your code too. I have provided an explanation above of what the problem actually turned out to be. Kindly have a look at it. – Phoenix225 May 23 '12 at 09:54