
I need to read lots of text files for my project. Each file contains the tweets and retweets of one person. I wrote simple Java code to do that, and I also tried reading the files with C code, which shows the same problem. The program reads some lines properly, but in some cases it breaks a single line into two lines. In some places it also inserts new (blank) lines.

I need to read the files exactly as they are. Could you please tell me whether this is caused by the input files or by something else? Is there any solution? Thanks.

Below is my code, which is very simple:

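import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Scanner;

public class Check {

    public static void main(String[] args) throws IOException {

        File inFile = new File("c:/users/syeda/desktop/12.txt");

        // try-with-resources closes the Scanner and the FileReader it wraps
        try (Scanner in = new Scanner(new FileReader(inFile))) {
            int lineNo = 0;

            // hasNextLine() matches nextLine(); hasNext() looks for a token and
            // returns false when only blank lines are left
            while (in.hasNextLine()) {
                String line = in.nextLine();
                System.out.println(line);
                lineNo++;
            }
            System.out.println(lineNo);
        }
    }
}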

My input file contains only 800 lines, but the program counts 819. The extra 19 lines are blank lines that are not in the input file, plus lines from the input file that have been broken into two.


1 Answer

Your data is not what you think it is:

Your file has multiple line separators in a row. That is where the blank lines are coming from.

\n\n counts as an empty line; on Windows the equivalent is probably \r\n\r\n.
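To see how doubled separators turn into empty lines, here is a small sketch that feeds a string through BufferedReader.readLine() and collects what it returns. The class name BlankLineDemo and the sample text are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class BlankLineDemo {

    // Returns every line that BufferedReader.readLine() yields for the text.
    static List<String> readLines(String text) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // A doubled Unix separator, then a doubled Windows separator
        String text = "tweet one\n\ntweet two\r\n\r\ntweet three";
        List<String> lines = readLines(text);
        System.out.println(lines.size()); // prints 5
        for (String line : lines) {
            System.out.println("[" + line + "]");
        }
    }
}
```

Three tweets come back as five lines: the \n\n and the \r\n\r\n each contribute one empty line.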

End-of-line markers are invisible in editors like TextPad, so you have \n, \r or \r\n where you do not expect them. It is that simple.
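One way to make the invisible markers visible is to count the raw line-ending bytes in the file. This is a minimal sketch, assuming the file uses an ASCII-compatible encoding such as UTF-8 or windows-1252 (it would give wrong counts for UTF-16). The class name LineEndingReport is hypothetical, and the default path is the one from the question.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LineEndingReport {

    // Returns {CRLF count, lone LF count, lone CR count} for the raw bytes.
    static int[] countLineEndings(byte[] data) {
        int crlf = 0, lf = 0, cr = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\r') {
                if (i + 1 < data.length && data[i + 1] == '\n') {
                    crlf++;
                    i++; // skip the '\n' that belongs to this CRLF pair
                } else {
                    cr++;
                }
            } else if (data[i] == '\n') {
                lf++;
            }
        }
        return new int[] {crlf, lf, cr};
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args.length > 0 ? args[0] : "c:/users/syeda/desktop/12.txt");
        int[] counts = countLineEndings(Files.readAllBytes(file));
        System.out.println("CRLF (\\r\\n): " + counts[0]);
        System.out.println("lone LF (\\n): " + counts[1]);
        System.out.println("lone CR (\\r): " + counts[2]);
    }
}
```

If a file that should have 800 lines reports mostly \r\n plus a handful of lone \n or \r, those stray markers are the likely source of the extra lines. Note that this byte count does not detect Unicode separators such as U+2028.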

Garbage In, Garbage Out

The code is correct; the data is wrong.

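Also, Scanner is the wrong choice here: its nextLine() treats the Unicode separators U+2028, U+2029 and U+0085 as line breaks in addition to \n, \r and \r\n, whereas BufferedReader.readLine() only breaks on \n, \r or \r\n. BufferedReader would be a better solution.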
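A BufferedReader version of the question's loop might look like the sketch below. It assumes the tweet files are UTF-8 (Files.newBufferedReader with UTF_8 throws MalformedInputException on bytes that are not valid UTF-8, so change the charset if needed); the class name CheckBuffered is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CheckBuffered {

    // Echoes and counts lines; readLine() ends a line at "\n", "\r" or "\r\n".
    static int printLines(Path file) throws IOException {
        int lineNo = 0;
        // Assumption: the files are UTF-8 rather than the platform default charset
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
                lineNo++;
            }
        }
        return lineNo;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args.length > 0 ? args[0] : "c:/users/syeda/desktop/12.txt");
        System.out.println(printLines(file));
    }
}
```

Switching readers only removes the extra Unicode break points; stray \n or \r bytes in the data will still split lines exactly as before.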

  • Thanks for your reply. I tried BufferedReader as well and it shows the same result: while reading, the program breaks some lines into two. The relevant part of my BufferedReader code is: while ((line = reader.readLine()) != null) { System.out.println(line); lineNo++; } System.out.println(lineNo); – syeda firdaus May 26 '15 at 10:07
  • Could you please tell me how to solve the problem? How do I deal with the unwanted new lines (\n\r)? Is there any way to remove them? – syeda firdaus May 26 '15 at 14:03
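For the question of removing the unwanted breaks, here is a sketch under one loud assumption: the genuine line ends in these Windows files are \r\n, and any lone \r or \n is a stray break inside a tweet. Under that assumption the sketch replaces each lone \r or \n with a space, splits on \r\n, and drops blank lines. The class CleanTweets and its method cleanLines are hypothetical, and the UTF-8 charset is also an assumption.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class CleanTweets {

    // Assumption: real line ends are "\r\n"; a lone "\r" or "\n" is a stray
    // break inside a tweet and becomes a space. Blank lines are dropped.
    static List<String> cleanLines(String text) {
        String joined = text.replaceAll("\r(?!\n)|(?<!\r)\n", " ");
        List<String> result = new ArrayList<>();
        for (String line : joined.split("\r\n")) {
            if (!line.trim().isEmpty()) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args.length > 0 ? args[0] : "c:/users/syeda/desktop/12.txt");
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        List<String> lines = cleanLines(text);
        lines.forEach(System.out::println);
        System.out.println(lines.size());
    }
}
```

Whether this is the right rule depends on what the files actually contain; if the real line ends turn out to be lone \n, the regular expression has to change, since no code can tell a "wrong" line break from a real one without knowing the file's convention.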