1

I have a text file on my computer that I am reading from my Java program, and I want to build some criteria on it. Here is my Notepad file:

   #Students
   #studentId   studentkey  yearLevel   studentName token   
   358314           432731243   12          Adrian      Afg56       
   358297           432730131   12          Armstrong   YUY89       
   358341           432737489   12          Atkins      JK671   

        #Teachers
        #teacherId  teacherkey    yearLevel teacherName token   
        358314          432731243   12          Adrian      N7ACD       
        358297          432730131   12          Armstrong   EY2C        
        358341          432737489   12          Atkins      F4NGH

While reading this from Notepad with the code below, I get an array out of bounds exception. While debugging, I get "ï»¿#Students" as the value for strLine.length(). Can anyone help me solve this?

private static Integer STUDENT_ID_COLUMN = 0;
private static Integer STUDENT_KEY_COLUMN = 1;
private static Integer YEAR_LEVEL_COLUMN = 2;
private static Integer STUDENT_NAME_COLUMN = 3;
private static Integer TOKEN_COLUMN = 4;

public static void main(String[] args) {
    ArrayList<String> studentTokens = new ArrayList<String>();

    try {
        // Open the input file
        FileInputStream fstream = new FileInputStream("test.txt");
        BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
        String strLine;
        // Read File Line By Line
        while ((strLine = br.readLine()) != null) {
            strLine = strLine.trim();

            if ((strLine.length()!=0) && (strLine.charAt(0)!='#')) {
                String[] students = strLine.split("\\s+");
                studentTokens.add(students[TOKEN_COLUMN]);
            }


        }

        for (String s : studentTokens) {
            System.out.println(s);
        }

        // Close the reader (this also closes the underlying stream)
        br.close();
    } catch (Exception e) {// Catch exception if any
        System.err.println("Error: " + e.getMessage());
    }
}
Peter Lawrey
user2131465

4 Answers

1

Have you considered the character set? Maybe the file is saved as Unicode but you are reading it as ASCII. You can specify the charset here:

BufferedReader br = new BufferedReader(new InputStreamReader(fstream, charsetName));

This could help: Java InputStream encoding/charset

Community
mspringsits
1

It seems you are facing an encoding issue. Save and read the file in the same encoding, preferably UTF-8. FileInputStream itself has no charset parameter, so pass the encoding to the reader that wraps it: new InputStreamReader(new FileInputStream(<fileDir>), "UTF-8").
How to save a file in unicode
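
A minimal sketch of reading with an explicit charset, so the result does not depend on the platform default. The class name ExplicitCharsetReader and the method readTokens are made up for illustration, and the column index is taken from the question. It assumes the file is really UTF-8 and has no byte order mark at the start; Java's UTF-8 decoder does not remove one.

```java
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original program.
final class ExplicitCharsetReader {
    // Index of the token column, as in the question.
    private static final int TOKEN_COLUMN = 4;

    /**
     * Decodes the stream as UTF-8 (never the platform default) and collects
     * the token column of every line that is neither blank nor a '#' comment.
     * Assumption: no byte order mark at the start of the data.
     * The caller remains responsible for closing the stream.
     */
    static List<String> readTokens(InputStream in) {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        List<String> tokens = new ArrayList<>();
        reader.lines()
              .map(String::trim)
              .filter(line -> !line.isEmpty() && line.charAt(0) != '#')
              .forEach(line -> tokens.add(line.split("\\s+")[TOKEN_COLUMN]));
        return tokens;
    }
}
```

It could be called as readTokens(new FileInputStream("test.txt")) inside a try-with-resources block that closes the stream.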

Dhrubajyoti Gogoi
1

It's possible that the encoding of your file is not the same as the one you are reading it with.

Either find out the encoding of your file or convert it to UTF-8, then read it in with that encoding in your code, as below.

Also, consider changing strLine.charAt(0)!='#' to !strLine.contains("#"), unless '#' is guaranteed to be the first character of a comment line or could legitimately occur in one of the other fields.

It is also a good idea to call printStackTrace() on any exception you catch.

public static void main(String[] args) {
   ArrayList<String> studentTokens = new ArrayList<String>();

   try {
       // Open the input file
       FileInputStream fstream = new FileInputStream(new File("C:\\Fieldglass\\workspace-Tools\\Tools\\src\\tools\\sanket.txt"));

  // ------ See below, added in encoding, you can change this as needed if not using utf8
       BufferedReader br = new BufferedReader(new InputStreamReader(fstream, "UTF8"));

       String strLine;
       // Read File Line By Line
       while ((strLine = br.readLine()) != null) {
           strLine = strLine.trim();

           if ((strLine.length()!=0) && (!strLine.contains("#"))) {
               String[] students = strLine.split("\\s+");
               studentTokens.add(students[TOKEN_COLUMN]);
           }
       }

       for (String s : studentTokens) {
           System.out.println(s);
       }

       // Close the reader; this also closes the underlying stream
       br.close();
   } catch (Exception e) {// Catch exception if any
       e.printStackTrace();
       System.err.println("Error: " + e.getMessage());
   }
}

You can look here for Java supported encodings (as of 1.5)

Java Devil
1

The information that you have provided is inaccurate.

While reading this from Notepad with the code below, I get an array out of bounds exception.

I cannot see how this is possible if the code and input are as you state. The only place I can see that could throw an ArrayIndexOutOfBoundsException is this line:

  students[TOKEN_COLUMN]

But my reading of your code and input is that every input line that gets that far has 5 fields. When split, that will give you an array with 5 elements, and students[TOKEN_COLUMN] will work.

IMO, either the program or the input is NOT as you have described. (My guess is that you have input lines with fewer than 5 fields.)
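
To find such lines instead of crashing on them, the field count can be checked before indexing. This is only a sketch: SafeTokenParser and tokenOf are hypothetical names, and the column index is the one from the question.

```java
import java.util.Optional;

// Hypothetical helper for illustration; not part of the original program.
final class SafeTokenParser {
    // Index of the token column, as in the question.
    private static final int TOKEN_COLUMN = 4;

    /**
     * Returns the token field of a data line, or an empty Optional when the
     * line is blank, starts with '#', or has too few fields to contain a token.
     */
    static Optional<String> tokenOf(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.charAt(0) == '#') {
            return Optional.empty();
        }
        String[] fields = trimmed.split("\\s+");
        if (fields.length <= TOKEN_COLUMN) {
            // Report the offending line rather than throwing
            // ArrayIndexOutOfBoundsException.
            System.err.println("Skipping malformed line (" + fields.length
                    + " fields): " + trimmed);
            return Optional.empty();
        }
        return Optional.of(fields[TOKEN_COLUMN]);
    }
}
```

Inside the read loop, tokenOf(strLine).ifPresent(studentTokens::add) would replace the unchecked students[TOKEN_COLUMN] access, and the message on stderr shows exactly which line is short.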

While debugging, I get "ï»¿#Students" as the value for strLine.length().

That is bizarre to the point of being implausible. strLine.length() returns an int. What you are showing us is a String.


Actually, I have an inkling of what is going on. If "ï»¿#Students" is the value of strLine (not strLine.length() !!) then you've somehow managed to get some garbage at the start of the file. When your code examines this line, the first character won't be a '#', and the line will appear to have fewer than 5 fields. THAT will cause the exception ...

And I think I know where that garbage comes from. It is a UTF-8 byte order mark that Notepad inserted at the start of the file because you saved the file as UTF-8. The file was then read using CP1252, which is (I presume) your system's default character set.
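
Here is a small sketch that reproduces the effect. The UTF-8 byte order mark is the three bytes EF BB BF; decoded as CP1252 they become the three characters ï»¿, while decoded as UTF-8 they become the single character U+FEFF, which Java does not strip for you. BomDemo and its methods are names I made up for the illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative only.
final class BomDemo {
    private static final char BOM = '\uFEFF';

    /** Decodes bytes the way a CP1252 default charset would. */
    static String decodeAsCp1252(byte[] bytes) {
        return new String(bytes, Charset.forName("windows-1252"));
    }

    /** Decodes bytes as UTF-8; a leading BOM survives as U+FEFF. */
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Removes a single leading U+FEFF, if present. */
    static String stripBom(String line) {
        return (!line.isEmpty() && line.charAt(0) == BOM) ? line.substring(1) : line;
    }

    public static void main(String[] args) {
        // Bytes of a UTF-8 file with a BOM whose first line is "#Students".
        byte[] fileStart = "\uFEFF#Students".getBytes(StandardCharsets.UTF_8);

        // Three garbage characters appear before the '#'.
        System.out.println(decodeAsCp1252(fileStart));

        // After decoding as UTF-8 and stripping the BOM, the line starts with '#'.
        System.out.println(stripBom(decodeAsUtf8(fileStart)));
    }
}
```

In the question's loop, it would be enough to apply stripBom to the first line read, provided the reader is created with UTF-8 as its charset.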

Lesson: Don't use Notepad. Use a real editor.

Reference: https://en.wikipedia.org/wiki/Byte_order_mark#Representations_of_byte_order_marks_by_encoding

Stephen C