Please ignore the bad file names; this is what I have so far. I want to count all ASCII characters in a file in Java, but with large text files I get an "array out of bounds" error.

This code:

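import java.io.BufferedReader;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;

class CreateZipFile {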


    public static void main(String[] args) {
            try {
                CharacterCounter();
            } catch (IOException e) {
                // TODO Auto-generated catch block
                System.out.println(e.getClass().getSimpleName() + "-" + e.getMessage());//Throws nice output message 

            }
    }

        private static void CharacterCounter() throws IOException{

        FileInputStream fstream = new FileInputStream("/Users/Devonte1/Desktop/Javatest.txt");//Read in file

        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));//Take file stream and place it into bufferedReader
        OutputStreamWriter bw = null;

        String strLine="";
        String removeSpace="";
        while ((strLine = br.readLine()) != null) {

            removeSpace+=strLine;
        }

        String st=removeSpace.replaceAll(" ", "");//Replace all spaces 
        char[]text = st.toCharArray();//Create new conjoined character array
        System.out.println("Character Total");

        int [] count = new int [256];//Character array

        //Create index 
            for(int x = 0; x < 256; x ++){
                    count[x]=0;
            }

        //Search file 
        for (int index = 0; index < text.length; index ++) {
             char ch = text[index];
             int y = ch;
             count[y]++;
        }

        //
        for(int x = 0; x < 256; x++){
            char ch= (char) x;
            if (count[x] == 0){ 
                System.out.println("Character not used"+ " "+ ch + " = (char code " + (int) ch + ")");
            }
            else if (count[x] != 0){
                System.out.println("Character " + ch + " used" + count[x] + " = (char code " + (int) ch + ")");
            }
        }

        }

}

Error:

Error: ArrayIndexOutOfBoundsException: 8217

What am I doing wrong?

akshat
DeCampbell
  • Please add the full stacktrace. – dpr May 15 '18 at 12:21
  • Given `char is defined as Unicode character, which implies 16 bits unsigned` why is your range 256? – Oleg Sklyar May 15 '18 at 12:23
  • Why are you using a `DataInputStream`? You should only use `DataInputStream` to read back a file that was originally written with a `DataOutputStream`! Just use a plain old `FileInputStream` instead. – Kevin Anderson May 15 '18 at 12:32
  • `count` has 256 elements, but `count[y]++` indexes it with the value of a `char` from your input file. `char`s can have values up to 65,535, so you're vulnerable to an `ArrayIndexOutOfBoundsException` there. – Kevin Anderson May 15 '18 at 12:45
  • That was the right single quotation mark (U+2019, decimal 8217) in your text. – Joop Eggen May 15 '18 at 13:31
  • Thanks, I just changed the program to go from `FileInputStream` straight to `BufferedReader` instead. – DeCampbell May 16 '18 at 17:26

1 Answer

Solution 1

Count statistics for all 65,536 possible `char` values (0 to 65,535). Change the size of the count array to 65,536, i.e. `Character.MAX_VALUE + 1`:

int[] count = new int[Character.MAX_VALUE + 1];  // One slot per char value, 0..65535

// Create index (optional: Java already zero-fills new int arrays)
for (int x = 0; x < count.length; x++) {
  count[x] = 0;
}

Also change 256 to count.length (65,536) in the last part, when printing statistics.
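Putting Solution 1 together, a minimal sketch could look like the following. The class name `CharUnitCounter`, the use of `args[0]` as the file path, and the UTF-8 encoding are assumptions made for illustration; none of them come from the question. Like the question's code, it skips spaces and line terminators.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class CharUnitCounter {

    // Counts UTF-16 code units, skipping spaces and line terminators
    // (the question's code drops both via readLine() and replaceAll).
    static int[] count(String text) {
        // One slot per possible char value, 0..65535. Java zero-fills new arrays.
        int[] counts = new int[Character.MAX_VALUE + 1];
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (ch != ' ' && ch != '\n' && ch != '\r') {
                counts[ch]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: args[0] is the path of a UTF-8 encoded text file.
        String text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        int[] counts = count(text);
        // Print only the values that actually occur; 65,536 lines would be unreadable.
        for (int code = 0; code < counts.length; code++) {
            if (counts[code] != 0) {
                System.out.println("Character " + (char) code + " used " + counts[code]
                        + " times (char code " + code + ")");
            }
        }
    }
}
```

Because the array index is a `char`, it can never exceed `Character.MAX_VALUE`, so the lookup cannot go out of bounds regardless of what the file contains.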

Solution 2

Count statistics only for characters with ordinal value smaller than 256:

// Create index 
for(int x = 0; x < 256; x ++){
  count[x] = 0;
}

// Search file 
for (int index = 0; index < text.length; index ++) {
  char ch = text[index];
  int y = ch;
  if (y < 256)
    count[y]++;
}
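One drawback of Solution 2 is that out-of-range characters vanish silently. A possible variation, sketched here with a hypothetical `BoundedCounter` class that is not part of the original answer, keeps the same bounds check but also tallies how many `char` values were skipped, so unexpected input is at least visible.

```java
class BoundedCounter {
    static final int LIMIT = 256;          // count only char values 0..255

    final int[] counts = new int[LIMIT];   // zero-filled by Java
    int skipped;                           // number of chars with value >= LIMIT

    static BoundedCounter of(String text) {
        BoundedCounter result = new BoundedCounter();
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (ch < LIMIT) {
                result.counts[ch]++;
            } else {
                result.skipped++;          // e.g. U+2019 (8217) lands here
            }
        }
        return result;
    }

    public static void main(String[] args) {
        BoundedCounter r = BoundedCounter.of("it\u2019s");
        System.out.println("i: " + r.counts['i'] + ", skipped: " + r.skipped);
    }
}
```

A non-zero `skipped` value is a hint that the file contains characters outside the expected range, or that it was decoded with the wrong character encoding.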
Ossin Java guy
  • `char` values are UTF-16 code units, so I'd be a bit reluctant to call them "characters". Their values range from [Character.MIN_VALUE](https://docs.oracle.com/javase/8/docs/api/java/lang/Character.html#MIN_VALUE) to [Character.MAX_VALUE](https://docs.oracle.com/javase/8/docs/api/java/lang/Character.html#MAX_VALUE). So, yes, `int[Character.MAX_VALUE + 1]` would work. (0 to 127 would be the complete code point range for the C0 Controls and Basic Latin block. 0 to 255 would add the C1 Controls and Latin-1 Supplement block. See [Unicode](http://www.unicode.org/charts/nameslist/index.html).) But try "". – Tom Blodget May 15 '18 at 22:14
  • Thanks. In ADABAS, only character codes below 256 will appear in the data, and that is the type of file the program will loop through, so 256 is sufficient in this case. – DeCampbell May 16 '18 at 17:25
  • OK. @DeCampbell, please mark the question as "answered" if the solution works for you. – Ossin Java guy May 17 '18 at 07:34
  • Hi, I am not sure how to do that. – DeCampbell May 17 '18 at 09:33
  • @DeCampbell Interesting, but are those 256 characters always [U+0000 to U+00FF](http://www.unicode.org/charts/nameslist/index.html)? (If so, that would be the same as ISO 8859-1.) And, since an `InputStreamReader` created without an explicit charset uses the character encoding that Java sees as the user's system default, does that match up with ADABAS, too? Note: Java isn't "flexible" like C, where a `char` holds values in whichever character encoding you wish, even a custom encoding or non-text data. – Tom Blodget May 20 '18 at 21:34
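
The encoding question raised in the comments can be made concrete with a small experiment. The sketch below (the `DecodeDemo` class and its `codes` helper are hypothetical names) decodes the same three bytes twice. Read as UTF-8 they form the single `char` 8217 from the error message; read as ISO 8859-1 every byte maps to exactly one `char` below 256. Whether the ADABAS export really is ISO 8859-1 is an assumption that would need checking against the actual data.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class DecodeDemo {

    // Returns the numeric value of every char (UTF-16 code unit) in s.
    static int[] codes(String s) {
        int[] out = new int[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = s.charAt(i);
        }
        return out;
    }

    public static void main(String[] args) {
        // UTF-8 encoding of U+2019, the right single quotation mark.
        byte[] bytes = {(byte) 0xE2, (byte) 0x80, (byte) 0x99};

        String asUtf8 = new String(bytes, StandardCharsets.UTF_8);
        String asLatin1 = new String(bytes, StandardCharsets.ISO_8859_1);

        System.out.println(Arrays.toString(codes(asUtf8)));   // [8217]
        System.out.println(Arrays.toString(codes(asLatin1))); // [226, 128, 153]
    }
}
```

Passing an explicit charset, for example `new InputStreamReader(fstream, StandardCharsets.ISO_8859_1)`, removes the dependency on the platform default, and with ISO 8859-1 a 256-element array is always large enough.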