
I have to prepare a .txt file and count how many times each letter of the alphabet occurs in it. I found a very nice piece of code, but unfortunately it doesn't work with Polish characters like ą, ę, ć, ó, ż, ź. Even though I put them in the array, for some reason they are never found in the .txt file, so the count for each of them is 0.

Does anyone know why? Maybe I should count them differently, with a `switch` or something similar. Before anyone asks: yes, the .txt file is saved as UTF-8 :)

public static void main(String[] args) throws FileNotFoundException {
        int ch;
        BufferedReader reader;
        try {
            int counter = 0;

            for (char a : "AĄĆĘÓBCDEFGHIJKLMNOPQRSTUVWXYZ".toCharArray()) {
                reader = new BufferedReader(new FileReader("C:\\Users\\User\\Desktop\\pan.txt"));
                char toSearch = a;
                counter = 0;

                try {
                    while ((ch = reader.read()) != -1) {
                        if (a == Character.toUpperCase((char) ch)) {
                            counter++;
                        }
                    }

                } catch (IOException e) {
                    System.out.println("Error");
                    e.printStackTrace();
                }
                System.out.println(toSearch + " occurs " + counter);

            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }
  • If the test file is UTF8-encoded, why don't you read it with the UTF8 encoding instead of using the default character encoding of your platform? Have you done basic debugging, like printing (or inspecting with the debugger) every character you read, printing (or inspecting with the debugger) its uppercase value? – JB Nizet May 28 '17 at 14:42
  • See [Count number of each char in a String](https://codereview.stackexchange.com/q/44186/88267) or maybe [Count occurrences of each unique character](https://stackoverflow.com/q/4112111/5221149) for a way that doesn't scan the entire file multiple times. – Andreas May 28 '17 at 14:46
  • @JBNizet Short version of the answer: our teacher told us to do it like this -.- I guess she didn't expect it not to work. And no, I hadn't done any debugging, but using "InputStreamReader" helps. – Wojciech Miśta May 28 '17 at 15:11
  • @Andreas Thanks, will take a look! – Wojciech Miśta May 28 '17 at 15:11
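Following the debugging suggestion in the first comment, here is a minimal sketch that takes the UTF-8 bytes of "ąę" (as they would be stored in the file) and decodes them twice, once as UTF-8 and once with a different charset, printing the code point of every resulting char. ISO-8859-1 is only a stand-in for a non-UTF-8 platform default: a Polish Windows machine more likely uses windows-1250, but ISO-8859-1 is guaranteed to exist on every JVM. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Returns the code point of every char in s, e.g. "U+0105 U+0119".
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The letters are written as Unicode escapes so the demo does not
        // depend on the encoding javac uses to read this source file.
        byte[] fileBytes = "\u0105\u0119".getBytes(StandardCharsets.UTF_8);
        System.out.println("bytes in file:     " + fileBytes.length);
        // Correct: decode with the charset the file was written in.
        System.out.println("decoded as UTF-8:  " + describe(new String(fileBytes, StandardCharsets.UTF_8)));
        // Wrong: ISO-8859-1 (stand-in for a non-UTF-8 default) turns every byte into its own char.
        System.out.println("decoded as 8859-1: " + describe(new String(fileBytes, StandardCharsets.ISO_8859_1)));
    }
}
```

The file holds 4 bytes for the 2 letters. Decoded as UTF-8 they give back U+0105 U+0119 (ą, ę); decoded with ISO-8859-1 they become four chars, U+00C4 U+0085 U+00C4 U+0099, none of which is ą or ę, so a comparison like the one in the question can never match.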

2 Answers


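It looks like your problem is related to the encoding and the default system charset: `new FileReader(path)` decodes the file with the platform's default charset, which on Windows (before Java 18) is usually a legacy code page rather than UTF-8, so letters like ą and ę are read as different characters.

Try replacing the line that creates `reader` with this (it needs imports for `java.io.InputStreamReader`, `java.io.FileInputStream` and `java.nio.charset.StandardCharsets`):

reader = new BufferedReader(new InputStreamReader(new FileInputStream("C:\\Users\\User\\Desktop\\pan.txt"), StandardCharsets.UTF_8));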
Neonailol
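Combining this fix with the single-pass idea from the questions linked in the comments, the whole program might look roughly like the sketch below. It reads the file once instead of once per letter, decodes it explicitly as UTF-8 via `Files.newBufferedReader`, and counts letters case-insensitively. The class name, the `countLetters` helper and the alphabet (the question's A to Z plus all nine Polish letters Ą Ć Ę Ł Ń Ó Ś Ź Ż) are assumptions for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

public class LetterCount {
    // Latin letters from the question followed by the nine Polish letters
    // (Ą Ć Ę Ł Ń Ó Ś Ź Ż), written as Unicode escapes so the result does not
    // depend on the encoding javac uses to read this file.
    static final String ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            + "\u0104\u0106\u0118\u0141\u0143\u00D3\u015A\u0179\u017B";

    // Reads the text once and counts every alphabet letter, ignoring case.
    static Map<Character, Integer> countLetters(Reader in, String alphabet) throws IOException {
        Map<Character, Integer> counts = new LinkedHashMap<>(); // keeps alphabet order
        for (char letter : alphabet.toCharArray()) {
            counts.put(letter, 0);
        }
        int ch;
        while ((ch = in.read()) != -1) {
            char upper = Character.toUpperCase((char) ch);
            // Characters that are not in the alphabet (digits, spaces, ...) are skipped.
            counts.computeIfPresent(upper, (key, n) -> n + 1);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the path from the question; pass another path as the first argument.
        String path = args.length > 0 ? args[0] : "C:\\Users\\User\\Desktop\\pan.txt";
        // Decode explicitly as UTF-8 instead of relying on the platform default charset.
        try (BufferedReader reader = Files.newBufferedReader(Paths.get(path), StandardCharsets.UTF_8)) {
            for (Map.Entry<Character, Integer> entry : countLetters(reader, ALPHABET).entrySet()) {
                System.out.println(entry.getKey() + " occurs " + entry.getValue());
            }
        }
    }
}
```

Because the reader returned by `Files.newBufferedReader` reports malformed input instead of silently replacing it, a file that is not actually UTF-8 fails with a `MalformedInputException` rather than printing misleading zeros.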

Try this: I suggest using NIO. Here is code I wrote for you using `RandomAccessFile` and `MappedByteBuffer`, which can be faster for large files:

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class FileReadNio
{
    public static void main(String[] args) throws IOException
    {
        Map<Character, Integer> charCountMap = new HashMap<>();

        // try-with-resources closes the channel and the file even if an exception is thrown
        try (RandomAccessFile rndFile = new RandomAccessFile("c:\\test123.txt", "r");
             FileChannel inChannel = rndFile.getChannel())
        {
            MappedByteBuffer buffer = inChannel.map(FileChannel.MapMode.READ_ONLY, 0, inChannel.size());
            buffer.load();

            // Decode the bytes as UTF-8 first: casting each byte to char would split
            // two-byte letters such as ą or ż into two meaningless chars.
            CharBuffer chars = StandardCharsets.UTF_8.decode(buffer);
            while (chars.hasRemaining())
            {
                char c = chars.get();
                // Add 1 to the existing count, or start at 1 for a new char
                charCountMap.merge(c, 1, Integer::sum);
            }
        }

        for (Map.Entry<Character, Integer> characterIntegerEntry : charCountMap.entrySet())
        {
            System.out.printf("char: %s :: count=%d%n", characterIntegerEntry.getKey(), characterIntegerEntry.getValue());
        }
    }
}
Touraj Ebrahimi
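To see why a mapped buffer has to be decoded before counting, here is a small sketch comparing two ways of counting ó in the UTF-8 bytes of "żółć": casting every byte to `char`, and decoding the buffer with `StandardCharsets.UTF_8.decode` first. The class and method names are hypothetical, and the buffer is wrapped around a byte array rather than mapped from a file, which is assumed to behave the same way for this purpose.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

public class ByteCastDemo {
    // Counts target by casting every single byte to char.
    static int countByCasting(ByteBuffer bytes, char target) {
        ByteBuffer b = bytes.duplicate(); // independent position, caller's buffer is untouched
        int n = 0;
        while (b.hasRemaining()) {
            if ((char) b.get() == target) {
                n++;
            }
        }
        return n;
    }

    // Counts target after decoding the bytes as UTF-8 into chars.
    static int countByDecoding(ByteBuffer bytes, char target) {
        CharBuffer chars = StandardCharsets.UTF_8.decode(bytes.duplicate());
        int n = 0;
        while (chars.hasRemaining()) {
            if (chars.get() == target) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // UTF-8 bytes of the word "żółć" (Unicode escapes keep the source ASCII-only).
        ByteBuffer bytes = ByteBuffer.wrap("\u017C\u00F3\u0142\u0107".getBytes(StandardCharsets.UTF_8));
        char oAcute = '\u00F3'; // ó
        System.out.println("bytes:    " + bytes.remaining());
        System.out.println("casting:  " + countByCasting(bytes, oAcute));
        System.out.println("decoding: " + countByDecoding(bytes, oAcute));
    }
}
```

Each of the four letters takes two bytes in UTF-8, and because Java's `byte` is signed, casting one of those bytes to `char` yields a value between U+FF80 and U+FFFF. The casting version therefore finds 0 occurrences of ó, while the decoding version finds 1.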