
Hi, I am trying to write a word counter class in Java. I am reading a file from the base folder with a Scanner and printing the words to the console. However, the first word of the file comes back with the prefix ÿş, or sometimes ?? (two question marks). Every item in the file is a plain word. Here is my source code. I could not manage to handle this, so any help would be appreciated, thanks... (By the way, I am using JCreator LE 4.5.)

import java.io.*;
import java.util.*;

public class WordCounter implements Comparator<Integer>{

    public static Scanner myScanner;
    private static int orderNumber = 0;
    private static String inputName = "";
    private static String outputName = "";
    private static LinkedHashMap<String, Integer> dictionary = new LinkedHashMap<String, Integer>();
    private static ArrayList<String> words = new ArrayList<String>();
    private static SortedSet<String> keys;
    private static Scanner in;

    public static void main(String[] args) {
        myScanner = new Scanner(System.in);
        System.out.println("Please enter a file name to read...");
        inputName = myScanner.nextLine();
        System.out.println("Please enter a file name to write in...");
        outputName = myScanner.nextLine();
        askForOptions();

        readFromFile(inputName);
        writeToFile(outputName, orderNumber);
    }

    private static void readFromFile(String fileName){
        try {
            in = new Scanner(new File(fileName+".txt"));
            while(in.hasNext()){
                String lowered = in.next().toLowerCase();
                if(!lowered.isEmpty()){ // the original "||" chain of != checks was always true
                    System.out.println(lowered);
                    int lastInd = lowered.length()-1;
                    char lastChar = lowered.charAt(lastInd); // the last character, not the one before it
                    System.out.println(lastChar);
                    if (lastChar == '?' || lastChar == ',' || lastChar == '.'){
                        String newLowered = lowered.substring(0, lastInd); // drop only the punctuation mark
                        words.add(newLowered);
                    }else{
                        words.add(lowered);
                    }
                }
            }
            in.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    private static void writeToFile(String fileName, int orderNumber){
        for(String word: words) {
            if(dictionary.containsKey(word)){
                int val = (int) dictionary.get(word);
                dictionary.put(word, val+1);
            }else{
                dictionary.put(word, 1);
            }
        }

        if(orderNumber == 1){
            keys = new TreeSet<String>(dictionary.keySet());
            try {
                FileWriter writer = new FileWriter(fileName+".txt");
                for(String key:keys){
                    writer.write(key + "\t" + dictionary.get(key) + "\n");
                }
                writer.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }else if(orderNumber == 2){
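            // Sort the (word, count) entries by count. A TreeMap keyed by the count would
            // silently drop every word whose count equals another word's count.
            List<Map.Entry<String, Integer>> entries = new ArrayList<Map.Entry<String, Integer>>(dictionary.entrySet());
            Collections.sort(entries, new Comparator<Map.Entry<String, Integer>>() {
                @Override
                public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
                    return a.getValue().compareTo(b.getValue()); // ascending by count
                }
            });
            try {
                FileWriter writer = new FileWriter(fileName+".txt");
                for(Map.Entry<String, Integer> entry: entries){
                    writer.write(entry.getKey() + "\t" + entry.getValue() + "\n");
                }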
                writer.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    private static void askForOptions(){
        System.out.println("How do you want the result? \nPress 1 to get the result in alphabetical order, \nPress 2 to get it in frequency order.");
        int option = myScanner.nextInt();
        if (option == 1){
            orderNumber = 1;
            System.out.println("Thank you! Good luck...");
        }else if(option == 2){
            orderNumber = 2;
            System.out.println("Thank you! Good luck...");
        }else{
            System.out.println("Invalid choice! Please try again...");
            askForOptions();
        }
    }

    @Override
    public int compare(Integer arg0, Integer arg1) {
        // Comparing Integer objects with == checks identity, not value, and fails for values outside -128..127
        return arg0.compareTo(arg1);
    }

}
Talha ŞEKER

1 Answer


ÿş are the UTF-16 little-endian byte order mark (BOM) bytes FF FE, displayed using the Windows-1254 (Turkish) code page, which I believe is your system default.
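To see where those two characters come from, here is a small self-contained demo. The class name `BomDemo` and the sample bytes (the word "hi" stored as UTF-16LE with a BOM) are made up for illustration. It decodes just the two BOM bytes with windows-1254 and, for comparison, decodes the whole buffer with Java's `UTF-16` charset, which reads the BOM to choose the byte order and does not emit it as a character:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomDemo {
    // Illustrative file content: BOM FF FE followed by "hi" in UTF-16LE
    static final byte[] FILE_BYTES = {(byte) 0xFF, (byte) 0xFE, 0x68, 0x00, 0x69, 0x00};

    // Decodes only the two BOM bytes with the Turkish Windows code page
    static String bomAsCp1254() {
        return new String(FILE_BYTES, 0, 2, Charset.forName("windows-1254"));
    }

    // Decodes the whole buffer as "UTF-16": the BOM selects little-endian order
    // and is consumed by the decoder instead of appearing in the result
    static String contentAsUtf16() {
        return new String(FILE_BYTES, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        String bom = bomAsCp1254();
        // Print code points so the output does not depend on the console's encoding
        System.out.printf("U+%04X U+%04X%n", (int) bom.charAt(0), (int) bom.charAt(1));
        System.out.println(contentAsUtf16());
    }
}
```

Running it prints `U+00FF U+015F` (that is, ÿ and ş) followed by `hi`. The windows-1254 charset is one of the JDK's extended charsets, so a stripped-down runtime image might not include it.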

To read the file correctly, you need to skip the BOM and decode the rest of the file with the charset the BOM indicates. This can be done using the Apache Commons IO BOMInputStream wrapper:

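// Needs Apache Commons IO (org.apache.commons.io.ByteOrderMark,
// org.apache.commons.io.input.BOMInputStream) and java.nio.charset.Charset.
// The one-argument BOMInputStream constructor only detects a UTF-8 BOM,
// so the UTF-16 marks are listed explicitly.
try (BOMInputStream bis = new BOMInputStream(new FileInputStream(fileName + ".txt"),
             ByteOrderMark.UTF_16LE, ByteOrderMark.UTF_16BE, ByteOrderMark.UTF_8);
     Scanner in = new Scanner(bis, bis.getBOMCharsetName() == null
                                   ? Charset.defaultCharset().name()
                                   : bis.getBOMCharsetName())) {
     // read words with in.next() as before

} catch (IOException e) {
     // ...
}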

Alternatively, you can skip those two bytes manually, as described in the answers to this post; the rest of the file still has to be read as UTF-16LE.
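If you'd rather avoid the Commons IO dependency, the manual approach can be sketched with only the JDK. The helper below (`BomAwareReader` and `openSkippingBom` are hypothetical names) peeks at the first bytes through a `PushbackInputStream`, skips a UTF-8 or UTF-16 BOM if it finds one, picks the matching charset, and otherwise falls back to a charset you pass in. It is a sketch that does not handle UTF-32 BOMs:

```java
import java.io.*;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class BomAwareReader {
    // Returns a Reader positioned after any UTF-8/UTF-16 BOM, decoding with the
    // charset the BOM indicates, or with `fallback` when there is no BOM.
    static Reader openSkippingBom(InputStream raw, Charset fallback) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, 3);
        byte[] head = new byte[3];
        int n = 0;
        // Read up to 3 bytes; a single read() call may return fewer than requested
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r == -1) break;
            n += r;
        }

        Charset cs;
        int bomLength;
        if (n >= 3 && (head[0] & 0xFF) == 0xEF && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF) {
            cs = StandardCharsets.UTF_8;
            bomLength = 3;
        } else if (n >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            cs = StandardCharsets.UTF_16LE;
            bomLength = 2;
        } else if (n >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            cs = StandardCharsets.UTF_16BE;
            bomLength = 2;
        } else {
            cs = fallback;
            bomLength = 0;
        }

        // Push back the bytes read past the BOM so the decoder still sees them
        if (n > bomLength) {
            in.unread(head, bomLength, n - bomLength);
        }
        return new InputStreamReader(in, cs);
    }

    public static void main(String[] args) throws IOException {
        // Simulated file: BOM FF FE followed by UTF-16LE text
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        file.write(0xFF);
        file.write(0xFE);
        file.write("hello world".getBytes(StandardCharsets.UTF_16LE));

        try (Scanner sc = new Scanner(openSkippingBom(
                new ByteArrayInputStream(file.toByteArray()), Charset.defaultCharset()))) {
            while (sc.hasNext()) {
                System.out.println(sc.next()); // prints "hello", then "world"
            }
        }
    }
}
```

In `readFromFile` this would replace the plain file Scanner with something like `new Scanner(BomAwareReader.openSkippingBom(new FileInputStream(fileName + ".txt"), Charset.defaultCharset()))`. The pushback step is the important design choice: when there is no BOM, all peeked bytes are unread, so the first word is not lost.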

Alex Salauyou