0

I'm building a word list that contains Cyrillic letters and I'm converting them into their pronounce form. E.g. č = ч = ch, š = ш = sh, ḱ = кј ǵ = ѓ = gj. So far I tried to use the Unicode form, and cyrillic letters in their original form in the change method, but each time the conversion fails. The last change I'm willing to try is to change the encoding. What am I doing wrong?!

  public class Main {
    private  static List<String> passwordList = new ArrayList<>();

    public static void main(String[] args) throws IOException {

       



        String read = null;
        String toBeModified = null;


        try {

            BufferedReader br = new BufferedReader(new FileReader("input.txt"));

            while ((read = br.readLine()) != null) {
                 toBeModified = read;
                passwordList.add(change(toBeModified));
            }
        } catch(Exception e) {
            e.printStackTrace();
        }

      
        save(passwordList);



    }
    public static void save(List<String> passwordList) throws IOException {
        BufferedWriter br = new BufferedWriter(new FileWriter("saved.txt"));
        for (int i = 0; i < passwordList.size(); i++) {
            if (passwordList.get(i).isEmpty() == false) {
                br.write(passwordList.get(i));
                br.newLine();
            }
        }



    }

    //problem here
    public static String change(String str){
       
                String newStr = "";
                if (str.contains("\u010C")){
                     newStr = str.replace("\u010D","ch");
                }else if (str.contains("\u017E")){//U+017E
                     newStr = str.replace("\u017E","zh");
                }else if (str.contains(",")){
                     newStr = str.replace(",","");
                    newStr.trim();
                } else if (str.contains(";")){
                     newStr =  str.replace(";","");
                    newStr.trim();
                } else if (str.contains(".")){
                     newStr = str.replace(".","");
                    newStr.trim();

                } else if (str.contains(":")) {
                     newStr =  str.replace(":","");
                    newStr.trim();
                } else if (str.contains("(\u0160")){
                     newStr =  str.replace("(\u0160", "sh");
                } else if (str.contains("\u1E31")){
                     newStr =  str.replace("\u1E31","kj");
                }else if (str.contains("(")){
                     newStr =  str.replace("(","");
                    newStr.trim();
                }else if (str.contains(")")){
                     newStr = str.replace(")","");
                    newStr.trim();
                    //test to check if works - works like this
//                }else if (str.contains("a")){
//                    newStr = str.replace("a","TestToSeeIfWorks");
                }
                return newStr;

    }

}
  • Minor observation: `if (str.contains("\u010C")` and `str.replace("\u010D","ch")` don't match up. Also: `newStr.trim();` doesn't do anything useful, you perhaps mean `newStr = newStr.trim();`. – Andy Turner Aug 05 '20 at 08:30
  • Do you really want to be using `else if`? Don't you want to replace all occurrences, not simply the one for the first matching condition? – Andy Turner Aug 05 '20 at 08:31
  • Thanks for the comments Andy. I made some typos there(minor obsr). When it comes to the change() I'm aware of what you are saying, but I'm using dummy.txt as input where the specific char that I'm trying to change only appears once in a given word and there isn't a word where two or more specific chars appear at the same time. I'll change the method for the final version. UPDATE I found some useful links on how to fix this issue https://www.fileformat.info/info/charset/ISO-8859-1/list.htm?fbclid=IwAR2zpwI8bcocdwLSIltlpLA8Zwpwd0lDc1u4Txqw4_NUR2V3u0Zscec_CFw – TheGoodCook Aug 06 '20 at 07:33
  • I made this test String a = "šašavo"; String test = a.replace("š", "test"); System.out.println(test); and it works... So the answer would be - I'm using mismatching unicode in my change(). :) – TheGoodCook Aug 06 '20 at 07:46
  • https://en.wikipedia.org/wiki/Romanization_of_Macedonian?fbclid=IwAR097O1OD4jYixz0WfUPzMs0vEyTdsCcWbw5W3scTOKznUg19twx_6Ah9ao – TheGoodCook Aug 06 '20 at 07:49
  • SOLVED!!!! I imported the project in Eclipse (I created it using IntelliJ IDEA). After running I got a pop up window asking me if I want to save the file using UTF-8, which I did. So the encoding was the problem all along. Also found this info on how to prevent future errors https://stackoverflow.com/questions/52685792/save-as-utf-8-error-in-eclipse-screen-why-the-program-is-not-running – TheGoodCook Aug 07 '20 at 07:24

0 Answers0