I want to convert some Greek text from UTF-8 to a String, because Java does not recognize the characters as read. I then want to populate a JTable with the values, so I use a List to hold the rows. Here is the code snippet:

String[][] rowData;
List<String[]> myEntries;
//...
try {
        this.fileReader = new FileReader("D:\\Book1.csv");
        this.reader = new CSVReader(fileReader, ';');
        myEntries = reader.readAll();

        //here I want to convert every value from UTF-8 to String
        convertFromUTF8(myEntries); //???

        this.rowData = myEntries.toArray(new String[0][]);
    } catch (FileNotFoundException ex) {
        Logger.getLogger(VJTable.class.getName()).log(Level.SEVERE, null, ex);
    } catch (IOException ex) {
        Logger.getLogger(VJTable.class.getName()).log(Level.SEVERE, null, ex);
    }
//...

I created a method

public String convertFromUTF8(List<String[]> s) {
    String out = null;
    try {
        for(String stringValues : s){
            out = new String(s.getBytes("ISO-8859-1"), "UTF-8");
        }
    } catch (java.io.UnsupportedEncodingException e) {
        return null;
    }
    return out;
}

but I cannot continue, because there is no getBytes() method for List. What should I do? Any ideas would be very helpful. Thank you in advance.

Vassilis De

2 Answers

The problem is your use of FileReader, which (before Java 11 added constructors that take a Charset) only supports the platform's "default" character set:

this.fileReader = new FileReader("D:\\Book1.csv");

The javadoc for FileReader is very clear on this:

The constructors of this class assume that the default character encoding and the default byte-buffer size are appropriate. To specify these values yourself, construct an InputStreamReader on a FileInputStream.

The appropriate way to get a Reader with a character set specified is as follows:

this.fileStream = new FileInputStream("D:\\Book1.csv");
this.fileReader = new InputStreamReader(fileStream, "utf-8");
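
To show the whole read path in one place, here is a minimal sketch that reads a semicolon-separated file as UTF-8 with an InputStreamReader. The class name Utf8CsvReader and the method readRows are made up for illustration, and the naive split on ';' is an assumption that stands in for the CSVReader from the question (it does not handle quoted fields). With CSVReader, the same InputStreamReader could presumably be passed in place of the FileReader.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original question or answer.
final class Utf8CsvReader {

    // Reads every line of the file as UTF-8 and splits it on ';'.
    // Assumption: fields never contain quoted or escaped semicolons.
    static List<String[]> readRows(String path) throws IOException {
        List<String[]> rows = new ArrayList<>();
        // The charset is given explicitly, so the platform default is never used.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                rows.add(line.split(";", -1)); // -1 keeps trailing empty fields
            }
        }
        return rows;
    }
}
```

For the file in the question, Utf8CsvReader.readRows("D:\\Book1.csv").toArray(new String[0][]) would produce the rowData array for the JTable. No per-value conversion step is needed afterwards, because the text is decoded correctly while it is read.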
Brett Okken
  • Where are you displaying? Are the Greek character glyphs supported in that font/application? Are you certain that the source data is encoded as UTF-8? – Brett Okken Jul 27 '14 at 20:10
  • Yes, I am sure. I used the `fileReader.getEncoding()` method to return the encoding. UTF8, it says. I am displaying in a JTable, but I don't know whether the font supports Greek or not. How can I check that? – Vassilis De Jul 27 '14 at 20:20
  • getEncoding simply returns what is being used. It is not derived from the content itself. Do you have the byte values of the characters causing problems and what characters you expect? http://docs.oracle.com/javase/7/docs/api/java/io/InputStreamReader.html#getEncoding() – Brett Okken Jul 27 '14 at 20:27
  • Could you please tell me how exactly am I supposed to get it from List myEntries? – Vassilis De Jul 27 '14 at 20:37
  • @VassilisDe You are reading it wrong. This answer is correct. You need to use a correct input stream reader. See [this answer](http://stackoverflow.com/a/9853261). – tchrist Jul 27 '14 at 20:42
  • I think I found the problem. Since my program recognises Greek characters inside JOptionPanes etc., I figured out that it doesn't recognise the CSV file's encoding. I have to save it as a txt file with UTF-8 encoding and then open that in the program – Vassilis De Jul 28 '14 at 11:31

To decode UTF-8 bytes into a Java String, you can do something like this (taken from a linked answer):

Charset UTF8_CHARSET = Charset.forName("UTF-8");

String decodeUTF8(byte[] bytes) {
    return new String(bytes, UTF8_CHARSET);
}

Once you've read the data into a String, you no longer have control over the encoding: a Java String is a sequence of UTF-16 code units, whatever the encoding of the source bytes was. If the CSV file you're reading is written in UTF-8, read it into a byte array first, then decode that byte array into a Java String using the method above. Once you have the complete String, you can split it into a list of Strings based on the delimiter or other parameters (I don't know what your data looks like).
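
The steps just described can be sketched as follows. This is only an illustration under stated assumptions: the class name DecodeWholeFile and the method readTable are hypothetical, the ';' delimiter is taken from the question, and fields are assumed to contain no quoted delimiters or embedded line breaks. It uses StandardCharsets.UTF_8 (available since Java 7) instead of Charset.forName("UTF-8").

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class, not part of the original answer.
final class DecodeWholeFile {

    static String[][] readTable(String path) throws IOException {
        // 1. Read the raw bytes; no decoding happens here.
        byte[] bytes = Files.readAllBytes(Paths.get(path));
        // 2. Decode the bytes explicitly as UTF-8.
        String text = new String(bytes, StandardCharsets.UTF_8);
        // 3. Split into lines, then into fields (assumed delimiter: ';').
        String[] lines = text.split("\\r?\\n");
        String[][] table = new String[lines.length][];
        for (int i = 0; i < lines.length; i++) {
            table[i] = lines[i].split(";", -1); // -1 keeps trailing empty fields
        }
        return table;
    }
}
```

This loads the entire file into memory, so it suits small files; for larger ones a streaming Reader is usually the better fit.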

Swapnil
  • While this would work, using an InputStreamReader is usually a better choice, especially for larger content (such as an entire csv file). – Brett Okken Jul 27 '14 at 20:07
  • Yeah, I've not talked about how to read into byte array, your point does fill the gaps. – Swapnil Jul 27 '14 at 20:12
  • Also mentioned in a comment the answer was taken from: Since 1.7 you can use the constant `StandardCharsets.UTF_8`, it does the same as the first line. – Frank Jun 23 '17 at 11:37