I can already read in texts from xlsx cells and have:

String s = cell.getStringCellValue();

However, when I print this String, I get garbled output. I searched the Internet for a solution.

I tried about eight different approaches and found that there is not yet a working answer on SO. I set the default encoding of my IDE and of my XLSX files to UTF-8. Pinyin is displayed correctly.

Does anyone have an idea what could be wrong and how to solve this issue?

blauerschluessel
  • How are you printing it? Where are you printing it? What's the result? – Alastair McCormack Feb 04 '17 at 18:14
  • I printed it to the console with System.out.println(s); - I tried it in my IDE and then in the cmd window. I see a square with a question mark in it. However, I can confirm that those symbols DO contain the right information: when I copy and paste them into an editor, I DO get the Chinese characters from my Excel file. – blauerschluessel Feb 04 '17 at 20:40
  • Windows cmd window? What language/locale/region have you configured for your Windows session? – Alastair McCormack Feb 04 '17 at 21:19
  • @Alastair McCormack: Although you may be right that Chinese locale settings will make CMD able to show Chinese characters, is this really the solution? I mean, it is Unicode and we are in the 21st century. An operating system should be able to handle Unicode properly. It's a shame that Windows 10 is still not able to do so in its console. – Axel Richter Feb 05 '17 at 09:05
  • @AxelRichter indeed it's a very poor state of affairs, but Java has some blame too (Windows has very good Unicode support elsewhere, like its filename handling and clipboard support). It is possible to show Unicode in the cmd console, but it requires the app to use low-level access rather than the DOS-compatible mode. See https://github.com/Drekin/win-unicode-console, which gives Python the ability to do just that. I don't know if something similar would be possible with JNI. – Alastair McCormack Feb 05 '17 at 09:18
  • @AxelRichter apparently it is possible: http://stackoverflow.com/a/8921509/1554386 – Alastair McCormack Feb 05 '17 at 09:25
  • @Alastair McCormack: Yes possible using a library which is not shipped with Java by default. Then I would rather prefer using my write-to-file approach or using Swing (JTextArea) to provide my own output area for test outputs. – Axel Richter Feb 05 '17 at 10:33
  • @AxelRichter indeed, I don't disagree. I wanted to ascertain the actual problem. If it is a Windows cmd problem then it ought to be closed as duplicate. – Alastair McCormack Feb 05 '17 at 10:41

3 Answers

It is not clear where your problem with Chinese characters comes from, but I cannot reproduce it.

I have the following workbook in Excel:

[screenshot: the Excel workbook]

The following simple code:

import org.apache.poi.ss.usermodel.*;
import org.apache.poi.xssf.usermodel.*;

import java.io.FileInputStream;

class ReadXSSFUnicodeTest {

 public static void main(String[] args) {
  try {

   Workbook wb = WorkbookFactory.create(new FileInputStream("ReadXSSFUnicodeTest.xlsx"));

   Sheet sheet = wb.getSheetAt(0);

   for (Row row : sheet) {
    for (Cell cell : row) {
     String string = cell.getStringCellValue();
     System.out.println(string);
    }
   }

   wb.close();

  } catch (Exception ex) {
   ex.printStackTrace();
  }
 }
}

produces:

[screenshot: the console output]
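
To check whether the strings themselves are intact, independently of how a console renders them, one option is to print their Unicode code points. This is only a minimal sketch; the class and method names are illustrative and not part of POI, and the escaped sample string stands in for a value returned by cell.getStringCellValue():

```java
class CodePointDump {

    // Renders each Unicode code point of the string as U+XXXX, so the content
    // can be checked even when the console cannot draw the glyphs.
    static String toCodePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two CJK characters, written with escapes so that the encoding of
        // this source file does not matter.
        System.out.println(toCodePoints("\u4E2D\u6587")); // prints "U+4E2D U+6587"
    }
}
```

If the code points printed for a cell match the characters in the workbook, the data was read correctly and only the display is at fault.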

If the problem is that Windows cannot display Unicode characters properly in the CMD console because its font has no glyphs for them, then write the content to a text file:

import org.apache.poi.ss.usermodel.*;
import org.apache.poi.xssf.usermodel.*;

import java.io.FileInputStream;
import java.io.Writer;
import java.io.BufferedWriter;
import java.io.OutputStreamWriter;
import java.io.FileOutputStream;

class ReadXSSFUnicodeTest {

 public static void main(String[] args) {
  try {

   Writer out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream("ReadXSSFUnicodeTest.txt"), "UTF-8"));

   Workbook wb = WorkbookFactory.create(new FileInputStream("ReadXSSFUnicodeTest.xlsx"));

   Sheet sheet = wb.getSheetAt(0);

   for (Row row : sheet) {
    for (Cell cell : row) {
     String string = cell.getStringCellValue();
     out.write(string + "\r\n");
     System.out.println(string);
    }
   }
   out.close();   

   wb.close();

  } catch (Exception ex) {
   ex.printStackTrace();
  }
 }
}

This file should then show the proper content even in Windows Notepad:

[screenshot: the text file opened in Notepad]
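
As a variation on the file-writing code above, the writer can be opened with try-with-resources so that it is closed even if a write fails. This is a sketch assuming Java 8 or later; Utf8LineWriter and writeLines are hypothetical names, and the sample strings stand in for cell values read via POI:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

class Utf8LineWriter {

    // Writes each string on its own line, encoded as UTF-8.
    // try-with-resources closes the writer even if a write throws.
    static void writeLines(Path target, List<String> lines) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (String line : lines) {
                out.write(line);
                out.write("\r\n");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Sample values given as escapes; in the real program these would
        // come from cell.getStringCellValue().
        Path target = Files.createTempFile("cells", ".txt");
        writeLines(target, Arrays.asList("\u4E2D\u6587", "p\u012Bny\u012Bn"));
        System.out.println("Wrote " + target);
    }
}
```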

You could also use Swing (JTextArea) to provide your own output area for test output:

import org.apache.poi.ss.usermodel.*;
import org.apache.poi.xssf.usermodel.*;

import java.io.FileInputStream;

import javax.swing.*;
import java.awt.*;


class ReadXSSFUnicodeTest {

 public ReadXSSFUnicodeTest() {
  try {

   MySystemOut mySystemOut = new MySystemOut();

   Workbook wb = WorkbookFactory.create(new FileInputStream("ReadXSSFUnicodeTest.xlsx"));

   Sheet sheet = wb.getSheetAt(0);

   for (Row row : sheet) {
    for (Cell cell : row) {
     String string = cell.getStringCellValue();
     //System.out.println(string);
     mySystemOut.println(string);
    }
   }

   wb.close();

  } catch (Exception ex) {
   ex.printStackTrace();
  }
 }

 public static void main(String[] args) {
  ReadXSSFUnicodeTest readXSSFUnicodeTest= new ReadXSSFUnicodeTest();
 }

 private class MySystemOut extends JTextArea {

  private String output = "";

  private MySystemOut() {
   super();  
   this.setLineWrap(true);
   JFrame frame = new JFrame("My System Outputs");
   frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
   JScrollPane areaScrollPane = new JScrollPane(this);
   areaScrollPane.setVerticalScrollBarPolicy(JScrollPane.VERTICAL_SCROLLBAR_ALWAYS);
   areaScrollPane.setPreferredSize(new Dimension(350, 150));
   frame.getContentPane().add(areaScrollPane, BorderLayout.CENTER);
   frame.pack();
   frame.setVisible(true);  
  }

  private void println(String output) {
   this.output += output + "\r\n";
   this.setText(this.output);
   this.revalidate();
  }
 }
}

This is only the simplest approach and is meant only for test output, since it does not use Swing correctly with regard to AWT threading (the components are created and updated outside the event dispatch thread).

Axel Richter

I had the same problem while extracting Persian text from an Excel file. I was using Eclipse and changed the settings as follows:

  1. Window -> Preferences, then expand General.
  2. Click Workspace; the "Text file encoding" setting (near the bottom) has an encoding chooser.
  3. Select the "Other" radio button and choose UTF-8 from the drop-down. Click Apply and then OK, or simply click OK.
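
If the aim is to have the program's output encoded as UTF-8, the encoding can also be pinned in the code rather than in the IDE. The following is a rough sketch, not a guaranteed fix: Utf8Console and utf8Stream are hypothetical names, and whether the glyphs actually appear still depends on the console's own encoding and font.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

class Utf8Console {

    // Wraps a byte stream in a PrintStream that always encodes text as UTF-8,
    // regardless of the JVM's default charset. Auto-flush is enabled.
    static PrintStream utf8Stream(OutputStream target) throws UnsupportedEncodingException {
        return new PrintStream(target, true, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Shows which charset the JVM would otherwise use for System.out.
        System.out.println("Default charset: " + Charset.defaultCharset());

        PrintStream out = utf8Stream(System.out);
        out.println("\u4E2D\u6587"); // sample text, written with escapes
    }
}
```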
Suketa

Use this code:

String new_Str = new String(excelfield.getBytes(1), "Cp1256"); //....to Persian text

String new_Str = new String(excelfield.getBytes(1), "UTF-8"); //....to Chinese text

OR

String new_Str = new String(your_str.getBytes(), "Cp1256");

String new_Str = new String(your_str.getBytes(), "UTF-8");
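
A conversion of this kind can help when a string was produced by decoding bytes with the wrong charset. The sketch below assumes that situation and uses ISO-8859-1 as the wrongly applied charset, because it maps every byte to a character and back without loss; the class and method names are hypothetical. A string that was decoded correctly in the first place does not need this step.

```java
import java.nio.charset.StandardCharsets;

class CharsetRoundTrip {

    // Simulates the mistake: UTF-8 bytes wrongly decoded as ISO-8859-1.
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Reverses the mistake: recover the raw bytes, then decode them as UTF-8.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u4E2D\u6587"; // sample text, written with escapes
        String garbled = misdecode(original);
        String repaired = repair(garbled);

        System.out.println("garbled length: " + garbled.length());            // 6, one char per UTF-8 byte
        System.out.println("repaired equals original: " + repaired.equals(original)); // true
    }
}
```

Passing the charset explicitly on both sides, rather than calling getBytes() with no argument, keeps the result independent of the platform default.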