
I am exporting an Excel file (Excel 2016) containing Japanese characters to CSV. (Note: I am using the plain CSV format, not the "CSV UTF-8" option that Excel provides.) In the process, all Japanese characters are replaced with '?'.

My Windows/Office locale is Japan/Japanese, and the Windows/Office language and format settings are all Japanese.

I understand that Excel uses a code page to save the CSV file in a particular encoding. My understanding was that this should be Shift-JIS (the default encoding for the Japanese locale). If that is so, why is information lost and replaced by '?'?

What encoding does Excel try to save the CSV in?

(FYI: If I open a CSV, Excel by default attempts to read it as Shift-JIS (code page 932), as expected.)

Note: I am aware of the UTF-8 workarounds. I am more interested in understanding the behavior above than in a workaround.

Thanks

Firebrandt
  • Examples of characters: 縺セ縺ィ縺蜊・ケエ蜈ォ驛ス蟶 – Firebrandt Jan 30 '19 at 11:41
  • According to this question: https://stackoverflow.com/questions/508558/what-charset-does-microsoft-excel-use-when-saving-files Excel lets you choose the encoding used when creating a CSV file. Maybe you can check which default value is selected? That might help explain why Excel cannot convert the Japanese characters from its internal encoding to the one used for the CSV file. – Antoine Mottier Jan 30 '19 at 12:35

1 Answer


The character 縺 appears very often when a byte stream containing UTF-8 encoded hiragana is decoded as Shift-JIS (MS932): many hiragana start with the bytes E3 81 in UTF-8, and Shift-JIS reads that byte pair as 縺. FYI, CyberChef is handy for this kind of work. Reversing the process on your string gives まとづ…… as output.

So in this situation, the text seems to have been stored as UTF-8 and then read as Shift-JIS (MS932) somewhere along the way, either by Excel 2016 or by your text editor. How did you open the CSV?
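
As a rough illustration of this kind of encoding mismatch, here is a minimal Python sketch. It assumes the sample text began with まと and uses Python's `cp932` codec as a stand-in for Excel's MS932 code page; it does not claim to reproduce what Excel does internally. The last step shows, separately, how a character with no MS932 mapping turns into '?' when an encoder is told to substitute; the Hangul character is an arbitrary example, not taken from the question.

```python
import unicodedata

# Assumed sample: the first two characters recovered from the question's string.
original = "まと"

# Encode as UTF-8, then (wrongly) decode the same bytes as Shift-JIS (cp932).
utf8_bytes = original.encode("utf-8")
garbled = utf8_bytes.decode("cp932")

# cp932 maps the leftover single bytes to half-width katakana; NFKC widens
# them so the result can be compared with the sample posted in the question.
garbled_wide = unicodedata.normalize("NFKC", garbled)
print(garbled_wide)

# Reversing the two steps recovers the original text.
recovered = garbled.encode("cp932").decode("utf-8")
print(recovered)

# Separate effect: a character that has no cp932 mapping is replaced
# with '?' when the encoder uses the "replace" error handler.
lossy = "한".encode("cp932", errors="replace")
print(lossy)
```

Running the reverse step (`encode("cp932")` followed by `decode("utf-8")`) on a garbled string is essentially what a tool like CyberChef does when you chain an encode and a decode operation.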

SATO Yusuke