
I have an Excel file in the Bengali language. To display the Bengali text properly I need Bengali fonts installed on the PC.

I converted the Excel file to CSV using Office 2010, but the result shows only '?' marks instead of the Bengali characters. I then tried Google Docs for the conversion and hit the same problem, except with unreadable characters rather than '?' marks. I pasted extracts from that file into an HTML file and tried, unsuccessfully, to view them in my browser.

What should I do to get a CSV file from an .xlsx file in Bengali so that I can import that into a MySQL database?

Edit: The answer accepted in this SO question made me go to Google Docs.

Istiaque Ahmed

1 Answer


According to the answers to the question Excel to CSV with UTF8 encoding, Google Docs should save CSV properly, unlike Excel, which destroys all characters that are not representable in the “ANSI” encoding being used. But maybe they changed this, or something went wrong, or the analysis of the situation is incorrect.

For properly encoded Bangla (Bengali) processed in MS Office programs, there should be no need for any “Bangla fonts”, since the Arial Unicode MS font (shipped with Office) contains the Bangla characters. So is the data actually in some nonstandard encoding that relies on a specially encoded font? In that case, it should first be converted to Unicode, though possibly it can be somehow managed using programs that consistently use that specific font.
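One rough way to test the encoding question is to copy some cell text and check which Unicode code points it actually contains. The sketch below is only a heuristic, and the function name `bengali_ratio` is made up for illustration: properly encoded Bangla should consist mostly of characters from the Unicode Bengali block (U+0980 to U+09FF), whereas data that relies on a specially encoded font will typically consist of Latin or other 8-bit characters that merely look like Bangla in that font.

```python
def bengali_ratio(text):
    """Return the fraction of non-whitespace characters that lie in the
    Unicode Bengali block (U+0980..U+09FF). Hypothetical helper, heuristic only."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    in_block = sum(1 for c in chars if "\u0980" <= c <= "\u09ff")
    return in_block / len(chars)


if __name__ == "__main__":
    # Paste a sample copied from a spreadsheet cell here.
    sample = "বাংলা"
    print(f"{bengali_ratio(sample):.0%} of the characters are in the Bengali block")
```

A value near 100% suggests the data is real Unicode Bangla. A value near 0% for text that displays as Bangla suggests a font-specific encoding. Digits, punctuation, and embedded English text will lower the ratio, so treat it as a hint rather than proof.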

In Excel, when using Save As, you can select “Unicode text (*.txt)”. It saves the data as TSV (tab-separated values) in UTF-16 encoding. You may then need to convert it to use comma as separator instead of tab, and/or from UTF-16 to UTF-8. But this only works if the original data is properly encoded.
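The conversion from tab-separated UTF-16 to comma-separated UTF-8 can be done in many tools. As one possible sketch, here it is in Python using only the standard library. The function name and the file names are placeholders, and the code assumes the file starts with a byte-order mark (which Python's `utf-16` codec uses to detect the byte order).

```python
import csv


def tsv_utf16_to_csv_utf8(src_path, dst_path):
    """Convert a tab-separated UTF-16 text file to comma-separated UTF-8.

    Returns the number of rows written. Hypothetical helper for illustration.
    """
    rows = 0
    # newline="" lets the csv module handle line endings itself.
    with open(src_path, "r", encoding="utf-16", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        reader = csv.reader(src, dialect="excel-tab")  # tab-delimited input
        writer = csv.writer(dst)  # comma-delimited output, quotes fields when needed
        for row in reader:
            writer.writerow(row)
            rows += 1
    return rows


if __name__ == "__main__":
    # Placeholder file names; adjust to the actual export.
    count = tsv_utf16_to_csv_utf8("export.txt", "export.csv")
    print(f"wrote {count} rows")
```

Using the `csv` module rather than a plain search-and-replace of tabs means that fields which themselves contain commas or quotes are quoted correctly in the output. If the '?' marks are already present in the exported text file, this step cannot recover the characters.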

Jukka K. Korpela
  • How do I make that conversion to use a comma as the separator instead of a tab, and/or from UTF-16 to UTF-8? The Excel file displayed the Bangla text well, so I can assume that the original data was properly encoded, right? – Istiaque Ahmed Jun 20 '12 at 10:32
  • I did the conversion, selecting 'Unicode text (*.txt)' as the `save as` type, but those '?' marks still appeared as before. – Istiaque Ahmed Jun 20 '12 at 10:58
  • @Istiaque Ahmed, it really looks like the original data is not properly encoded. If it uses a nonstandard 8-bit encoding, it may look OK on programs that use a specific font but does not work when normal fonts are used, and data conversions can mess up the data. Which Bangla font are you using? – Jukka K. Korpela Jun 20 '12 at 11:19
  • I’m puzzled. Vrinda is Unicode encoded, so everything should go fine, and did go fine when I tested on my computer (using Win 7, Excel 2007). Are you sure the problem is not in the software you use to open the file? (I tested by opening in Word 2007, and when prompted for encoding, specified “Unicode”.) – Jukka K. Korpela Jun 20 '12 at 18:40
  • I wish I could send you the Excel file. – Istiaque Ahmed Jun 21 '12 at 06:55
  • May we have a chat at a fixed time, please? – Istiaque Ahmed Jun 21 '12 at 10:18