I am trying to write a data frame containing Serbian text to a CSV file. First I tried the utf-8 and utf-8-sig codecs. The output has multiple columns, which is correct, but the characters are garbled. Then I tried utf-16. The characters are correct this time, but the output CSV has only one column.

My code is:

df1.to_csv('people.csv', encoding="utf-16", index=False)

The head of the original data frame is:

[image]

If I use utf-8, the output has the correct columns but garbled characters: [image]

If I use utf-16, the output has the correct characters but only one column: [image]

How can I solve this problem? Thanks!

Yipin
  • For some reason I counted not one but *ten* columns? Some of the columns are empty, but is this a problem? – Andrey Tyukin Apr 06 '18 at 02:43
  • @AndreyTyukin Thanks! Yes, there are about 10 columns. If I use utf-8, I get 10 columns, but the characters are garbled. – Yipin Apr 06 '18 at 02:45
  • I think it would be helpful if you could provide an [MCVE](https://stackoverflow.com/help/mcve). I don't quite understand what the problem is. The text itself seems alright and the number of columns is 10 as expected, so where is the error? Can you provide the expected result so that one has something to compare with the presumably wrong output? – Andrey Tyukin Apr 06 '18 at 02:57
  • What program are you using to inspect the CSV file? It seems like Excel doesn't (or didn't) import UTF-8 CSV files very well. https://stackoverflow.com/questions/6002256/is-it-possible-to-force-excel-recognize-utf-8-csv-files-automatically – Robᵩ Apr 06 '18 at 02:58
  • @AndreyTyukin Thanks. I have added some pictures to clarify the problem. – Yipin Apr 06 '18 at 03:10
  • @Robᵩ I am trying to convert some data collected from an API to CSV. I use Excel to inspect the CSV file because my professor uses Excel. How can I solve the problem? Thank you! – Yipin Apr 06 '18 at 03:12
  • I'm fairly certain that pandas is correctly creating the CSV file. To confirm that, open the csv file with Notepad++ or gvim or another text editor that plays well with Unicode. I expect the problem lies in how you are using excel. Use `encoding="utf-8-sig"` and open the file with a recent version of Excel. If that doesn't work, google "excel utf-8 csv" or follow the advice in the question I linked to. – Robᵩ Apr 06 '18 at 03:15
  • @Robᵩ Thanks! It works! – Yipin Apr 06 '18 at 03:18
  • Why do the name and year combinations look so realistic? Have you made them up so carefully, or have you just dumped real data of your customers on some publicly available website on the internet? Is everyone in the world supposed to know that those people (with their full names and birth years) have somehow made it into your database on the given date? In any case, I vote to close, because the problem was not in the code, but in the spreadsheet program used to inspect the output. – Andrey Tyukin Apr 06 '18 at 11:19
  • @AndreyTyukin Thanks. I acquired the data from a public API, in JSON format. Some values are missing because some people do not have such data. I have solved the problem by using utf-8-sig. Thank you for your help. – Yipin Apr 07 '18 at 01:13
  • @Yipin Consider writing up an answer, and accepting it. You can answer your own questions, and it is actually encouraged. On the other hand, having unanswered questions is not so good. – Andrey Tyukin Apr 07 '18 at 01:18
  • @AndreyTyukin Thank you for your suggestion. I have added an answer to my question. – Yipin Apr 07 '18 at 04:49

1 Answer

Thanks to Robᵩ and Andrey Tyukin for their help! I changed the encoding to utf-8-sig and opened the file in Excel again. Now both the characters and the columns are correct.

df1.to_csv('people.csv', encoding="utf-8-sig", index=False)

The output now is:

[image]
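As a rough illustration of why this probably helps: the utf-8-sig codec writes the same bytes as utf-8 but prepends a three-byte byte order mark (BOM), which Excel can use to recognize the file as UTF-8. The sketch below uses only the Python standard library rather than pandas, and the sample rows are made up for the example; they are not the data from the question.

```python
import csv
import io

# Hypothetical sample rows standing in for the Serbian data frame.
rows = [["ime", "godina"], ["Đorđe", "1987"], ["Živana", "1990"]]

def csv_bytes(encoding):
    """Serialize the rows as CSV text, then encode it with the given codec."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue().encode(encoding)

plain = csv_bytes("utf-8")
with_bom = csv_bytes("utf-8-sig")

# utf-8-sig differs from utf-8 only by the three-byte BOM at the start.
print(with_bom[:3])           # b'\xef\xbb\xbf'
print(with_bom[3:] == plain)  # True

# Decoding with utf-8-sig strips the BOM again, so the text round-trips.
print(with_bom.decode("utf-8-sig") == plain.decode("utf-8"))  # True
```

Without the BOM, Excel may fall back to a legacy code page when opening the file, which would explain the garbled characters seen with plain utf-8.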

Yipin