I have a CSV file that contains accented characters. I checked the encoding by opening it in PyCharm and Sublime Text: it's Western (Windows-1252 or ISO-8859-1).

I create a pandas DataFrame from this CSV, modify it, and export it to a UTF-8 text file. When I check the exported file with PyCharm and Sublime Text, it is not in UTF-8, and I don't know why.

Here is my code:

import pandas as pd

dataset = pd.read_csv("my_file.csv", sep=";", encoding="ISO-8859-1")
print(dataset.loc[0, "my_col"])
>>> "s'il vous plaît"

# Export data
with open("out.txt", "w", newline='') as f:
    dataset.to_csv(path_or_buf=f, sep="\t", header=False, index=False, encoding="utf-8")

When opening "out.txt" with PyCharm, it shows s'il vous pla�t, and PyCharm tells me that the encoding of the file is not UTF-8.

Be Chiller Too

1 Answer

You're writing through a file object opened in text mode with the default encoding. The file object's encoding takes precedence over the output encoding, so the encoding parameter of to_csv has no effect.

You should use something like the following instead.

# Export data
with open("out.txt", "w", newline='', encoding="utf-8") as f:
    dataset.to_csv(path_or_buf=f, sep="\t", header=False, index=False)

Or without a file object:

# Export data
dataset.to_csv(path_or_buf="out.txt", sep="\t", header=False, index=False, encoding="utf-8")
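
To see why the file object's encoding is what ends up on disk, here is a minimal standard-library sketch (no pandas). It assumes the default encoding on the asker's machine was cp1252, which is common on Western-locale Windows but is not stated in the question. The helper names written_bytes and is_valid_utf8 are made up for this illustration.

```python
import os
import tempfile

text = "s'il vous plaît\n"


def written_bytes(encoding):
    """Write `text` through a text-mode file opened with `encoding`
    and return the raw bytes that ended up on disk."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # The text-mode file object does the str -> bytes encoding itself.
        with open(path, "w", newline="", encoding=encoding) as f:
            f.write(text)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


def is_valid_utf8(data):
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Assumption: cp1252 stands in for the platform default encoding here.
cp1252_bytes = written_bytes("cp1252")
utf8_bytes = written_bytes("utf-8")

print(cp1252_bytes)  # 'î' is the single byte 0xEE
print(utf8_bytes)    # 'î' is the two bytes 0xC3 0xAE
print(is_valid_utf8(cp1252_bytes), is_valid_utf8(utf8_bytes))
```

The same check, reading out.txt in binary mode and trying to decode it as UTF-8, is a quick way to confirm the export without relying on an editor's encoding detection.
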
Kul-Tigin