
I have an array of arrays like so:

[["£", 1],["3", "9"]]

I'm using this Python code:

import pandas as pd

data = [["£", 1],["3", "9"]]
fields = ["a", "b"]
pd.DataFrame(data=data, columns=fields).to_csv("output.csv")

Opening it in a text editor shows output like this:

,a,b
0,£,1
1,3,9

But if you open it in Excel the £ appears as £. If I use the command:

pd.DataFrame(data=data, columns=fields).to_excel("output.xlsx")

it appears correctly. But I need it as a .csv, and without the accented Â when opened in Excel.

What I have tried:

  • I have tried setting the encoding, to no avail
  • I tried looking at the source code of both functions, again to no avail
  • I tried using .to_excel("output.csv"); as expected, this raised an exception

I wonder if it is maybe an issue with Excel itself (since it's fine when opened in Rons Editor, Notepad, etc.)

GAP2002
  • You're saving a CSV, i.e. a text file, not an Excel file. Unless you specify an encoding with a BOM, Excel has no idea what encoding is used in that file. Excel doesn't open CSV files, it *imports* them using specific settings. If you use the `Data` menu you can specify the encoding. If you double-click on the file, Excel will check for a BOM and if that's not found, it will use the system locale's codepage – Panagiotis Kanavos Apr 06 '21 at 11:59
  • If you want the files to be used by Excel, use `to_excel`. That produces a smaller file and maintains the data type (at least for numbers). If you want to produce a CSV that can be imported by double-clicking, use `.to_csv('file.csv', encoding='utf-8-sig')` – Panagiotis Kanavos Apr 06 '21 at 12:03
  • @Manakin that's a very bad idea - it doesn't solve the issue, it makes it worse. The file is already UTF8. There's nothing wrong with that. Assuming that the Windows machine uses Latin1, though, will mangle the text in every locale that *doesn't* use Latin1. – Panagiotis Kanavos Apr 06 '21 at 12:09
  • @Manakin Windows is a Unicode OS. It doesn't need any extra codecs. The strings are already Unicode. That's the way it has worked since 1994, when the first version of Windows NT came out. When reading a text file, though, an application has no idea what encoding it uses. With UTF16 and UTF32 files, the BOM at the start of the file tells Windows (and the app) what encoding to use. Without a BOM, the application defaults to the system locale, or as the settings page says, the `codepage used by non-Unicode applications` – Panagiotis Kanavos Apr 06 '21 at 12:13
  • @PanagiotisKanavos great thanks for the advice, that's really useful! – Umar.H Apr 06 '21 at 12:56
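The fix suggested in the comments can be sketched like this: writing with `encoding='utf-8-sig'` prepends a UTF-8 byte-order mark (the bytes `EF BB BF`), which Excel checks for on double-click so it decodes the file as UTF-8 instead of the system codepage.

```python
import pandas as pd

data = [["£", 1], ["3", "9"]]
fields = ["a", "b"]

# utf-8-sig writes a UTF-8 BOM at the start of the file, which Excel
# uses to detect the encoding when the CSV is opened by double-clicking.
pd.DataFrame(data=data, columns=fields).to_csv("output.csv", encoding="utf-8-sig")

# The file now begins with the BOM bytes, followed by the header row.
with open("output.csv", "rb") as f:
    print(f.read(3))  # b'\xef\xbb\xbf'
```

The file is still plain UTF-8 text (Notepad and other editors ignore the BOM), so nothing changes for the tools that already displayed it correctly.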
