I have an application that writes ANSI-encoded CSV files to a file share, but the application I use to read them expects UTF-8.

Neither application has the option of changing the encoding, so I have been doing this manually. I have been trying to find a script online that I can use as a scheduled task to convert these files.

Is this even possible? If so, can anyone suggest a script?

Right now I have only tried converting manually as I am not sure how to script this.

Zach Young
MukIT

1 Answer

If you can use Python, the following should get you on the right track.

Presuming that ANSI means Windows-1252, I believe the following script will only change the encoding and leave the "data" as-is.

If the input encoding is something other than Windows-1252 (e.g., 'cp437', 'windows-1254'), just replace the input encoding below:

# newline="" disables newline translation, so line endings pass through unchanged.
with open("input.csv", encoding="windows-1252", newline="") as f_in, \
     open("output.csv", "w", encoding="utf-8", newline="") as f_out:
    for line in f_in:
        f_out.write(line)

I made up this input.csv and saved it as Windows-1252:

Col1, Col2
ÀÁÂ,  àáâ
ÈÉÊ,  èéê
ÍÎÏ,  íîï
ÔÕÖ,  ôõö
ÚÛÜ,  úûü

I then ran the script, and output.csv looks correct.
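
Since the question asks for something that can run as a scheduled task, here is a minimal sketch of the same idea applied to a whole folder. The function names `convert_file` and `convert_folder`, and the choice to write the results into a separate output folder, are my own assumptions for illustration; adjust the source encoding if ANSI means something other than Windows-1252 on your system.

```python
from pathlib import Path


def convert_file(src: Path, dst: Path, src_encoding: str = "windows-1252") -> None:
    """Re-encode one text file to UTF-8, leaving line endings untouched."""
    # newline="" disables newline translation on both read and write.
    with open(src, encoding=src_encoding, newline="") as f_in, \
         open(dst, "w", encoding="utf-8", newline="") as f_out:
        for line in f_in:
            f_out.write(line)


def convert_folder(in_dir: Path, out_dir: Path, src_encoding: str = "windows-1252") -> list:
    """Convert every *.csv directly inside in_dir and return the paths written."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(in_dir.glob("*.csv")):
        dst = out_dir / src.name  # same file name, different folder
        convert_file(src, dst, src_encoding)
        written.append(dst)
    return written
```

You would call it with something like `convert_folder(Path("incoming"), Path("converted"))`, where both folder names are placeholders. Writing to a separate folder keeps the originals intact and avoids converting an already-converted file a second time on the next run.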

Zach Young
  • The OP didn't specify the OS or the scripting language (for the OP, it's Python). `utf-8` is not always the default encoding for `open`. It is OS-dependent and is the result of `locale.getpreferredencoding(False)` (or `locale.getencoding()` in Python 3.11). Better to specify the encoding explicitly. – Mark Tolonen Mar 18 '23 at 03:47
  • @MarkTolonen, thank you for pointing out the platform dependent nature of open. I just checked the doc again, right there all along. I'm not quite sure what to make of your comment about OP's OS and Python, though. I inserted a hedge-y statement about, "If you can use Python, then...". Is that what you meant? – Zach Young Mar 18 '23 at 04:48
  • Sorry, I meant the OP didn't specify a particular language and didn't mention their OS which affects the default encoding for `open`. I *commented* for the OP that you were using Python so your statement now covers it in the answer. – Mark Tolonen Mar 18 '23 at 04:53