
I'm trying to load a .csv file encoded as UTF-8 and write it out in cp1252 (ANSI) format with pipe delimiters. The following code works in Python 3.6, but I need it to work in Python 2.6. However, the 'open' function does not accept an encoding keyword in Python 2.6.

import datetime
import csv

# Define what filenames to read
filenames = ["FILE1","FILE2"]
infilenames = [filename+".csv" for filename in filenames]
outfilenames = [filename+"_out_.csv" for filename in filenames]

# Read filenames in utf-8 and write them in cp1252
for infilename,outfilename in zip(infilenames,outfilenames):
    infile  = open(infilename, "rt",encoding="utf8")
    reader = csv.reader(infile,delimiter=',',quotechar='"',quoting=csv.QUOTE_MINIMAL)

    outfile  = open(outfilename, "wt",encoding="cp1252")
    writer = csv.writer(outfile, delimiter='|', quotechar='"', quoting=csv.QUOTE_NONE,escapechar='\\')  
    for row in reader:
        writer.writerow(row)    

    # Close both files before moving on to the next pair
    infile.close()
    outfile.close()

I tried several solutions:

  • Not defining the encoding. This results in an error on certain Unicode characters.
  • Using the io library (io.open instead of open). This raises a TypeError saying it cannot write str to a text stream.

Does anyone know the correct solution for this in Python 2.X?

litelite
Arjan Groen
  • Python 2's `csv` doesn't like `unicode` strings, so there's no easy fix in the standard library. However, there are third-party solutions. Check out the answers to [this question](https://stackoverflow.com/questions/904041/reading-a-utf8-csv-file-with-python), for example. – lenz Aug 09 '17 at 21:42

1 Answer


There may be some redundant code here but I got this to work by doing the following:

  • First, I converted the text with the .decode and .encode functions so that it is encoded as "cp1252".
  • Then I read the CSV from the cp1252-encoded file and wrote it to a new CSV.
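To make the decode/encode step concrete, here is a minimal sketch on an in-memory byte string. The helper name utf8_to_cp1252 and the sample text are made up for illustration. It also shows what the "ignore" error handler does to characters that cp1252 cannot represent, compared with "replace".

```python
# -*- coding: utf-8 -*-

def utf8_to_cp1252(data, errors="ignore"):
    """Re-encode UTF-8 bytes as cp1252 (hypothetical helper, not from the post)."""
    return data.decode("utf-8").encode("cp1252", errors)

# "café," followed by a CJK character, as UTF-8 bytes
sample = u"caf\u00e9,\u4e2d".encode("utf-8")

# "ignore" silently drops the character that has no cp1252 mapping
ignored = utf8_to_cp1252(sample)

# "replace" keeps a visible "?" placeholder instead
replaced = utf8_to_cp1252(sample, "replace")
```

Whether dropping or replacing unmappable characters is preferable depends on the data; "replace" at least leaves a trace in the output.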

...

import csv

# Define what filenames to read
filenames = ["FILE1","FILE2"]


infilenames = [filename+".csv" for filename in filenames]
outfilenames = [filename+"_out_.csv" for filename in filenames]
midfilenames = [filename+"_mid_.csv" for filename in filenames]

# Iterate over each file
for infilename,outfilename,midfilename in zip(infilenames,outfilenames,midfilenames):

    # Open file and read utf-8 bytes, then re-encode as cp1252
    # ("ignore" silently drops characters that cp1252 cannot represent)
    infile  = open(infilename, "rb")
    infilet = infile.read()
    infile.close()
    infilet = infilet.decode("utf-8")
    infilet = infilet.encode("cp1252","ignore")

    # write cp1252 encoded file
    midfile = open(midfilename,"wb")
    midfile.write(infilet)
    midfile.close()

    # read csv with new cp1252 encoding
    # (Python 2's csv module expects files opened in binary mode)
    midfile = open(midfilename,"rb")
    reader = csv.reader(midfile,delimiter=',', quotechar='"',quoting=csv.QUOTE_MINIMAL)

    # define output
    outfile  = open(outfilename, "wb")
    writer = csv.writer(outfile, delimiter='|', quotechar='"',quoting=csv.QUOTE_NONE,escapechar='\\')

    # write output to new csv file
    for row in reader:
        writer.writerow(row)

    midfile.close()
    outfile.close()
    # a single formatted string prints the same way in Python 2 and 3
    print("written file %s" % outfilename)
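A possible variation, sketched here as an assumption rather than as part of the approach above, skips the intermediate file by re-encoding each cell while the rows are copied. The function name convert_csv and the choice of the "replace" error handler are illustrative. The Python 3 branch is only there so the same sketch runs on both major versions.

```python
import csv
import io
import sys

def convert_csv(src, dst):
    """Copy a comma-delimited UTF-8 CSV to a pipe-delimited cp1252 CSV.

    Hypothetical helper: on Python 2 the csv module works on byte strings,
    so each cell is re-encoded by hand; on Python 3 the file objects do it.
    """
    if sys.version_info[0] == 2:
        infile = open(src, "rb")
        outfile = open(dst, "wb")
        recode = lambda cell: cell.decode("utf-8").encode("cp1252", "replace")
    else:
        infile = io.open(src, "r", encoding="utf-8", newline="")
        outfile = io.open(dst, "w", encoding="cp1252", errors="replace", newline="")
        recode = lambda cell: cell
    try:
        reader = csv.reader(infile, delimiter=",", quotechar='"')
        writer = csv.writer(outfile, delimiter="|", quotechar='"',
                            quoting=csv.QUOTE_NONE, escapechar="\\")
        for row in reader:
            writer.writerow([recode(cell) for cell in row])
    finally:
        # Close both files even if a row fails to convert
        infile.close()
        outfile.close()
```

Calling convert_csv("FILE1.csv", "FILE1_out_.csv") for each input name would then stand in for the loop body. This relies on UTF-8 being ASCII-compatible, so the Python 2 reader can split on commas and quotes before the cells are decoded.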
Arjan Groen