
I'm trying to create a script that will upload some files to our Postgres DB every day.

I wrote a Python script that reads all the .csv files in a given directory:

# -*- coding: utf-8 -*-

import os
import psycopg2

connect_string = "dbname='db' user='postgres' host='localhost' password='########'"
conn = psycopg2.connect(connect_string)
cur = conn.cursor()

folder = u"SomePathOnTheServer\\Dépôt Chronos"

os.chdir(folder)

listFiles = os.listdir(".")

for files in listFiles:
    if files.startswith("[Prefix]"):
        if files.endswith(".csv"):
            full_path = os.path.join(folder, files)
            print full_path
            cur.execute(u"""SET client_encoding to 'latin1';
                           COPY sde.assignation_train FROM '%s' DELIMITER ';' CSV HEADER;""" %(full_path))

conn.commit()  # psycopg2 discards uncommitted work when the connection closes
cur.close()
conn.close()

print "All good!"

The problem is that the folder name contains accented characters: "/Dépôt Chronos". I can't change that, since the folder is generated automatically (and I only have "read" access to it).

[Screenshot: console output, with the printed path marked in green]

When I print my full path, it's correct, with all the characters intact (marked in green). But the path that reaches "cur.execute" apparently is not. I tried prefixing the query string with u (u""" """) to pass it as unicode, but that doesn't work.

Any idea what's wrong?

Thanks!!!

Craig Ringer
fgcarto
  • The Unicode object `u"""..."""` is probably encoded with UTF-8 by `cur.execute`, but you are telling your database that the encoding is `latin1`. – chepner Aug 19 '14 at 15:00
  • Try this: http://stackoverflow.com/questions/5523373/python-how-to-move-a-file-with-unicode-filename-to-a-unicode-folder – rajpy Aug 19 '14 at 15:01
  • In general, the Windows command prompt can be relied upon to be somewhat broken with Unicode, unless you carefully `chcp 65001` to set unicode mode. Otherwise it assumes everything is in the default 8-bit codepage. Run it in a native Windows Python interpreter, or IDLE, or something. Or `SET client_encoding` in PostgreSQL to match the terminal's codepage, so PostgreSQL re-codes input and output for you (But then *all* SQL must use the terminal's 8-bit codepage). Or trap errors and re-encode them before re-throwing. Also, the encoding is *not* latin-1, it's a Windows codepage. – Craig Ringer Aug 19 '14 at 15:08
  • chepner, the "latin1" part refers to the content of the file, no? It should have nothing to do with its name? – fgcarto Aug 19 '14 at 15:35
  • I don't understand though why my "full_path" variable is correct, but once passed as a command to "cur.execute", it's messed up? – fgcarto Aug 19 '14 at 15:47
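
The mismatch chepner describes can be reproduced without a database. This is only a sketch of that hypothesis, not a confirmed diagnosis: the path below is a made-up stand-in for the real one, and it assumes the query text leaves the client as UTF-8 and is then read as latin1.

```python
# -*- coding: utf-8 -*-
# Hypothetical path, for illustration only (not the real server path).
path = u"C:\\data\\Dépôt Chronos\\file.csv"

# Assumption: the client encodes the query text as UTF-8 ...
sent_bytes = path.encode("utf-8")

# ... and the receiving side interprets those same bytes as latin1.
misread = sent_bytes.decode("latin-1")

# Each accented letter became two characters, so the name no longer matches:
# misread contains u"DÃ©pÃ´t" where path contains u"Dépôt".
assert misread != path
assert u"D\u00c3\u00a9p\u00c3\u00b4t" in misread

# Using the same codec on both sides round-trips cleanly.
assert sent_bytes.decode("utf-8") == path
```

If this is what happens, it would also explain why printing `full_path` looks fine: the variable itself is intact, and the damage only appears once its bytes are decoded with a different codec than the one used to encode them.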

0 Answers
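
One way to sidestep the file name entirely, offered as a sketch rather than a verified fix: let the script open the file itself and stream it with `COPY ... FROM STDIN` through psycopg2's `cursor.copy_expert`. The helper name `copy_csv_from_client` is hypothetical, and this assumes the machine running the script can read the folder. The table name and COPY options are taken from the question.

```python
# -*- coding: utf-8 -*-
import io


def copy_csv_from_client(cur, csv_path, table="sde.assignation_train"):
    """Stream a local CSV file to the server via COPY ... FROM STDIN.

    `cur` is assumed to be a psycopg2 cursor. The file is opened by this
    script, so its (possibly accented) name never appears in the SQL text.
    """
    # The table name is interpolated directly: pass trusted identifiers only.
    sql = "COPY %s FROM STDIN DELIMITER ';' CSV HEADER" % table
    # Binary mode: the file's bytes are handed to the driver unchanged.
    with io.open(csv_path, "rb") as f:
        cur.copy_expert(sql, f)
```

In the original loop, the `cur.execute(...)` call would become `copy_csv_from_client(cur, full_path)`. The encoding of the file contents still has to be declared, for example by running `SET client_encoding to 'latin1'` as its own statement beforehand, if the files really are latin1.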