Python novice here.

I am using Python 2.7.2 on Windows 7.

I have installed the PyWin32 extensions (build 217).

I have adodbapi installed in c:\Python27\Lib\site-packages\adodbapi

I have a very simple module that queries the AdventureWorks2008LT database in MS SQL Server.

import adodbapi

connStr='Provider=SQLOLEDB.1;' \
    'Integrated Security=SSPI;' \
    'Persist Security Info=False;' \
    'Initial Catalog=AVWKS2008LT;' \
    'Data Source=.\\SQLEXPRESS'

conn = adodbapi.connect(connStr)

tablename = "[salesLT].[Customer]"

# create a cursor
cur = conn.cursor()

# extract all the data
sql = "select * from %s" % tablename
cur.execute(sql)

# show the result
result = cur.fetchall()
for item in result:
    print item

# close the cursor and connection
cur.close()
conn.close()

The AdventureWorks2008LT sample database has customer, product, address, and order tables (etc). Some of the string data in these tables is unicode.

The query works, for the first couple rows. I see the expected output. But then, the script fails with this message:

Traceback (most recent call last):
  File "C:\dev\python\query-1.py", line 24, in <module>
    print item
  File "C:\Python27\lib\site-packages\adodbapi\adodbapi.py", line 651, in __str__
    return str(tuple([str(self._getValue(i)) for i in range(len(self.rows.converters))]))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 19: ordinal not in range(128)

...which is very much not helpful. To me.

I gather that adodbapi is trying to encode a u'\xe9' character into ASCII. I understand why that will fail. I suppose it's trying to do that as part of the print statement.
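
The failure can be reproduced without adodbapi or a database. Below is a minimal sketch, assuming a made-up value `u'Jos\xe9'` standing in for whatever column holds the accented character. In Python 2, calling `str()` on a unicode object encodes it with the default codec, which is ASCII, so the explicit `encode('ascii')` here fails in the same way:

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# Hypothetical stand-in for a column value; not taken from the real table.
value = u'Jos\xe9'

try:
    # The same encoding step that Python 2's str(value) performs implicitly.
    value.encode('ascii')
except UnicodeEncodeError as exc:
    message = str(exc)
    print(message)

# Encoding explicitly with a codec that covers the character works.
utf8_bytes = value.encode('utf-8')
print(repr(utf8_bytes))
```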

Why is it trying to encode the character into ASCII?
How can I tell it to just use UTF-8?

PS: I am running the script from the cmd.exe prompt in Windows. Does this mean stdout is always ASCII?

eg, \python27\python.exe -c "import sys; print(sys.stdout.encoding)"

gives me 'cp437'
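
So stdout is not ASCII here. As a quick check, assuming the cp437 code page reported above, the sketch below suggests that the console encoding itself can represent `u'\xe9'`, which points at the `str()` call rather than the console as the source of the error. The second character is only an illustrative example of one that cp437 lacks:

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

console_encoding = 'cp437'  # the value reported by sys.stdout.encoding above

# u'\xe9' exists in code page 437, so this does not raise.
encoded = u'\xe9'.encode(console_encoding)
print(repr(encoded))

# Characters missing from the code page need an error handler;
# u'\u0142' is just an example of such a character.
fallback = u'\u0142'.encode(console_encoding, 'replace')
print(repr(fallback))
```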

Cheeso
  • I don't have your answer, but +1 just for the name. :) – Elliot Bonneville Mar 17 '12 at 03:53
  • If adodbapi doesn't have a feature toggle for this, you're going to have to edit/monkey-patch it. The fault lies, as the stack trace shows, on line 651 of adodbapi.py, where they attempt to call `str` on a Unicode string... – Borealid Mar 17 '12 at 03:55
  • @Borealid - I was thinking it might be because I am running the python script from the `cmd.exe` window. Which may not be unicode-capable. – Cheeso Mar 17 '12 at 03:57
  • @Cheeso You can set the system locale to a UTF-8 one, but that won't fix the fact that in Python 2.7, strings are one-byte-per-character. You have to use a unicode object to correctly store any characters above 256, and what a character from 128-256 represents is encoding dependent. So, at best, you'd get data corruption... – Borealid Mar 17 '12 at 08:51

2 Answers

I was able to get the script to run to completion, printing all retrieved rows, by modifying the output portion to do this:

# show the result
result = cur.fetchall()
for item in result:
    print repr(item)

instead of this:

# show the result
result = cur.fetchall()
for item in result:
    print item

So the problem is in fact the use of str within adodbapi as Borealid said in a comment. But that is not necessarily a blocking problem. Normally when retrieving rows from a database query, people don't simply want a string representation of a row; they want to retrieve the values in the individual columns. My conclusion is that this problem is sort of an artificial problem, due to the way I was building a test app.
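
For printing individual column values rather than the whole row, here is a rough sketch. `format_row` is a hypothetical helper of my own, not part of adodbapi, and the sample rows are made-up tuples standing in for what `cur.fetchall()` returns. It assumes each value is either unicode text or something like a number or None that converts to text cleanly. Each value is converted to text and passed through the console encoding with `'replace'`, so characters the console cannot show become `?` instead of raising:

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals
import sys

try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str      # Python 3


def format_row(row, encoding=None):
    """Join the column values of one row into a line that is safe to print.

    Hypothetical helper: `row` is any iterable of column values.
    """
    # Fall back to ASCII when stdout has no encoding (e.g. when piped).
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    cells = []
    for value in row:
        text = value if isinstance(value, text_type) else text_type(value)
        # Round-trip through the target encoding so that unencodable
        # characters are replaced with '?' instead of raising.
        cells.append(text.encode(encoding, "replace").decode(encoding))
    return " | ".join(cells)


# Made-up rows standing in for the result of cur.fetchall().
sample_rows = [(1, "Jos\xe9"), (2, "Ann")]
for row in sample_rows:
    print(format_row(row))
```

In the original script, the loop body would become `print(format_row(item))`, assuming the adodbapi row object can be iterated like a tuple.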

Cheeso

How can I tell it to just use UTF-8?

chcp 65001
Ignacio Vazquez-Abrams