I am working with an existing SQLite database and getting errors because the data is encoded in CP-1252, while Python expects it to be UTF-8.
>>> import sqlite3
>>> conn = sqlite3.connect('dnd.sqlite')
>>> curs = conn.cursor()
>>> result = curs.execute("SELECT * FROM dnd_characterclass WHERE id=802")
Traceback (most recent call last):
File "<input>", line 1, in <module>
OperationalError: Could not decode to UTF-8 column 'short_description_html'
with text ' <p>Over a dozen deities have worshipers who are paladins,
promoting law and good across Faer�n, but it is the Weave itself that
The offending character is the byte 0xfb, which decodes to û in CP-1252. Other offending texts include “?nd and slay illithids.”, which uses "smart quotes" (bytes 0x93 and 0x94).
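To make the failure concrete, here is a minimal sketch that decodes the bytes mentioned above outside of SQLite. The variable names are only illustrative, and the byte string for "Faerûn" is reconstructed from the error message rather than read from the database.

```python
# Bytes as they presumably sit in the TEXT column (reconstructed, not read from the DB).
raw_name = b"Faer\xfbn"

# 0xFB can never start a valid UTF-8 sequence, so strict UTF-8 decoding fails.
try:
    raw_name.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# The same bytes decode cleanly as CP-1252.
name = raw_name.decode("cp1252")
open_quote = b"\x93".decode("cp1252")   # left "smart quote"
close_quote = b"\x94".decode("cp1252")  # right "smart quote"

print(utf8_ok, name, open_quote, close_quote)
```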
The question "SQLite, python, unicode, and non-utf data" details how this problem can be solved when using sqlite3 on its own.
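For reference, one way to handle this with plain sqlite3 in Python 3 is to set the connection's text_factory to a callable that decodes CP-1252. This is a sketch, not necessarily the exact approach in the linked question. The in-memory database and the demo table are assumptions standing in for dnd.sqlite and dnd_characterclass.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for 'dnd.sqlite'
conn.execute(
    "CREATE TABLE demo (id INTEGER PRIMARY KEY, short_description_html TEXT)"
)

# CAST(blob AS TEXT) stores the raw bytes as text without validating them,
# which reproduces a TEXT value that is not valid UTF-8.
conn.execute(
    "INSERT INTO demo VALUES (802, CAST(? AS TEXT))",
    (b"across Faer\xfbn",),
)

# Decode every TEXT value as CP-1252 instead of the default UTF-8.
conn.text_factory = lambda raw: raw.decode("cp1252")

row = conn.execute(
    "SELECT short_description_html FROM demo WHERE id = 802"
).fetchone()
print(row[0])
```

Note that text_factory applies to the whole connection, so every TEXT column read through it is decoded as CP-1252.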
However, I am using SQLAlchemy. How can I deal with CP-1252 encoded data in an SQLite database, when I am using SQLAlchemy?
Edit:
This would also apply to any other unusual encoding in an SQLite TEXT field, such as latin-1, cp437, and so on.
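A small illustration of why the encoding has to be known up front: the same byte decodes without error under each of these codecs, but not always to the same character. The dictionary name is just for illustration.

```python
raw = b"\xfb"

# Decode the same single byte under each candidate encoding.
decoded = {codec: raw.decode(codec) for codec in ("cp1252", "latin-1", "cp437")}

for codec, char in decoded.items():
    print(codec, repr(char))
```

Here cp1252 and latin-1 agree on this byte, while cp437 maps it to a different character, so guessing the wrong codec would silently corrupt the text rather than raise an error.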