Unicode Using sqlite3 in Python 2.7.3

Question

I'm trying to insert into a table, but it seems that the file I opened has non-ASCII characters in it. This is the error I got:

sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.
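The error message's own recommendation is to hand sqlite3 Unicode strings rather than raw bytes. The sketch below shows that pattern: decode each line from bytes to Unicode with the file's encoding, then insert it. It assumes the file is Windows-1252 ("cp1252"), and the table name `notes` is hypothetical. The snippet is written to run under both Python 2.7 and Python 3.

```python
import sqlite3

# Hypothetical raw bytes as read from the file; 0x92 is the byte
# reported in the tracebacks in this question.
raw_line = b"It\x92s a test"

# Decode bytes -> Unicode *before* passing the value to sqlite3.
# ASSUMPTION: the file is cp1252; substitute the real encoding.
text = raw_line.decode("cp1252")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (body TEXT)")  # hypothetical table
conn.execute("INSERT INTO notes (body) VALUES (?)", (text,))
conn.commit()

# Reading it back returns a Unicode string.
stored = conn.execute("SELECT body FROM notes").fetchone()[0]
print(repr(stored))
```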

So after doing some research, I tried putting this in my code:

encode("utf8","ignore")

Which then gave me this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0x92 in position 9: ordinal not in range(128)
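In Python 2, calling `.encode()` on a byte string first performs an implicit `.decode("ascii")`, and that hidden step is what fails on byte 0x92. The `"ignore"` argument only affects the encode step, not the implicit decode. A minimal sketch of the correct direction follows, again assuming cp1252 as the source encoding:

```python
raw = b"It\x92s"  # hypothetical bytes containing the problem byte

# Wrong direction (Python 2): raw.encode("utf8", "ignore") silently runs
# raw.decode("ascii") first, which raises the UnicodeDecodeError above.

# Right direction: bytes -> Unicode using the file's real encoding
# (ASSUMED cp1252 here), then Unicode -> UTF-8 bytes only if bytes are needed.
text = raw.decode("cp1252")
utf8_bytes = text.encode("utf-8")
print(repr(utf8_bytes))
```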

So then I tried using the codecs library to open the file like this:

codecs.open(fileName, encoding='utf-8')

which gave me this error:

newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 0: invalid start byte
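Byte 0x92 can never start a UTF-8 character (bytes 0x80 to 0xBF are continuation bytes), so the file is not UTF-8. In Windows-1252, 0x92 is the right single quotation mark (U+2019). One hedged way to probe is to try a short list of candidate encodings in order. The list and the helper name `decode_with_fallback` are illustrative assumptions, not a standard API:

```python
# ASSUMED candidate list, strictest first. latin-1 decodes any byte
# sequence, so it must stay last as a catch-all.
CANDIDATES = ("utf-8", "cp1252", "latin-1")


def decode_with_fallback(data, candidates=CANDIDATES):
    """Return (text, encoding) for the first candidate that decodes `data`."""
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings matched")


text, used = decode_with_fallback(b"\x92Quoted\x92")
print(used)
```

A successful decode only shows the bytes are *valid* in that encoding, not that it is the right one, so the result should be eyeballed. Once the encoding is known, it can be passed directly, e.g. `codecs.open(fileName, encoding='cp1252')`.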

Then instead of utf-8, I used utf-16 to see if that would do anything and I got this error:

raise UnicodeError,"UTF-16 stream does not start with BOM"
UnicodeError: UTF-16 stream does not start with BOM
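The "utf-16" stream reader used by `codecs.open` needs a byte order mark (BOM) to know the byte order, and this file has none. A quick way to check whether any Unicode BOM is present is to compare the first bytes against the constants in the `codecs` module. The helper `sniff_bom` below is a hypothetical sketch:

```python
import codecs

# The UTF-32-LE BOM begins with the same two bytes as the UTF-16-LE BOM,
# so the 4-byte BOMs are checked first.
BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def sniff_bom(head):
    """Return a codec name if `head` starts with a known BOM, else None."""
    for bom, encoding in BOMS:
        if head.startswith(bom):
            return encoding
    return None


print(sniff_bom(b"\x92Quoted"))  # None: no BOM, so UTF-16 is unlikely
```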

I'm all out of ideas... Also I'm using Ubuntu, if it helps.

The first `UnicodeDecodeError` is thrown because you are trying to encode bytes, which requires you to *decode* to Unicode first. — Martijn Pieters, May 13 '13 at 19:41
You can use https://pypi.python.org/pypi/chardet to guess the encoding. — Thomas Fenzl, May 13 '13 at 19:46
@MartijnPieters Do I have to turn it into a unicode string first like so: http://stackoverflow.com/a/1211102/2379053 in order to ignore the unicode characters? Is python, by default, just reading in the strings as if they are ascii? — wpakt, May 13 '13 at 19:52
@user2379053: You really want to read the [Python Unicode HOWTO](http://docs.python.org/2/howto/unicode.html); Python reads byte strings; characters with values between 0 and 255, regardless of encoding. — Martijn Pieters, May 13 '13 at 19:53
@user2379053: You can then interpret those bytes as a specific encoding by using `.decode()` to get unicode values. You shouldn't ignore anything to get there, that's like using a chainsaw to make your shiny car fit into a garage without opening the door first. — Martijn Pieters, May 13 '13 at 19:54
@user2379053: Instead, figure out the correct encoding, treat it like a key to open the garage door first. — Martijn Pieters, May 13 '13 at 19:55
@user2379053: What you linked to is going the *other way*, unicode encoding to byte strings. That's not the direction you want to go here. — Martijn Pieters, May 13 '13 at 19:56
Since Stack Overflow won't let me answer my own question yet... This is what happened: the file's encoding was unknown. I used **file -bi [filename]** to find out what encoding the file is and got **text/plain; charset=unknown-8bit**. So I went into my text editor (Sublime) to see if it would work if I saved it with encoding UTF-8. Then I ran my script (with the codecs library) using that file and it worked. Thanks for everyone's help. :) — wpakt, May 13 '13 at 20:35
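Re-saving in Sublime amounts to transcoding the file to UTF-8. The same step can be scripted, as in this sketch. It assumes the source is cp1252, which `file -bi` cannot confirm since it only reported "unknown-8bit", and the function name `transcode_to_utf8` is hypothetical. `io.open` is used because it behaves the same way in Python 2.7 and 3.

```python
import io
import os
import tempfile


def transcode_to_utf8(src_path, dst_path, src_encoding="cp1252"):
    """Rewrite src_path as UTF-8 at dst_path.

    ASSUMPTION: src_encoding defaults to cp1252; verify it for real data.
    newline="" keeps the original line endings untouched.
    """
    with io.open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with io.open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


# Demo with a throwaway file containing the 0x92 byte.
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, "input.txt")
dst_path = os.path.join(tmp_dir, "input.utf8.txt")
with open(src_path, "wb") as f:
    f.write(b"It\x92s done\n")

transcode_to_utf8(src_path, dst_path)
with io.open(dst_path, encoding="utf-8") as f:
    result = f.read()
```

After conversion, `codecs.open(fileName, encoding='utf-8')` from the question reads the new file without errors.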
