92

Using SQLite3 in Python, I am trying to store a compressed version of a snippet of UTF-8 HTML code.

Code looks like this:

...
c = connection.cursor()
c.execute('create table blah (cid integer primary key,html blob)')
...
c.execute('insert or ignore into blah values (?, ?)',(cid, zlib.compress(html)))

At which point at get the error:

sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.

If I use 'text' rather than 'blob' and don't compress the HTML snippet, it works all fine (db is to large though). When I use 'blob' and compress via Python zlib library, I get the above error message. I looked around but couldn't find a simple answer for this one.

R. Hill
  • 3,582
  • 1
  • 20
  • 19

5 Answers5

96

If you want to use 8-bit strings instead of unicode string in sqlite3, set appropriate text_factory for sqlite connection:

connection = sqlite3.connect(...)
connection.text_factory = str
rmtheis
  • 5,992
  • 12
  • 61
  • 78
zag
  • 3,379
  • 1
  • 21
  • 19
  • 7
    This may give you problems with different encodings, since you're still trying to parse binary data as text. It's best to use sqlite3.Binary instead. – MarioVilas Jul 09 '14 at 14:02
35

In order to work with the BLOB type, you must first convert your zlib compressed string into binary data - otherwise sqlite will try to process it as a text string. This is done with sqlite3.Binary(). For example:

c.execute('insert or ignore into blah values (?, ?)',(cid, 
sqlite3.Binary(zlib.compress(html))))
MichaelChirico
  • 33,841
  • 14
  • 113
  • 198
MarioVilas
  • 912
  • 10
  • 16
  • this works. However, I was wondering why this is needed. Did the type "BLOB" already indicate the data in this column is binary? Note in Python 2 the string can be either text or binary. Shouldn't sqlite3 just treat the object (zlib compressed string) as binary for BLOB type? – user1783732 Nov 25 '14 at 19:20
  • i don't think Python has the entire database schema in memory to consult the correct data types - most likely it just guesses the types on runtime based on what you pass it, so a binary string can't be differenciated from a text string. – MarioVilas Dec 25 '14 at 11:46
  • Because SQLite uses dynamic type: https://www.sqlite.org/datatype3.html @user1783732 – Lester Cheung Feb 02 '16 at 05:15
35

Found the solution, I should have spent just a little more time searching.

Solution is to 'cast' the value as a Python 'buffer', like so:

c.execute('insert or ignore into blah values (?, ?)',(cid, buffer(zlib.compress(html))))

Hopefully this will help somebody else.

R. Hill
  • 3,582
  • 1
  • 20
  • 19
  • 1
    When I did this, my database was full of base36 text, which would make the database larger than storing the blob directly. – Brian Minton Mar 06 '14 at 19:04
  • 3
    This is incorrect, you should use sqlite3.Binary instead as the documentation says. – MarioVilas Jul 09 '14 at 14:01
  • It looks like sqlite3.Binary() is simply an alias of buffer(), at least as of https://github.com/ghaering/pysqlite/blob/master/lib/dbapi2.py#L54 – stevegt Aug 17 '16 at 14:31
  • Huh. And it also looks like this section of the pysqlite docs actually encourage the buffer() usage: "The following Python types can thus be sent to SQLite without any problem: ... "[Python type] buffer ... [SQLite type] BLOB" https://docs.python.org/2/library/sqlite3.html#introduction – stevegt Aug 17 '16 at 14:38
1

Syntax:

5 types of possible storage: NULL, INTEGER, TEXT, REAL and BLOB

BLOB is generally used to store pickled models or dill pickled models

> cur.execute('''INSERT INTO Tablename(Col1, Col2, Col3, Col4) VALUES(?,?,?,?)''', 
                                      [TextValue, Real_Value, Buffer(model), sqlite3.Binary(model2)])
> conn.commit()

> # Read Data:
> df = pd.read_sql('SELECT * FROM Model, con=conn) 
> model1 = str(df['Col3'].values[0]))
> model2 = str(df['Col'].values[0]))
Pranzell
  • 2,275
  • 16
  • 21
0

You could store the value using repr(html) instead of the raw output and then use eval(html) when retrieving the value for use.

c.execute('insert or ignore into blah values (?, ?)',(1, repr(zlib.compress(html))))
zwalker
  • 346
  • 5
  • 9
  • 1
    Using eval and repr like this is very dirty. No matter how much you trust a data-source. – Jason Fried Jan 27 '14 at 18:50
  • I agree, anything is better than eval() here. The right solution is using sqlite3.Binary, but if you can't for some reason, it's better to encode the data in a safer way - for example with base64. – MarioVilas Jul 09 '14 at 14:04