This is going to be a duplicate of store numpy array in mysql . The reason is that the actual question is not answered there, and 5 years later I (another random person) am struggling with the same problem.
Goal
I have a lot of numpy vectors. I want to store each of them in a BLOB column in a MySQL database. I have already read that this is likely not a good idea, but I'd like to do it anyway. For this, I am trying to create a query. The code looks something like:
query = 'INSERT INTO vectors_table (word_id, vector) VALUES '
for i in vectors:
query += ({}, {}).format(i.index, i.vector.dumps())
(notice the call to .dumps()
)
Alternatives
Numpy arrays have a few options of functions to "serialize" its data:
.dumps()
gives me abytes
"string";.tostring()
also gives me abytes
"string", but apparently has problems
I could also use Pickle. There is a function called pickle.dumps
that also gives me a bytes
sequence.
Problem
When I try creating the query, it inserts that goddamn b
before the string. E.g., if I print one of my vectors, it looks like:
b'\x80\x02cnumpy.core.multiarray\n_reconstruct\nq\x00cnumpy\nndarray\nq\x01K\x00\x85q\x02c_codecs\nencode\nq\x03X\x01\x00\x00\x00bq\x04X\x06\x00\x00\x00latin1q\x05\x86q\x06Rq\x07\x87q\x08Rq\t(K\x01M\x00\x0c\x85q\ncnumpy\ndtype\nq\x0bX\x02\x00\x00\x00f4q\x0cK\x00K\x01\x87q\rRq\x0e(K\x03X\x01\x00\ ...
Unfortunately, as can be seen in the "Alternatives" above, this is true for all alternatives I have found so far.
Attempted solutions
In trying to solve the problem, I find lots of people telling me to use decode
, or to cast the bytes into a string.
Calling either .decode('ascii')
or .decode('utf-8')
has led to UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128)
and UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
, respectively.
Trying to convert the bytes
into a string creates a string that includes the "b".
Is there any better way of doing this?
I am noob with MySQL and totally believe that I am missing some obvious solution. I am getting to the point where I'm planning to just create a string with
for i in vectors:
old_str = str(i.vector.dumps())
new_str = old_str[1:]
query += ({}, {}).format(i.index, new_str)
And just insert that into my query string. Stil... I can't believe that this is the only way to get this to work.