I have tried terminating fields/lines with various long strings, but every time some entries end up corrupted.
The BLOB is a numpy.array of integers saved as uint32 (numpy.array(intlist, dtype='uint32').tostring()). On unload (numpy.fromstring(blob, dtype='uint32')), numpy reports a 'string size must be a multiple of element size' error.
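To illustrate the failure mode, here is a minimal round trip with made-up values; fromstring only succeeds when the blob comes back byte-for-byte intact, i.e. its length is still a multiple of 4:

import numpy

intlist = [1, 2, 3, 4000000000]                          # made-up example values
blob = numpy.array(intlist, dtype='uint32').tostring()   # 4 bytes per element, 16 bytes total

print(numpy.fromstring(blob, dtype='uint32'))            # round-trips fine
numpy.fromstring(blob[:-1], dtype='uint32')              # raises ValueError: string size must be a multiple of element size

The command I'm using to load the dump is: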
mysql -p somedb -e "LOAD DATA LOCAL INFILE '/tmp/tmpBZRvWK' INTO TABLE `hash2protids` FIELDS TERMINATED BY '....|....' LINES TERMINATED BY '....|....\n'"
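For context, the temp file is generated with something along these lines (a simplified sketch, not the exact code; FIELD_SEP/LINE_SEP are just names for the terminator strings):

import tempfile
import numpy

FIELD_SEP = '....|....'
LINE_SEP = '....|....\n'

tmp = tempfile.NamedTemporaryFile(delete=False)   # e.g. /tmp/tmpBZRvWK
for hash, protids in GENERATOR:                   # GENERATOR yields (hash, protids) pairs
    pblob = numpy.array(protids, dtype='uint32').tostring()
    tmp.write(hash + FIELD_SEP + pblob + LINE_SEP)
tmp.close()

Since pblob is raw binary, any 0x5C byte in it is treated as the LOAD DATA escape character (and unlucky byte sequences can collide with the separators), so the stored blob's byte count shifts and the 'multiple of element size' error shows up on read-back, no matter how exotic the terminator string is.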
EDIT
I've tried INSERT INTO hash2protids (hash, protids) VALUES ('key', LOAD_FILE('/tmp/tmpfile'));
(as in Insert file into mysql Blob), but LOAD_FILE works only with files located on the server. In addition, I have 3M rows to insert, so it's far too slow anyway...
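A client-side alternative (which is basically what I ended up doing row by row below) is to read the file in Python and let the driver escape the blob; a sketch, assuming an open MySQLdb cursor cur:

# read the blob on the client and bind it as a parameter
with open('/tmp/tmpfile', 'rb') as f:
    blob = f.read()
cur.execute("INSERT INTO hash2protids (hash, protids) VALUES (%s, %s)", ('key', blob))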
EDIT2
For now I'm relying on MySQLdb executemany, as it:
- is much faster than inserting single rows
- can load any data type
The code looks something like this:
import numpy

step = 1000
data = []
cmd = "INSERT INTO sometable (hash, protids) VALUES (%s, %s)"
# cur is an open MySQLdb cursor; GENERATOR yields (hash, protids) pairs
for (hash, protids) in GENERATOR:
    # serialize the integer list to a raw uint32 byte string
    pblob = numpy.array(protids, dtype='uint32').tostring()
    data.append((hash, pblob))
    # flush a batch of ~1000 rows to stay under max_allowed_packet
    if len(data) > step:
        cur.executemany(cmd, data)
        data = []
# insert the remaining rows
if data:
    cur.executemany(cmd, data)
Note that I split cur.executemany into batches of 1000 elements, as I was running into MySQL errors caused by the 'max_allowed_packet' limit when all entries were executed at once.
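For reference, the server's packet limit can be checked from the same cursor, which makes it easier to pick a safe batch size (the value shown is just an example):

cur.execute("SHOW VARIABLES LIKE 'max_allowed_packet'")
print(cur.fetchone())   # e.g. ('max_allowed_packet', '16777216')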