I have tried terminating fields/lines with some long strings, but every time some entries end up corrupted.
The BLOB is a numpy array of integers saved as uint32 (numpy.array(intlist, dtype='uint32').tostring()).
On reload (numpy.fromstring(blob, dtype='uint32')), numpy reports a 'string size must be a multiple of element size' error.

mysql -p somedb -e "LOAD DATA LOCAL INFILE '/tmp/tmpBZRvWK' INTO TABLE `hash2protids` FIELDS TERMINATED BY '....|....' LINES TERMINATED BY '....|....\n'"
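
The corruption is easy to reproduce without MySQL at all: the raw uint32 bytes can themselves contain '\n' (0x0a) or whatever terminator sequence is chosen, so a delimiter-based dump can split or strip payload bytes, and the reloaded blob is no longer a multiple of 4 bytes. A minimal sketch, plain numpy only (tostring/fromstring as above; tobytes/frombuffer in newer numpy):

import numpy

arr = numpy.array([10, 256, 70000], dtype='uint32')
blob = arr.tostring()
print(repr(blob))                    # contains '\n' (0x0a) because of the value 10

truncated = blob[:-1]                # simulate one byte lost/stripped by the loader
try:
    numpy.fromstring(truncated, dtype='uint32')
except ValueError as e:
    print(e)                         # string size must be a multiple of element size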

EDIT
I've tried INSERT INTO hash2protids (hash, protids) VALUES ('key', LOAD_FILE('/tmp/tmpfile')); (as in Insert file into mysql Blob), but LOAD_FILE works only with files on the server. In addition, I have 3M rows to insert, so it would be far too slow anyway...
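
For completeness, the blob can also be sent over the wire as a bound query parameter, which avoids the server-side file requirement of LOAD_FILE; a rough single-row sketch (assuming cur is an open MySQLdb cursor and hash/protids come from my data), still one round trip per row and therefore too slow for 3M rows:

pblob = numpy.array(protids, dtype='uint32').tostring()
cur.execute("INSERT INTO hash2protids (hash, protids) VALUES (%s, %s)", (hash, pblob))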

EDIT2
For now I'm relying on MySQLdb executemany, as it:

  • is much faster than inserting single rows
  • can load any datatype

Code looks something like this:

import numpy

# `cur` is an open MySQLdb cursor; GENERATOR yields (hash, protids) pairs.
step = 1000
data = []
cmd = "INSERT INTO sometable (hash, protids) VALUES (%s, %s)"
for (hash, protids) in GENERATOR:
    # serialize the protein id list as a raw uint32 blob
    pblob = numpy.array(protids, dtype='uint32').tostring()
    data.append((hash, pblob))
    if len(data) >= step:
        # flush every `step` rows to stay under max_allowed_packet
        cur.executemany(cmd, data)
        data = []
if data:
    # flush whatever is left in the final partial batch
    cur.executemany(cmd, data)

Note that I split the cur.executemany calls into batches of 1000 rows, as I was running into MySQL errors caused by the 'max_allowed_packet' limit when all entries were executed at once.
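
If anyone hits the same limit: the server-side value can be checked from the same connection, and raising it requires the SUPER privilege (or a my.cnf change plus restart), which is why batching was the simpler route here. A rough sketch, assuming cur is the open MySQLdb cursor from above:

cur.execute("SHOW VARIABLES LIKE 'max_allowed_packet'")
name, value = cur.fetchone()
print(value)                         # e.g. '4194304' (4 MB) on many default configs
# cur.execute("SET GLOBAL max_allowed_packet = 64*1024*1024")   # needs SUPER; affects new connections only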
