Josh's answer here gave me a good head start on how to insert a 256x64x250 value array into a MySQL database, but when I actually tried his INSERT statement on my data it turned out to be horribly slow (about 6 minutes for a 16 MB file).
ny, nx, nz = np.shape(data)
query = """INSERT INTO `data` (frame, sensor_row, sensor_col, value) VALUES (%s, %s, %s, %s)"""
for frames in range(nz):
    for rows in range(ny):
        for cols in range(nx):
            cursor.execute(query, (frames, rows, cols, data[rows, cols, frames]))
I was reading MySQL for Python, which explained that this wasn't the right approach because executing 4 million separate inserts is very inefficient.
Now my data consists mostly of zeros (more than 90%, actually), so I threw in an if statement so that I only insert values greater than zero, and I used executemany() instead:
query = """INSERT INTO `data` (frame, sensor_row, sensor_col, value) VALUES (%s, %s, %s, %s ) """
values = []
for frames in range(nz):
for rows in range(ny):
for cols in range(nx):
if data[rows,cols,frames] > 0.0:
values.append((frames, rows, cols, data[rows,cols,frames]))
cur.executemany(query, values)
This miraculously brought my processing time down to about 20 seconds: 14 seconds spent creating the list of values (37k rows) and 4 seconds on the actual insert into the database.
So now I'm wondering: how can I speed this up any further? I have a feeling my loop is horribly inefficient and that there has to be a better way. If I need to insert 30 measurements per dog, this would still take 10 minutes, which seems far too long for this amount of data.
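One direction I've been toying with is replacing the triple loop with numpy indexing, roughly like this (an untested sketch, assuming `data` is indexed as `data[row, col, frame]` exactly as in the loops above), but I'm not sure this is where the real gain is:

import numpy as np

# Sketch: build the (frame, row, col, value) tuples without a Python triple loop.
# data has shape (ny, nx, nz) and is indexed data[row, col, frame] as above.
rows, cols, frames = np.nonzero(data > 0.0)   # indices of the non-zero entries
vals = data[rows, cols, frames]               # the corresponding values
values = list(zip(frames.tolist(), rows.tolist(),
                  cols.tolist(), vals.tolist()))
cursor.executemany(query, values)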
Here are two versions of my raw files: with headers or without headers. I'd love to try LOAD DATA INFILE, but I can't figure out how to parse the data correctly.
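One thing I considered instead of parsing the raw file directly was dumping the already-filtered values list to a temporary CSV and bulk-loading that; roughly like this (just a sketch: the file path is a placeholder, and LOCAL INFILE has to be enabled on the server), though I haven't gotten it working:

# Sketch: write the non-zero (frame, row, col, value) rows from the
# `values` list built above to a temporary CSV...
np.savetxt('/tmp/sensor_data.csv', np.array(values),
           fmt=['%d', '%d', '%d', '%f'], delimiter=',')

# ...and let MySQL bulk-load it in one statement.
cursor.execute("""
    LOAD DATA LOCAL INFILE '/tmp/sensor_data.csv'
    INTO TABLE `data`
    FIELDS TERMINATED BY ','
    (frame, sensor_row, sensor_col, value)
""")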