2

I am trying to store some data generated by a python script in a MySQL database. Essentially I am using the commands:

con = oursql.connect(user="user", host="host", passwd="passwd", 
                     db="testdb")
c = con.cursor()                             

c.executemany(insertsimoutput, zippedsimoutput)

con.commit()
c.close()

where,

insertsimoutput = '''insert into simoutput 
                        (repnum, 
                         timepd, 
                         ...) values (?, ?, ...?)'''

About 30,000 rows are inserted and there are about 15 columns. The above takes about 7 minutes. If I use MySQLdb instead of oursql, it takes about 2 seconds. Why this huge difference? Is this supposed to be done some other way in oursql, our oursql is just plain slow? If there is a better way to insert this data with oursql, I would appreciate if you can let me know.

Thank you.

Curious2learn
  • 31,692
  • 43
  • 108
  • 125
  • Have you looked at the query that oursql is doing? – Simeon Visser Jun 15 '12 at 15:08
  • @Simeon I am not sure how to find that. Can you please explain? – Curious2learn Jun 15 '12 at 15:32
  • 1
    You could look in the [mysql server logs](https://dev.mysql.com/doc/refman/5.1/en/server-logs.html). From the time difference you're observing I'd assume that oursqls executemany does 30k single-row inserts and MySQLdb inserts multiple rows at once. – l4mpi Jun 15 '12 at 19:13

3 Answers3

7

The difference is that MySQLdb does some hackery to your query while oursql does not...

Taking this:

cursor.executemany("INSERT INTO sometable VALUES (%s, %s, %s)",
    [[1,2,3],[4,5,6],[7,8,9]])

MySQLdb translates it before running into this:

cursor.execute("INSERT INTO sometable VALUES (1,2,3),(4,5,6),(7,8,9)")

But if you do:

cursor.executemany("INSERT INTO sometable VALUES (?, ?, ?)",
    [[1,2,3],[4,5,6],[7,8,9]])

In oursql, it gets translated into something like this pseudocode:

stmt = prepare("INSERT INTO sometable VALUES (?, ?, ?)")
for params in [[1,2,3],[4,5,6],[7,8,9]]:
    stmt.execute(*params)

So if you want to emulate what mysqldb is doing but benefit from prepared statements and other goodness with oursql, you need to do this:

from itertools import chain
data = [[1,2,3],[4,5,6],[7,8,9]]
one_val = "({})".format(','.join("?" for i in data[0]))
vals_clause = ','.join(one_val for i in data)
cursor.execute("INSERT INTO sometable VALUES {}".format(vals_clause),
    chain.from_iterable(data))

I bet oursql will be faster when you do this :-)

Also, if you think its ugly, you are right. But just remember MySQL db is doing something uglier internally - its using regular expressions to parse your INSERT statement and break off the parameterized part and THEN doing what I suggested you do for oursql.

underrun
  • 6,713
  • 2
  • 41
  • 53
1

I would say to check if oursql supports a bulk insert sql command to boost performance.

notbad.jpeg
  • 3,308
  • 1
  • 32
  • 37
0

Oursql does support bulk insert statements. I've written code to do so, using the sqlalchemy wrapper.

For pure oursql, something like this should be fine:

with open('tmp.csv', 'wb') as tmp:
    for item in zippedsimoutput:
        tmp.write("{0}\n".format(item))
c.execute("""LOAD DATA LOCAL INFILE 'tmp.csv' INTO TABLE flags FIELDS TERMINATED BY ',' ENCLOSED BY '"'  LINES TERMINATED BY '\r\n' ;""")

Note that the rows must be in the same order as the columns on the database.

Spencer Rathbun
  • 14,510
  • 6
  • 54
  • 73
  • dumping to a csv and then loading doesn't look like what @Curious2learn wants ... – underrun Jun 19 '14 at 22:33
  • @underrun It's been a couple years, but IIRC this was the only way that oursql supported making a `bulk insert`. Since he was looking for a fast way to insert a large data set, demonstrating the bulk insert syntax seemed most useful. Again, IIRC, oursql *does not* support multiple insert, which is why the orm does not transform the query, but runs it many times with each item. Since there is no multi insert, the only option left is bulk insert. This is not a problem for the underlying MySQL database, but that was not the question. – Spencer Rathbun Jun 20 '14 at 12:21