10

When inserting multiple rows in a MySQL-DB via a SQLA-Expression-Language statement, f.e.

Foo.__table__.insert().execute([{'bar': 1}, {'bar': 2}, {'bar': 3}])

it´s extremly slow, when compared to the execution of a "raw" sql statement for the same task, i.e.

engine.execute("insert into foo (bar) values (1),(2),(3)")

What is the reason for this? Can´t SQLA generate a single bulk insert statement and therefore executes multiple inserts? Due to the speed limits of the orm, i need a fast way to add several thousand rows at once, but the SQLA-Expression-Language-Version is too slow. So, do i need to write the raw sql by myself? The documentation isn't too clear about this.

I ran a speed test with the ORM insert, the ORM with preassigned PK and the SQLA bulk insert (see SQLA bulk insert speed) like this (https://gist.github.com/3341940):

  • SqlAlchemy ORM: Total time for 500 records 9.61418914795 secs
  • SqlAlchemy ORM pk given: Total time for 500 records 9.56391906738 secs
  • SqlAlchemy Core: Total time for 500 records 9.5362598896 secs
  • SQLAlchemy RAW String Execution: Total time for 500 records 1.233677 secs

As you can see, there is practically no difference between the three versions. Only the execution of a raw string insert, where all the records are included in the raw sql statement is significantly faster. Thus, for fast inserts, SQLA seems sub-optimal.

Community
  • 1
  • 1
dorvak
  • 9,219
  • 4
  • 34
  • 43
  • Is there a question here? The link you reference includes a detailed discussion by SA's creator of why the speed difference exists. – khoxsey Aug 10 '12 at 16:48
  • Actually, the linked discussion only handles the differences between the orm and a Native or raw sql Insertion. My question is, why the sqla-Core method for bulk-inserts isn' faster than the orm-method nor comparable to the RAW-SQl – dorvak Aug 10 '12 at 20:37
  • I know you referenced the other question, but it looks like you changed the script in some way (reduced the number of records). Can you post your complete version of your script? I just ran the version from the linked question and received similar results to the answerer (ORM took ~9 seconds, core took less than one second). – Mark Hildreth Aug 11 '12 at 00:26
  • Hey Mark, the only thing i changed is the db-type. Using a sqlite-db, i can exactly reproduce the results, but when i change to a mysql-db, the SA-Core-Method´s speed reduces dramatically. Maybe the bulk-insert/executemany for the mysql-dialect is corrupted, but that´s just speculations. – dorvak Aug 11 '12 at 12:16
  • @mad_scientist I still believe that it would be more helpful if you post the exact script that you yourself used. – Mark Hildreth Aug 11 '12 at 20:01
  • See the gist-link attached (or https://gist.github.com/3341940) – dorvak Aug 13 '12 at 15:37
  • I noticed that if you instead of passing all values at once to the core expression but looped it is way way slower than the ORM, like here: https://gist.github.com/Zitrax/5cbd35c1138071aab7fd (e.g. I have 0.18 s for the slowest ORM and the looped core expression takes 5.57 s ) – Zitrax Oct 29 '14 at 09:08

1 Answers1

2

It seems that the special INSERT with multiple values only became recently supported(0.8 unreleased), you can see the note at the bottom of this section regarding the difference between executemany(what execute with a list does) and a multiple-VALUES INSERT:

http://docs.sqlalchemy.org/ru/latest/core/expression_api.html#sqlalchemy.sql.expression.Insert.values

This should explain the performance difference you see. You could try installing the development version and repeating the tests with the altered calling syntax, mentioned in the link, to confirm.

Ian Wilson
  • 6,223
  • 1
  • 16
  • 24