
Due to the nature of my application, I need to support fast inserts of large volumes of data into the database. Using executemany() increases performance, but there's a caveat. For example, MySQL has a configuration parameter called max_allowed_packet, and if the total size of my insert queries exceeds its value, MySQL throws an error.

Question #1: Is there a way to tell SQLAlchemy to split the packet into several smaller ones?
Question #2: If other RDBMSes have similar constraints, how should I work around them?
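The naive work-around I'm aware of is to split the rows into fixed-size batches and call executemany() per batch. A self-contained sketch (using the stdlib SQLite driver so it runs anywhere; the `items` table and the batch size of 500 are purely illustrative):

```python
import sqlite3

def insert_in_batches(conn, rows, batch_size=500):
    """Insert rows via executemany() in fixed-size batches so that no
    single statement grows past a size the server might reject.
    Table name and batch_size are illustrative, not from any real schema."""
    cur = conn.cursor()
    for start in range(0, len(rows), batch_size):
        cur.executemany(
            "INSERT INTO items (name, value) VALUES (?, ?)",
            rows[start:start + batch_size],
        )
        conn.commit()
```

The obvious weakness is that a row count is a poor proxy for packet size when row widths vary, which is what leads to my questions above.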



P.S. I had posted this question earlier but deleted it when I wrongly assumed I was unlikely to run into this problem after all. Sadly, that's not the case.

dpq

1 Answer


I had a similar problem recently and used the following (not very elegant) work-around:

  • First I parse my.cnf for a value of max_allowed_packet; if I can't find one, I fall back to a default value.
  • All data items are stored in a list.
  • Next, for each data item I compute an approximate byte length (for strings, the length of the string in bytes; for other data types I take the maximum bytes they could use, to be safe).
  • I add these up, committing once I have reached approx. 75% of max_allowed_packet (SQL query text takes up space as well, so staying well under the limit keeps a safety margin).

This approach is not really beautiful, but it worked flawlessly for me.
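The byte-budget part of the steps above can be sketched like this (function name, the 75% fill factor, and the fixed 8-byte estimate for non-string values are my choices, not a standard API):

```python
def chunk_rows(rows, max_packet, fill_factor=0.75):
    """Split `rows` (sequences of column values) into chunks whose
    approximate byte size stays under fill_factor * max_packet,
    leaving headroom for the SQL statement text itself."""
    budget = int(max_packet * fill_factor)
    chunk, size = [], 0
    for row in rows:
        # Rough per-row estimate: actual encoded length for strings,
        # a fixed 8-byte upper bound for numbers and other scalars.
        row_size = sum(
            len(v.encode("utf-8")) if isinstance(v, str) else 8
            for v in row
        )
        if chunk and size + row_size > budget:
            yield chunk
            chunk, size = [], 0
        chunk.append(row)
        size += row_size
    if chunk:
        yield chunk
```

Each yielded chunk would then go to one executemany() call followed by a commit.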

PhilS
    That sounds okay, although there are two nuances. First, the max_allowed_packet value may have been set by a DBA-issued command, not in the config file, so I think I'll rather do a `select @@max_allowed_packet` instead. Second, do you know if any other databases like Postgres or Oracle need similar workarounds? – dpq Jul 26 '10 at 11:22
  • You're right, the `select` statement is the way to do it properly. As for Postgres or Oracle, I don't have any experience with those databases, but quick Googling yielded http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg71116.html , so I assume that at least Postgres does not have this limitation, but I may be wrong. – PhilS Jul 26 '10 at 14:52
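Following up on the `select @@max_allowed_packet` suggestion, a minimal sketch against a plain DB-API connection (with SQLAlchemy you would issue the same statement through a connection's execute method); the function name and the conservative 1 MiB fallback are assumptions of mine:

```python
def get_max_allowed_packet(conn, default=1 << 20):
    """Ask the MySQL server for its effective max_allowed_packet.
    Falls back to `default` (1 MiB, a conservative value) if the
    query fails, e.g. on a non-MySQL backend."""
    try:
        cur = conn.cursor()
        cur.execute("SELECT @@max_allowed_packet")
        return int(cur.fetchone()[0])
    except Exception:
        return default
```

This also covers the case from the comment above where the DBA changed the value at runtime rather than in my.cnf, since the server reports its current setting.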