Is it possible to insert many rows into a table using one query in pyhdb?
Because when I have millions of records to insert, inserting each record in a loop is not very efficient.
Is it possible to insert many rows into a table using one query in pyhdb?
Because when I have millions of records to insert, inserting each record in a loop is not very efficient.
In pyhdb cursors support the normal executemany() statement as described in the Python DB API specs (https://www.python.org/dev/peps/pep-0249/) - this way to can do bulk inserts.
Have you tried this?
To build off of @ralhei's answer, pyhdb.cursor.executemany()
allows a SQL statement and a list of parameters. So if you have a three column table t1
, you can insert rows this way
import pyhdb
# Open connection
connection = pyhdb.connect(
host="example.com",
port=30015,
user="user",
password="secret"
)
cursor = connection.cursor()
# Write data
parameters = [
(1, 2, "3"),
(4, 5, "6"),
]
query = 'INSERT INTO schema.t1 VALUES (?, ?, ?)'
cursor.executemany(query, parameters)
The parameters must have a known length though, so e.g. a generator expression cannot be use. This might be a problem if your computer cannot hold the entire list in memory or if your parameters list is bigger than HANA can receive! In this case, you should insert in batches like this:
# Helper function
def iter_chunks(iterator, chunk_size):
"""Split iterator into chucks
https://stackoverflow.com/a/312466/6068036
"""
res = []
for item in iterator:
res.append(item)
if len(res) >= chunk_size:
yield res
res = []
if res:
yield res # yield the last, incomplete, portion
# Write data
parameters = ... # a generator expression or other Iterator
chunk_size = 1000
for chunk in iter_chunks(parameters, chunk_size):
cursor.executemany(query, chunk)
pyhdb executemany() is faster than simply execute() but for larger records even if you divide in chunks and use executemany() it still takes significant time.
For better and faster performance use string formatting like values (?, ?, ?...)
instead of values('%s', '%s', '%s', ...)
This saves a lots of time that heavy type conversion uses on server side and gets response faster and hence faster execution.