0

Is it possible to insert many rows into a table using one query in pyhdb?

Because when I have millions of records to insert, inserting each record in a loop is not very efficient.

Sandra Rossi
  • 11,934
  • 5
  • 22
  • 48
Sunil Kumar
  • 535
  • 6
  • 13

3 Answers3

2

In pyhdb cursors support the normal executemany() statement as described in the Python DB API specs (https://www.python.org/dev/peps/pep-0249/) - this way to can do bulk inserts.

Have you tried this?

ralhei
  • 756
  • 6
  • 5
2

To build off of @ralhei's answer, pyhdb.cursor.executemany() allows a SQL statement and a list of parameters. So if you have a three column table t1, you can insert rows this way

import pyhdb

# Open connection
connection = pyhdb.connect(
    host="example.com",
    port=30015,
    user="user",
    password="secret"
)
cursor = connection.cursor()

# Write data
parameters = [
  (1, 2, "3"),
  (4, 5, "6"),
]
query = 'INSERT INTO schema.t1 VALUES (?, ?, ?)'
cursor.executemany(query, parameters)

The parameters must have a known length though, so e.g. a generator expression cannot be use. This might be a problem if your computer cannot hold the entire list in memory or if your parameters list is bigger than HANA can receive! In this case, you should insert in batches like this:

# Helper function
def iter_chunks(iterator, chunk_size):
    """Split iterator into chucks

    https://stackoverflow.com/a/312466/6068036
    """
    res = []
    for item in iterator:
        res.append(item)
        if len(res) >= chunk_size:
            yield res
            res = []
    if res:
        yield res  # yield the last, incomplete, portion

# Write data
parameters = ...  # a generator expression or other Iterator
chunk_size = 1000
for chunk in iter_chunks(parameters, chunk_size):
  cursor.executemany(query, chunk)
AlexM
  • 1,020
  • 2
  • 17
  • 35
0

pyhdb executemany() is faster than simply execute() but for larger records even if you divide in chunks and use executemany() it still takes significant time.

For better and faster performance use string formatting like values (?, ?, ?...) instead of values('%s', '%s', '%s', ...)

This saves a lots of time that heavy type conversion uses on server side and gets response faster and hence faster execution.