0

So I need to insert about 700k items into a table. I am using pool. If I insert 1000 at the time, a lot of data becomes distorted in my db and it won't insert all data. And if I do 100 at the time, it does not insert the entire list for some reason. It only adds a lower amount that is divisible by 100 (so it does not execute the last line of code in the second part). Is there maybe a better way to do this?

def initialise(**kwargs):
    con_pool = pool.SimpleConnectionPool(1, 100, **kwargs)
    return con_pool


def get_connection(con_pool):
    return con_pool.getconn()

@contextmanager
def get_cursor(con_pool):
    try:
        con = get_connection(con_pool)
        yield con.cursor()
    except Exception as e:
        logger_local.warning(f'context manager for sql_funcs error: {e}')
        print(e)
    finally:
        con.commit()
        con_pool.putconn(con)


def insert_data_bulk(con_pool, insert, data, counter=0):
    try:
        with get_cursor(con_pool) as cursor:
            execute_batch(cursor, insert, data)
            counter += 1
            return counter
    except Exception as e:
        print('insert_data_bulk error: ', e)
        return counter

And this executes them in another module, basically clearing data_bulk every 100 items (since it does not work if I throw in the whole thing at once). I tried both execute_batch and execute_values, and they both are faulty.

data_bulk = []
for item in txt_data_lines:    
    parse data here and append to data_bulk
    if len(data_bulk) == 100:
        sql_funcs.insert_data_bulk(index_pool, insert, data_bulk)
        data_bulk.clear()          
sql_funcs.insert_data_bulk(index_pool, insert, data_bulk) 

to add, the get_cursor function actually gives error "not all arguments converted during string formatting" when I use execute_values

TheProgramMAN123
  • 487
  • 1
  • 5
  • 13
  • You can use `psql` to insert data from a file. It will be faster. you do not consider such solution? – Danila Ganchar Dec 16 '19 at 13:15
  • Could you include the "import" statements needed to make this work? It is not obvious to me what libraries you are even using here. And version of python. – jjanes Dec 18 '19 at 18:42
  • Also, check out this answer on tips for inserting bulk data efficiently. https://stackoverflow.com/a/39034789/4188138 – Shmuel Kamensky Apr 05 '20 at 22:22

0 Answers0