
I have a problem inserting values into an MSSQL database. The values come from a pandas DataFrame whose rows have been converted to tuple strings and are inserted via SQLAlchemy/pyodbc. Since the DataFrame's `to_sql` method is too slow, I wanted to insert the values in batches of 1000 rows and built the following method (following a tutorial found on the web):

    def chunker(seq, size):
        # Yield the records in consecutive slices of `size` rows.
        return (seq[pos:pos + size] for pos in range(0, len(seq), size))

    def mssql_insert_rows(records, db_extract, target_tbl):
        # cursorSQL and connSQL are an already-open pyodbc cursor/connection.
        insert_ = "INSERT INTO dbo." + db_extract + "_" + target_tbl + " VALUES "
        for batch in chunker(records, 1000):
            # Each record is already a tuple string, so joining with commas
            # builds a multi-row VALUES list.
            rows = ','.join(batch)
            insert_rows = insert_ + rows
            cursorSQL.execute(insert_rows)
            connSQL.commit()

This is what one of the resulting row strings looks like:

    (\'2\', \'10\', \'41\', \'null\', \'null\', \'null\', \'None\', \'FR\', "Saint-ouen l\'aumône ", \'2013_07M\', \'False\', \'True\', \'113706\', \'1990-01-01 00:00:00.0\', \'2013-04-30 00:00:00.0\', \'9AAE305417\', \'null\', \'None\', \'False\', \'False\')

As you can see, the value "Saint-ouen l\'aumône " breaks the insert: the unescaped single quote ends the string literal early, so the program treats what follows as a column name. I tried different approaches like df.replace or str.replace to solve this, but nothing really got me to a solution.
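
The str.replace route I mean would amount to something like this (a rough sketch with a hypothetical helper name; T-SQL itself escapes a single quote inside a string literal by doubling it):

    def escape_sql_literal(value):
        # Double embedded single quotes so they survive inside a T-SQL literal.
        return str(value).replace("'", "''")

    print(escape_sql_literal("Saint-ouen l'aumône "))  # Saint-ouen l''aumône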

Does anyone here know a good and easy way to convert values like this one? I personally wouldn't mind if the value ended up looking like

    Saint-ouen laumone

or so.
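
Something along these lines would produce that kind of value (a minimal sketch with a hypothetical helper, using only the standard library's `unicodedata`; stripping accents is lossy, so it only suits data where that doesn't matter):

    import unicodedata

    def to_plain_ascii(value):
        # Split accented chars into base letter + combining mark (NFKD),
        # drop everything non-ASCII, then drop single quotes as well.
        decomposed = unicodedata.normalize("NFKD", str(value))
        ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
        return ascii_only.replace("'", "")

    print(to_plain_ascii("Saint-ouen l'aumône "))  # Saint-ouen laumone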

Thanks in advance! :-)

Aquen
  • You should not use string formatting to pass values to queries, for reasons you've encountered. Use placeholders in general and if `records` is a `DataFrame`, then the `to_sql()` method. – Ilja Everilä Oct 08 '18 at 09:10
  • Yeah, normally I would agree with you, but since the `to_sql()` method is quite slow I thought of inserting 1000 rows at once. And for that I need to build a string. – Aquen Oct 08 '18 at 10:41
  • The slowness is a result of many things, especially when using MS SQL Server with pyODBC. See ["Speeding up pandas.DataFrame.to_sql with fast_executemany of pyODBC"](https://stackoverflow.com/questions/48006551/speeding-up-pandas-dataframe-to-sql-with-fast-executemany-of-pyodbc). Depending on the Pandas version it might actually even be doing pretty much what you're trying to do, but with placeholders. The linked Q/A is long, but worth the read if you want to understand what is going on. – Ilja Everilä Oct 08 '18 at 10:43
  • Hi, I already saw this post and tried out the executemany event listener (the placeholder-based setup is sketched after this thread), but it's not bringing any performance boost. Or did you mean the comment you mentioned there? – Aquen Oct 09 '18 at 11:38
  • If you are using Pandas 0.23.0 or >=0.24.0, chances are that it is not using the `executemany` strategy, but the very same multi-values insert (with placeholders) that you're trying to build manually. If so, it is no surprise that `fast_executemany` does not help. – Ilja Everilä Oct 09 '18 at 13:11
  • Yes, I'm using 0.23.4, so that could really be the issue. Seems like I really have to use the combination of csv and bcp... :-/ – Aquen Oct 10 '18 at 05:25
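
For later readers, here is a sketch of what the comments converge on: keep the rows as real tuples instead of pre-rendered strings and let pyodbc bind them through placeholders, with `fast_executemany` providing the speed-up (the connection string, table names, and column count below are placeholders, not values from the question):

    import pyodbc

    # Placeholder connection string -- adjust driver/server/credentials.
    connSQL = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=server;DATABASE=db;UID=user;PWD=pass"
    )
    cursorSQL = connSQL.cursor()
    cursorSQL.fast_executemany = True  # available in pyodbc >= 4.0.19

    def chunker(seq, size):
        return (seq[pos:pos + size] for pos in range(0, len(seq), size))

    def mssql_insert_rows(records, db_extract, target_tbl, n_cols):
        # One "?" per column; the driver escapes embedded quotes, so values
        # like "Saint-ouen l'aumône" need no conversion at all.
        placeholders = ", ".join(["?"] * n_cols)
        insert_sql = "INSERT INTO dbo." + db_extract + "_" + target_tbl + " VALUES (" + placeholders + ")"
        for batch in chunker(records, 1000):
            cursorSQL.executemany(insert_sql, batch)
        connSQL.commit()

This assumes `records` is a list of plain tuples (e.g. straight from `df.itertuples(index=False, name=None)`), not the quoted strings built above.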

0 Answers