
I am trying to write a data frame into a SQL Server table. My code:

import pymssql

conn = pymssql.connect(host="Dev02", database="DEVDb")
cur = conn.cursor()
query = "INSERT INTO dbo.SCORE_TABLE VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)"
cur.executemany(query, df_sql)
conn.commit()
cur.close()
conn.close()

The dimensions of df_sql are (5860, 20), i.e. the number of columns in the data frame is the same as the number of columns in the SQL Server table. Still I am getting the following error:

ValueError: more placeholders in sql than params available
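
For context on this error: DB-API executemany expects a sequence of parameter rows, and iterating a pandas DataFrame yields its column labels rather than its rows, which is the likely cause of the mismatch. A minimal sketch of the expected parameter shape, assuming the same pymssql connection and DataFrame as above:

import pymssql

conn = pymssql.connect(host="Dev02", database="DEVDb")
cur = conn.cursor()

# One tuple per row, 20 values each, matching the 20 %s placeholders.
rows = list(df_sql.itertuples(index=False, name=None))

query = "INSERT INTO dbo.SCORE_TABLE VALUES (" + ", ".join(["%s"] * 20) + ")"
cur.executemany(query, rows)
conn.commit()
cur.close()
conn.close()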

UPDATED BELOW

As per one of the comments, I tried using turbodbc as below:

import turbodbc

conn = turbodbc.connect(driver="{SQL Server}", server="Dev02", Database="DEVDb")
conn.use_async_io = True
cur = conn.cursor()
query = "INSERT INTO dbo.STG_CONTACTABILITY_SCORE VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)"
cur.executemany(query, df_sql.values)
cur.commit()
cur.close()
conn.close()

I am getting the following error:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I don't get it. What is wrong here? I looked at df_sql.values and I don't find anything wrong.

The first row of the ndarray is below:

[nan 'DUSTIN HOPKINS' 'SOUTHEAST MISSOURI STATE UNIVERSITY' 13.0
  '5736512217' None None 'Monday' '8:00AM' '9:00AM' 'Summer' None None None
  None '2017-12-22 10:39:30.626331' 'Completed' None '1-11KUFFZ'
  'Central Time Zone']
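
For what it's worth, the second error is NumPy's complaint about a multi-element array being used in a boolean context, which can happen when driver code applies a plain truth test to each parameter row. A small, purely illustrative sketch of where that ValueError originates:

import numpy as np

row = np.array(['DUSTIN HOPKINS', 13.0, None], dtype=object)
try:
    bool(row)  # roughly what an "if params:"-style check inside a driver would do
except ValueError as exc:
    print(exc)  # The truth value of an array with more than one element is ambiguous...

Converting each row to a plain tuple, as suggested in one of the answers below, sidesteps that check.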
Krishnang K Dalal
4 Answers


I think you just need to specify each column name, and don't forget the table must have an id field to hold the data frame index:

conn = pymssql.connect(host="Dev02", database="DEVDb")
cur = conn.cursor()
query = """INSERT INTO dbo.SCORE_TABLE(index, column1, column2, ..., column20)
            VALUES (?, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, 
            %s, %s, %s, %s, %s, %s)"""
sql_data = tuple(map(tuple, df_sql.reset_index().values))  # index + 20 columns = 21 values per row
cur.executemany(query, sql_data)
conn.commit()
cur.close()
conn.close()

OK, I have been using pandas, and I exported the last data frame to CSV like this:

df.to_csv('new_file_name.csv', sep=',', encoding='utf-8')

Then I just used pyodbc and the BULK INSERT Transact-SQL statement like this:

import pyodbc

conn = pyodbc.connect(DRIVER='{SQL Server}', Server='server_name', Database='Database_name', trusted_connection='yes')
cur = conn.cursor()

cur.execute("""BULK INSERT table_name
               FROM 'C:\\Users\\folders path\\new_file_name.csv'
               WITH
               (
                   CODEPAGE = 'ACP',
                   FIRSTROW = 2,
                   FIELDTERMINATOR = ',',
                   ROWTERMINATOR = '\n'
               )""")
conn.commit()

cur.close()
conn.close()

It took a second to load 15,314 rows into SQL Server. I hope this gives you an idea.

virtualdvid
  • Thank you @virtualdvid for your help and sorry for the late acknowledgment. This works. – Krishnang K Dalal Jan 02 '18 at 14:53
  • @virtualdvid: I am trying to use your BULK INSERT suggestion for one of my projects but I keep getting "Operating system could not locate the path" for new_file_name.csv. The file is in the path that I am specifying. Can you think of what could be the issue? – Krishnang K Dalal Feb 13 '18 at 20:02
  • @KrishnangKDalal Can you show me the line that you are using? – virtualdvid Feb 13 '18 at 20:06
  • I have posted on https://stackoverflow.com/questions/48774516/bulk-insert-into-sql-server-table-using-pyodbc – Krishnang K Dalal Feb 13 '18 at 20:25
  • How would this look if data had to go in schema 'myschema' instead of into 'dbo'? All my tables have a prefix 'ms_' but I would like them to be 'myschema.'ms_'.. However, if I replace 'ms_' with 'myschema.ms_' all tables look like this: 'dbo.myschema.ms_' ... – Henrov Mar 16 '20 at 10:13
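
Regarding the schema question in the last comment: the BULK INSERT target can be schema-qualified directly, and if the tables are created through pandas to_sql the schema belongs in its schema= argument rather than in the table name (otherwise the dotted name is treated as one long table name under dbo). A rough sketch, where 'myschema' and 'ms_table' are placeholder names:

import pyodbc

conn = pyodbc.connect(DRIVER='{SQL Server}', Server='server_name', Database='Database_name', trusted_connection='yes')
cur = conn.cursor()

cur.execute("""BULK INSERT myschema.ms_table
               FROM 'C:\\Users\\folders path\\new_file_name.csv'
               WITH
               (
                   CODEPAGE = 'ACP',
                   FIRSTROW = 2,
                   FIELDTERMINATOR = ',',
                   ROWTERMINATOR = '\n'
               )""")
conn.commit()

cur.close()
conn.close()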

Possibly executemany treats each row in the ndarray from your df.values call as one item, since there are no comma separators between values. Hence, the placeholders outnumber the actual bound values and you receive the mismatch error.

Consider converting the array to a tuple of tuples (or a list of lists, tuple of lists, or list of tuples) and then passing that object to executemany:

query = "INTO dbo.SCORE_TABLE VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)"    
sql_data = tuple(map(tuple, df.values))

cur.executemany(query, sql_data)
cur.commit()
Parfait
  • Nope, I tried converting the array to a tuple of tuples and a list of lists but I am getting a `turbodbc.exceptions.DatabaseError: Invalid number of parameters (expected 0, got 20)` error – Krishnang K Dalal Dec 22 '17 at 17:39
  • This was for pymssql as placeholders are different. Try with `?` in turbodbc. – Parfait Dec 22 '17 at 17:40
  • I think it is working, but my IT team has added an identity column into which I am not able to insert any values, because IDENTITY_INSERT is OFF and I cannot change that due to permission issues, and I am not able to proceed without providing values for that column. I will accept the answer once I have it sorted out from my end. Thank you – Krishnang K Dalal Dec 22 '17 at 18:17
  • If the identity field has the same name as a column in the data frame, drop that column in pandas and remove one placeholder, then try again (a rough sketch of this appears after this thread). Identity columns are autonumber fields that the database engine fills in for new rows itself, not the user. – Parfait Dec 22 '17 at 18:46
  • Yes, I tried that and I am getting this error: `turbodbc.exceptions.DatabaseError: ODBC error state: IM002 native error code: 0 message: [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified`. I definitely have the table in my database since I am able to insert data using `to_sql` – Krishnang K Dalal Dec 22 '17 at 19:31
  • That looks like a connection error not query execution. Comment out everything and see if `connect()` line works. – Parfait Dec 22 '17 at 19:47
  • You were right. It was a connection error and I fixed that, but I am still getting this error: `turbodbc.exceptions.DatabaseError: ODBC error state: 42S02 native error code: 208 message: [Microsoft][ODBC SQL Server Driver][SQL Server]Invalid object name 'dbo.STG_CONTACTABILITY_SCORE'.` – Krishnang K Dalal Dec 22 '17 at 20:26
  • Are you specifying a database in connection? If not default might not be your database. Try qualifying with dbname: `DEVDb.dbo.STG_CONTACTABILITY_SCORE`. Otherwise, I would re-try with pymssql. Until now, I never heard of turbodbc. – Parfait Dec 22 '17 at 20:33
  • I got it working. Thanks a lot for bearing with me; your help was very helpful. Cheers and happy new year. – Krishnang K Dalal Jan 02 '18 at 14:55
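
Following up on the identity-column suggestion in the thread above, here is a minimal sketch of dropping that column from the data frame and shrinking the placeholder list to match; the column name 'SCORE_ID' is hypothetical, and cur/conn are the turbodbc cursor and connection from the updated question.

# 'SCORE_ID' is a hypothetical name for the identity column added by IT;
# it only needs dropping if it also exists in the data frame.
df_no_id = df_sql.drop(columns=['SCORE_ID'])

placeholders = ", ".join(["?"] * len(df_no_id.columns))  # turbodbc paramstyle
query = "INSERT INTO dbo.STG_CONTACTABILITY_SCORE VALUES (" + placeholders + ")"

sql_data = tuple(map(tuple, df_no_id.values))
cur.executemany(query, sql_data)  # cur: turbodbc cursor from the updated question
conn.commit()

With a bare VALUES list (no column list), SQL Server generally skips identity columns automatically, so the remaining values line up with the non-identity columns.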

This works for me:

# con/cursor come from an open pymssql-style connection (paramstyle %s)
insert_query = """INSERT INTO dbo.temptable(CHECK_TIME, DEVICE, METRIC, VALUE, TOWER, LOCATION, ANOMALY, ANOMALY_SCORE, ANOMALY_SEVERITY)
            VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)"""
write_data = tuple(map(tuple, data_frame.values))
cursor.executemany(insert_query, write_data)
con.commit()
cursor.close()
con.close()
Palash Mondal

If I understand correctly, you want to use the DataFrame.to_sql() method:

# engine: a SQLAlchemy engine, e.g. create_engine('mssql+pymssql://user:password@Dev02/DEVDb')
df_sql.to_sql('SCORE_TABLE', engine, schema='dbo', index=False, if_exists='append')
MaxU - stand with Ukraine
  • Yes, I'm aware of that, but I have read that pandas `to_sql` is slower than `pymssql` for a large amount of data. Do you think there is a major performance difference between the two methods when dealing with a large amount of data? By large amount, I mean ~80k to 100k rows and 20 columns – Krishnang K Dalal Dec 21 '17 at 19:39
  • @KrishnangKDalal, consider using [turbodbc](http://turbodbc.readthedocs.io/en/latest/pages/advanced_usage.html#advanced-usage-numpy-parameters) (a rough sketch of its NumPy path appears after this thread) – MaxU - stand with Ukraine Dec 22 '17 at 08:04
  • see the updated question. `turbodbc` is really a nice package but I am still getting an error. – Krishnang K Dalal Dec 22 '17 at 15:43
  • @KrishnangKDalal ... did you actually try the `to_sql` method and test yourself, and even with an SQLAlchemy engine instead of a raw connection? What you hear may not always be true! – Parfait Dec 22 '17 at 16:14
  • @Parfait: yes, I have tested `to_sql` and it takes a little more than 7 minutes to insert 26,646 rows. I want to compare other methods, but for that I have to get the other methods working. – Krishnang K Dalal Dec 22 '17 at 17:03
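
On the turbodbc suggestion above: its documented NumPy path goes through executemanycolumns, which takes one NumPy MaskedArray per column instead of row tuples. A rough sketch follows; whether every dtype in df_sql (strings, timestamps, NaNs) survives this conversion depends on the turbodbc version, so treat it as an outline rather than a drop-in.

import numpy as np
import turbodbc

conn = turbodbc.connect(driver="{SQL Server}", server="Dev02", database="DEVDb")
cur = conn.cursor()

# One masked array per column; masked entries are sent as NULL.
columns = [np.ma.MaskedArray(df_sql[c].values, mask=df_sql[c].isna().values)
           for c in df_sql.columns]

placeholders = ", ".join(["?"] * len(columns))
cur.executemanycolumns("INSERT INTO dbo.STG_CONTACTABILITY_SCORE VALUES (" + placeholders + ")", columns)
conn.commit()

cur.close()
conn.close()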