
I have recently been trying to connect to my SQL database on Microsoft Azure. I am trying a newer approach (new to me, at least) that uploads multiple columns of data from a pandas DataFrame.

I get the following error:

Traceback (most recent call last):
  File "src/pymssql/_pymssql.pyx", line 460, in pymssql._pymssql.Cursor.execute
  File "src/pymssql/_mssql.pyx", line 1084, in pymssql._mssql.MSSQLConnection.execute_query
  File "src/pymssql/_mssql.pyx", line 1115, in pymssql._mssql.MSSQLConnection.execute_query
  File "src/pymssql/_mssql.pyx", line 1248, in pymssql._mssql.MSSQLConnection.format_and_run_query
  File "src/pymssql/_mssql.pyx", line 1786, in pymssql._mssql.check_cancel_and_raise
  File "src/pymssql/_mssql.pyx", line 1832, in pymssql._mssql.raise_MSSQLDatabaseException
pymssql._mssql.MSSQLDatabaseException: (206, b'Operand type clash: int is incompatible with textDB-Lib error message 20018, severity 16:\nGeneral SQL Server error: Check messages from the SQL Server\n')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "sql_import.py", line 109, in <module>
    cursor.execute(sql_insert_clean[0], sql_tuple)
  File "src/pymssql/_pymssql.pyx", line 478, in pymssql._pymssql.Cursor.execute
pymssql._pymssql.OperationalError: (206, b'Operand type clash: int is incompatible with textDB-Lib error message 20018, severity 16:\nGeneral SQL Server error: Check messages from the SQL Server\n')

Here's the script I am working with:

import pymssql

# Set up the SQL Server connection
connection = pymssql.connect("xxx.database.windows.net", "xxx", "xxx", "coiner")
cursor = connection.cursor()

""" Some processing done here that I will skip"""

--- ... ---

#get data into tuples
sql_data = tuple(map(tuple, sql_dataframe.values))

#store into SQL database
for sql_tuple in sql_data:
    print(len(sql_tuple))
    cursor.execute(sql_insert_clean[0], sql_tuple)
    connection.commit()

sql_insert_clean[0] looks like the following:

INSERT INTO StocksRealTime (change,changePercent,dayVolume,exchange,id,marketHours,price,priceHint,quoteType,time,lastSize,currency,circulatingSupply,dayHigh,dayLow,fromcurrency,lastMarket,marketcap,shortName,volAllCurrencies,vol_24hr) VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)

It uploads at least 420 rows of data, but on the 421st row it throws the error. I have checked that row and I cannot see what may be wrong with it.
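One thing worth checking: the columns typed `object` in the dtype listing below can hold a mix of Python `int`, `float`, and `str` values row by row, which would explain an insert working for 420 rows and then hitting an "int is incompatible with text" clash. A minimal sketch of coercing such columns to a single consistent type before building the tuples (column names taken from the question, the values are invented):

```python
import pandas as pd

# Toy frame mimicking the question's mixed-type object columns (values invented).
df = pd.DataFrame({
    "change": [5.28, 116.03, 7, 39.6],           # object: mix of float and int
    "exchange": ["NMS", "NMS", "NMS", "NMS"],    # genuinely textual
    "marketHours": [1, 1, 1, 1],
}, dtype=object)

# Coerce every fully numeric column to float; force the rest to str,
# so each column sends a single SQL type to the server.
for col in df.columns:
    coerced = pd.to_numeric(df[col], errors="coerce")
    if coerced.notna().all():
        df[col] = coerced.astype(float)
    else:
        df[col] = df[col].astype(str)

print(df.dtypes)
```

After this pass, `tuple(map(tuple, df.values))` yields tuples whose types no longer vary from row to row.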

Additionally, this process is extremely slow! It took about 1 minute to upload 400 rows. My actual dataset is ~1M rows, so how would I improve the speed?
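Much of that time is likely the per-row `connection.commit()`. One common improvement is to batch the tuples and use `Cursor.executemany` with a single commit per batch. A sketch, where the batching helper is generic and the commented usage assumes the `cursor`, `sql_data`, and `sql_insert_clean[0]` from the question:

```python
def batches(rows, size):
    """Yield successive slices of `rows`, each at most `size` long."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

# Hypothetical usage with the question's cursor and INSERT statement:
# for batch in batches(sql_data, 1000):
#     cursor.executemany(sql_insert_clean[0], batch)
#     connection.commit()  # one commit per batch, not per row

print(list(batches(list(range(5)), 2)))  # [[0, 1], [2, 3], [4]]
```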

Example of my dataset around rows 439–442:

                             439            440            441            442    types
change                  5.279999       116.0298              7       39.59998   object
changePercent            2.06775       5.425476       3.660322       5.439557   object
dayVolume             54856566.0      7831598.0     40869414.0     56702940.0  float64
exchange                     NMS            NMS            NMS            NMS   object
id                          MSFT           AMZN             FB           TSLA   object
marketHours                    1              1              1              1    int64
price                     260.63        2254.64         198.24          767.6   object
priceHint                      4              4              4              4    int64
quoteType                      8              8              8              8    int64
time               3304942412000  3304942412000  3304942414000  3304942404000    int64
lastSize                   200.0            0.0            0.0            0.0  float64
currency                                                                        object
circulatingSupply            0.0            0.0            0.0            0.0  float64
dayHigh                                                                         object
dayLow                                                                          object
fromcurrency                                                                    object
lastMarket                                                                      object
marketcap                    0.0            0.0            0.0            0.0  float64
shortName                                                                       object
volAllCurrencies             0.0            0.0            0.0            0.0  float64
vol_24hr                     0.0            0.0            0.0            0.0  float64
Working dollar
  • I don't think that error is coming from the insert itself, but rather something post-processing (like a computed column or a trigger). It could be caused by a number that doesn't have correct handling (e.g. 0, negative, NULL). – Aaron Bertrand May 16 '22 at 20:14
  • @AaronBertrand I will have another look as I have tried cleaning all the nulls but you may be right here as I had a similar problem before but with a different error – Working dollar May 16 '22 at 20:17
  • You can send bulk data to SQL Server as JSON and parse it there. That should be faster too. https://stackoverflow.com/questions/60745932/update-sql-server-database-using-stored-procedure-with-table-as-paramater-using/60746532#60746532 – David Browne - Microsoft May 16 '22 at 20:39

1 Answer


It took a long time of searching through multiple methods to get this to work quickly and reliably.

I faced multiple errors along the way; a few examples:

  1. COUNT field incorrect or syntax error (0) (SQLExecDirectW)').

  2. Can't open lib 'MySQL' : file not found (0) (SQLDriverConnect)").

  3. (2013, 'Lost connection to MySQL server during query ([Errno 54] Connection reset by peer)')

The first one was caused by passing method='multi' to pd.DataFrame.to_sql while also setting fast_executemany=True on the SQLAlchemy engine. Removing the method parameter fixed it.

The second error was caused by having 'mysql+pyodbc://' instead of 'mssql+pyodbc://'.

The third was likely the same issue as the second, but I had already moved to an alternative solution, so I did not investigate it further.

What worked for me is the following:


from sqlalchemy import create_engine

engine = create_engine(
    'mssql+pyodbc://' + user + ':' + passw + '@' + host + ':' +
    str(port) + '/' + database + '?driver=ODBC+Driver+18+for+SQL+Server',
    echo=False,
    connect_args={"timeout": 30},
    pool_pre_ping=True,
    fast_executemany=True,
)

--- EXTRA ARGUMENTS HERE ---

sql_dataframe.to_sql('StocksRealTime', con=engine, index=False, if_exists='replace')

It was also many times faster than the row-by-row approach in the question, and ran without any errors.
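For a dataset of ~1M rows, passing `chunksize` to `to_sql` may also help, so pandas issues the inserts in batches rather than all at once. A minimal runnable sketch against an in-memory SQLite database (SQLite stands in for the Azure SQL Server here; with SQL Server you would pass the SQLAlchemy engine from above instead):

```python
import sqlite3
import pandas as pd

df = pd.DataFrame({"id": ["MSFT", "AMZN"], "price": [260.63, 2254.64]})

# pandas accepts a raw sqlite3 connection for SQLite targets.
conn = sqlite3.connect(":memory:")
df.to_sql("StocksRealTime", con=conn, index=False,
          if_exists="replace", chunksize=10_000)

count = conn.execute("SELECT COUNT(*) FROM StocksRealTime").fetchone()[0]
print(count)
```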

Working dollar