1

I am trying to connect to the sql server database from python 4 times in a short span of time to do some data pulls and it fails on the 3rd pull.

It is interesting that other scripts that have just 2 queries for data pulls work just fine. But in this particular case , there are 4 and it fails as soon as it reaches the 3rd query. I am also trying to use connection pooling and specify a pool size.

import pandas as pd
import pyodbc
import sqlalchemy
from pandas import Series,DataFrame
import numpy as np
import itertools
cnx=sqlalchemy.create_engine("mssql+pyodbc://username:password@10.0.0.1:1433/Basis?driver=/opt/microsoft/sqlncli/lib64/libsqlncli-11.0.so.1790.0",pool_size=4, max_overflow=0)

def get_data():
    sql1 = "select   Send_Customer from Pretty_Txns with (noLock)  where Send_Date >='2017-01-01' and Send_Date <= '2017-01-02'"
    sql2 = "select   Pay_Customer from Pretty_Txns with (noLock)  where Pay_Date >='2017-01-01' and Pay_Date <= '2017-01-02'"
    df1 = pd.read_sql(sql1,cnx)
    df2 = pd.read_sql(sql2,cnx)
    l1 = df1.values.tolist()
    l2 = df2.values.tolist()
    l1 = [item for sublist in l1 for item in sublist]
    l2 = [item for sublist in l2 for item in sublist]
    Send_list =  [l1[i:i + 2000] for i in xrange(0, len(l1),2000)]
    Pay_list =  [l2[i:i + 2000] for i in xrange(0, len(l2),2000)]
    for list1,list2 in itertools.izip_longest(Send_list,Pay_list):
        placeholders1 = ','.join('?' for i in range(len(list1)))
        placeholders2 = ','.join('?' for i in range(len(list2)))
        sql1="select  * from Pretty_Txns  with (noLock) where Send_Date >='2017-01-01' and Send_Date <= '2017-01-02' and Send_Customer in (%s)" % placeholders1
        sql2="select  * from Pretty_Txns with (noLock)  where Pay_Date >='2017-01-01' and Pay_Date <= '2017-01-02' and Pay_Customer in (%s)" % placeholders2
        df_send = pd.read_sql(sql1,cnx,params=list1)
        df_pay = pd.read_sql(sql2,cnx,params=list2)
        df_send['time_diff'] = pd.to_numeric(df_send['Pay_Time']-df_send['Send_Time'])
        df_send['time_diff'] = df_send['time_diff']/1000000000
        df_pay['time_diff'] = pd.to_numeric(df_pay['Pay_Time']-df_pay['Send_Time'])
        df_pay['time_diff'] = df_pay['time_diff']/1000000000
        return  {"df_send":df_send,"df_pay":df_pay}

So at the 3rd query it fails with the below error:

      sql1="select  * from Pretty_Txns  with (noLock) where Send_Date >='2017-01-01' and Send_Date <= '2017-01-02' and Send_Customer in (%s)" % placeholders1
      DBAPIError: (pyodbc.Error) ('08S01', '[08S01] [Microsoft][SQL Server Native Client 11.0]TCP Provider: Error code 0x68 (104)

So, looking forward to some tips and pointers.

optimus_prime
  • 817
  • 2
  • 12
  • 34
  • 1
    [Searching for the exact error string](https://www.google.com/search?q=TCP+Provider%3A+Error+code+0x68+(104)) shows an [answer from 2013](http://stackoverflow.com/a/15483275/223424) that seemingly contains a fix. – 9000 Feb 22 '17 at 17:48
  • 1
    Also, exposing your login credentials in the connect string for the whole internet is likely a bad idea. Frankly, having your credentials hardcoded anywhere in your source files is a bad idea. (I've edited out the credentials.) – 9000 Feb 22 '17 at 17:49
  • sorry...in the rush of the moment , forgot to hide the credentials. That post seems to be for a unixodbc driver and not pyodbc or sqlalchemy..I think this has got to do more with sqlalchemy. – optimus_prime Feb 22 '17 at 17:55
  • I'm not sure. It looks like a connection-level error. Try getting a connection object from your SQLAlchemy session and run a trivial query over it. E.g. `cnx.connect().connection.cursor().execute("select 1")`. The cursor in question is the raw pyodbc cursor. – 9000 Feb 22 '17 at 18:04
  • The cursor gets created just fine. And it is a pyodbc (connection level error) for sure. But it seems to be because of the number of connections.So , some trick about connection pooling in sqlalchemy may be useful. – optimus_prime Feb 22 '17 at 18:08

0 Answers0