1

I have a problem with running an SQL script in pandas with WHERE in clause which is picking members of a list or tuple

    tuple=(1,2,3,4,5,6, 7) # there are 2228 members
    date=20200101

    sql=pd.read_sql_query("""SELECT *                             
                             FROM [MY_TABLE] with (nolock)
                             WHERE [cod] IN (?)
                             and bi_partition>=?""", conn, params=[tuple, date])

The error returned:

('42000', '[42000] [Microsoft][ODBC SQL Server Driver][SQL Server]The data types varchar and ntext are incompatible in the equal to operator. (402) (SQLExecDirectW); [42000] [Microsoft][ODBC SQL Server Driver][SQL Server]Statement(s) could not be prepared. (8180)')

When using params =[str(tuple), str(date)] the error is:

 ('42000', '[42000] [Microsoft][ODBC SQL Server Driver][SQL Server]The data types varchar and ntext are incompatible in the equal to operator. (402) (SQLExecDirectW); [42000] [Microsoft][ODBC SQL Server Driver][SQL Server]Statement(s) could not be prepared. (8180)') 

I have tried to fix it with different solutions I found on StackOverflow but it doesn't work.

Anybody can help?

Thank you

Vero
  • 479
  • 5
  • 11
  • 1
    can you try with `params=[str(tuple), str(date)]`? – PApostol Sep 09 '20 at 19:00
  • @Aviv Yaniv, I have tried but different error returned: ('42000', '[42000] [Microsoft][ODBC SQL Server Driver][SQL Server]The data types varchar and ntext are incompatible in the equal to operator. (402) (SQLExecDirectW); [42000] [Microsoft][ODBC SQL Server Driver][SQL Server]Statement(s) could not be prepared. (8180)') – Vero Sep 09 '20 at 19:04
  • 1
    @Vero set the error you commented on. You can edit your post too :) – Aviv Yaniv Sep 09 '20 at 19:10
  • `where [cod] in ?` instead of `(?)` would perhaps fix it. – null Sep 09 '20 at 19:13
  • unfortunately, no . It gives the same error – Vero Sep 09 '20 at 19:15
  • Is `bi_partition` ntext? You can try including the parameters directly in the string and checking if the error still occurs. – Mike67 Sep 09 '20 at 20:24
  • You should avoid using thousands of values with [`IN` clause](https://stackoverflow.com/q/21178390/1422451) as performance deteriorates. How did you retrieve 2,228 members? From a different query? – Parfait Sep 09 '20 at 20:31
  • @Parfait yes, but I importem them to pandas as a list – Vero Sep 10 '20 at 11:24

1 Answers1

1

Consider avoiding the use of thousands of separate values in IN clause from a Python tuple. Since values derived from a different query, use a subquery from source of values with IN.

date = "2020-01-01"

sql = """SELECT Col1, Col2, Col3, ...
         FROM [MY_TABLE]
         WHERE bi_partition >= ?
           AND [cod] IN 
               (SELECT [cod] FROM myOtherTable)"""

df = pd.read_sql_query(sql, conn, params=[date])

Also, consider explicitly identifying columns in SELECT for code readability and maintainability where you and readers of your code can see what fields are actually being retrieved and avoid pulling in huge columns like BLOBs or meta identifiers like GUIDs that are not needed for Pandas analytics. See Why is SELECT * considered harmful?

Parfait
  • 104,375
  • 17
  • 94
  • 125
  • @ Parfait, thank you for the solution! I haven't tried it yet. I understood the idea. It makes sense! – Vero Sep 10 '20 at 17:45