0

I am trying to run a sql query from python that establishes a connection to TeraData. In the query, I want to pass a column from a dataframe under the WHERE clause (this column has some IDs, against which I am trying to find the record).

I can successfully query the result if I hard code the column values inside the WHERE clause. But I do not want this as the values can be over 1000.

I would like to pass this column into the WHERE clause.

This is my code which is working file if I hardcode the values in the WHERE clause

# Import libraries 
import pyodbc 
import pandas as pd


# Set variables 
teradata_driver = 'Teradata Database ODBC Driver 16.20'
teradata_host_name = 'XXXX-xxx-xxx.xx.xxx.xxx'
teradata_user = 'XXXXXX'
teradata_password = 'xxxxxx'

# enter the connection type
teradata_authentication = 'ODBC'

query= """
select * from <Table_Name> as table1
where table1.ProductID IN ('15800','15801','15802','15803','15804','15805','15806','15807')
and table1.ProductDate = date '2023-04-30'
and table1.prod_TYPE = 'LE'
"""
cnxn = pyodbc.connect("DRIVER={%s};DBCNAME=%s;UID=%s;PWD=%s;authentication=%s"\
                      %(teradata_driver,teradata_host_name,teradata_user,teradata_password,teradata_authentication))

df_teradata_result = pd.read_sql(query,cnxn)

I have a DataFrame which has a column called product_id (product id can be more than 1000s). How can I pass product_id inside the WHERE clause above. Thanks in advance.

I have tried the option provided here but it is not working for me Passing a Dataframe column in where clause of a SQL (Column is of type string)

Gord Thompson
  • 116,920
  • 32
  • 215
  • 418
SSIND
  • 1
  • 1
  • I believe the only two paths forward are 1) dynamically building the IN list in the SQL you are writing [something like this](https://stackoverflow.com/a/29749286/2221001) although, I can't help but feel there is a cleaner approach that closes up the sql injection hole this introduces. That being said, I'm not seeing an in-list parameter binding approach that doesn't introduce that hole. 2) the option would be to shove your column into a Teradata volatile table and then join to it in your final sql statement. This would keep from sql injection, but involves a couple of extra steps. – JNevill Jun 27 '23 at 21:59
  • Thanks @JNevill for guidance. I have just started working with Python and not sure what do you mean sql injection hole or by in-list parameter binding approach. However, I will try the above mentioned approach (in the url) tonight and let you know. Thanks for your prompt reply. – SSIND Jun 28 '23 at 23:32
  • The SQL injection hole is introduced whenever you build SQL dynamically via code or script where some of the dynamically built SQL might be from user input. So if your dataframe column contains data that is from user input and they sneak SQL code into a value like `'); DROP TABLE USERS;--` you may be in a world of hurt. The way around this is parameter binding which most libraries support that interact with databases. It's important to understand this attack vector when building code like you are building so you don't accidentally expose your database to malicious actors. – JNevill Jun 29 '23 at 07:30
  • Thank you @JNevill, the approach you mentioned earlier worked perfectly for me. Also thanks for explaining the sql injection hole concept and parameter binding. I understand why it is necessary. I will try to learn how to implement parameter binding. One last question, I just have the read access to the database, will it still be exposed? – SSIND Jun 29 '23 at 15:14
  • Security/Access is definitely one of the ways to help close the sql injection attack hole. I should mention, if this script you are writing is only used by you or other folks that enter their own database credentials into the script, then it's not something you need to worry about. It's when you are writing application where the credentials are embedded in the application itself. – JNevill Jun 29 '23 at 15:57
  • Understood, thank you again. – SSIND Jul 10 '23 at 20:43

1 Answers1

0

First off, if you are using pandas with a database other than SQLite you should be using SQLAlchemy. The third-party SQLAlchemy dialect for Teradata

https://pypi.org/project/teradatasqlalchemy/

appears to be actively maintained. (Last updated 2023-04-05.)

Then, you can use SQLAlchemy to build the query for you.

from datetime import date

import pandas as pd
import sqlalchemy as sa

reference_df = pd.DataFrame([("15800",), ("15801",)], columns=["product_id"])
id_list = reference_df["product_id"].tolist()
print(id_list)  # ['15800', '15801']

connection_url = "teradatasql://…"
engine = sa.create_engine(connection_url)

tbl = sa.Table("table_name", sa.MetaData(), autoload_with=engine)
qry = (
    sa.select(sa.text("*"))
    .select_from(tbl)
    .where(tbl.c.ProductID.in_(id_list))
    .where(tbl.c.ProductDate == date(2023, 4, 30))
    .where(tbl.c.prod_TYPE == "LE")
)
# (tip: multiple .where() clauses are AND-ed together)

engine.echo = True
df = pd.read_sql_query(qry, engine)
"""SQL emitted:
SELECT * 
FROM table_name 
WHERE table_name.[ProductID] IN (?, ?) AND table_name.[ProductDate] = ? AND table_name.[prod_TYPE] = ?
[generated in 0.00066s] ('15800', '15801', datetime.datetime(2023, 4, 30, 0, 0), 'LE')
"""

(This example was tested using MS SQL Server, so the Teradata SQL statement may look slightly different.)

Gord Thompson
  • 116,920
  • 32
  • 215
  • 418