2

I'm using Pyodbc in Python to run some SQL queries. What I'm working with is actually longer than this, but this example captures what I'm trying to do:

connection = pyodbc.connect(...)
cursor = connection.cursor(...)

dte = '2018-10-24'

#note the placeholders '{}'
query = """select invoice_id
    into #output 
    from table1 with (nolock) 
    where system_id = 'PrimaryColor' 
    and posting_date = '{}' 

    insert into #output
    select invoice_id
    from table2 with (nolock)
    where system_id = 'PrimaryColor'
    and posting_date = '{}'"""

#this is where I need help as explained below
cursor.execute(query.format(dte, dte))

output = pd.read_sql("""select *
                 from #output"""
                 , connection)

In the above, since there are only two '{}', I'm passing dte to query.format() twice. However, in the more complicated version I'm working with, I have 19 '{}', so I'd imagine this means I need to pass 'dte' to 'query.format{}' 19 times. I tried passing this as a list, but it didn't work. Do I really need to write out the variable 19 times when passing it to the function?

petezurich
  • 9,280
  • 9
  • 43
  • 57
erik7970
  • 693
  • 1
  • 8
  • 21

2 Answers2

2

Consider using a UNION ALL query to avoid the temp table needs and parameterization where you set qmark placeholders and in a subsequent step bind values to them. And being the same value multiply the parameter list/tuple by needed number:

dte = '2018-10-24'

# NOTE THE QMARK PLACEHOLDERS
query = """select invoice_id    
           from table1 with (nolock) 
           where system_id = 'PrimaryColor' 
             and posting_date = ? 

           union all

           select invoice_id
           from table2 with (nolock)
           where system_id = 'PrimaryColor'
             and posting_date = ?"""

output = pd.read_sql(query, connection, params=(dte,)*2)
Parfait
  • 104,375
  • 17
  • 94
  • 125
1

I agree with the comments, pandas.read_sql has a params argument which prevent from sql injection.

See this post to understand how to use it depending on the database.

Pyodbc has the same parameter on the execute method.

# standard 
cursor.execute("select a from tbl where b=? and c=?", (x, y))

# pyodbc extension 
cursor.execute("select a from tbl where b=? and c=?", x, y)

To answer to the initial question, even if it is bad practice for building SQL queries :

Do I really need to write out the variable 19 times when passing it to the function?

Of course you don't :

query = """select invoice_id
into #output 
from table1 with (nolock) 
where system_id = 'PrimaryColor' 
and posting_date = '{dte}' 

insert into #output
select invoice_id
from table2 with (nolock)
where system_id = 'PrimaryColor'
and posting_date = '{dte}'""".format(**{'dte': dte})

or :

query = """select invoice_id
into #output 
from table1 with (nolock) 
where system_id = 'PrimaryColor' 
and posting_date = '{0}' 

insert into #output
select invoice_id
from table2 with (nolock)
where system_id = 'PrimaryColor'
and posting_date = '{0}'""".format(dte)

Python 3.6+ :

query = f"""select invoice_id
into #output 
from table1 with (nolock) 
where system_id = 'PrimaryColor' 
and posting_date = '{dte}' 

insert into #output
select invoice_id
from table2 with (nolock)
where system_id = 'PrimaryColor'
and posting_date = '{dte}'"""

Note the usage of f before """ ... """

Corentin Limier
  • 4,946
  • 1
  • 13
  • 24