1

I have defined a dictionary which contains several parameters and their values which will ultimately be used to build a SQL Query

query_params = collections.OrderedDict(
        {'table_name':'publilc.churn_data',
         'date_from':'201712',
         'date_to':'201805',
         'class_target':'NPA'
      })

The parameters are to be used in the below query:

sql_data_sample = str("""select * from %s # get value of table_name
                                    where dt = %s    #get value of date_from
                                    and target in ('ACTIVE')

                        ----------------------------------------------------
                        union all
                        ----------------------------------------------------
                        (select * from %s #get value of table_name
                                 where dt = %s #get value of date_to
                                 and target in (%s));""") #get value of class_target
                                    %("'"+.join(str(list(query_params.values())[0])) + "'" + 
                                    "'"+.join(list(query_params.values())[1]) + "'" + 
                                    "'"+.join(list(query_params.values())[2]) + "'" +
                                    "'"+.join(list(query_params.values())[3]) + "'" )

However this gives me an indentation error as below:

get_ipython().run_line_magic('("\'"+.join(list(query_params.values())[0])', '+ "\'"')
    ^
IndentationError: unexpected indent

The query should ultimately look like:

select *from public.churn_data
        where dt = '201712'
        and target in ('ACTIVE')

----------------------------------------------------
union all
----------------------------------------------------
 (select * from public.churn_data 
            where dt = '201805'
            and target in ('NPA'));

I am not being able to figure out where the source of the error is.Is it because of the public. in table_name? Can someone please help me with this??

Shuvayan Das
  • 1,198
  • 3
  • 20
  • 40
  • Please do not use Python's string formatting to build your query strings. Not only does that make your program vulnerable to SQL injection attacks, it also clutters your code with quote characters as you try to enforce the correct representation of your parameters in the SQL. Every DB API 2-compliant package provides bindings that allow you to pass parameters to your query when calling `execute()`, `executemany()` or `callproc()`. Use them for safety and simplicity – shmee Jul 17 '18 at 06:26

2 Answers2

3

Please use a parameterized query as described in the docs

Since you already have a dict, you could do:

sql_data_sample = """select * from %(table_name)s
           where dt = %(date_from)s
           and target in ('ACTIVE')
           ----------------------------------------------------
           union all
           ----------------------------------------------------
           (select * from %(table_name)s
           where dt = %(date_to)s
           and target in (%(class_target)s));"""

cur.execute(sql_data_sample, query_params)

I haven't tested if if works with an odered dict, but I think it should. If not, you could make your ordered dict a regular dict before passing it as parameters mapping.

EDIT Unless you need your parameters to be an OrderedDict later on, use a regular dict. As far as I can see, you only opted for an OrderedDict to preserve the value order for the list(query_params.values())[0].

EDIT2 Table names and field names cannot be passed using bindings. Antoine Dusséaux pointed out in this answer that psycopg2 offers a more or less secure way to do that since version 2.7.

from psycopg2 import sql

sql_data_sample = """select * from {0}
           where dt = %(date_from)s
           and target in ('ACTIVE')
           ----------------------------------------------------
           union all
           ----------------------------------------------------
           (select * from {0}
           where dt = %(date_to)s
           and target in (%(class_target)s));"""

cur.execute(sql.SQL(sql_data_sample)
                .format(sql.Identifier(query_params['table_name'])), 
            query_params)

You might have to remove the table_name from your dict, I am not sure how psycopg2 reacts on additional items in the parameters dict and I cannot test it right now.

It should be pointed out, that this still poses the risk of SQL injection and should be avoided unless absolutely necessary. Normally, table and field names are a rather fixed part of a query string.

Here's the relevant documentation for the sqlmodule.

shmee
  • 4,721
  • 2
  • 18
  • 27
  • Hello @shmee.. Thanks a lot for the comments and probable answer.I am checking it now and will soon update. :) Thanks!! – Shuvayan Das Jul 17 '18 at 06:49
  • Hello @shmee.. This gives an error as below: ProgrammingError: syntax error at or near "'public.churn_data'" LINE 2: The table name has to appear without single quotes whereas in this case it comes as 'public.churn_data' – Shuvayan Das Jul 17 '18 at 07:01
  • @ShuvayanDas, oh. yes table name, hold on a second, I'm gonna edit my answer quickly – shmee Jul 17 '18 at 07:04
  • @ShuvayanDas completed my edit. Please note that I don't have the possibility to test the proposal right now, apologies if I made a typo anywhere. However, please consider making your table name static in the query string. I don't know where you get the input in query_params from, but the table and field names should be something encapsulated in your code and not come from the "outside". – shmee Jul 17 '18 at 07:33
  • Points noted and thanks a lot indeed!! I missed the SQL Injection part completely. – Shuvayan Das Jul 17 '18 at 07:56
  • Hello @shmee.. I have posted another question with a few minor changes here : https://stackoverflow.com/questions/51393957/read-sql-query-returns-an-empty-dataframe-after-i-pass-parameters-as-a-dict-in-p?noredirect=1#comment89760071_51393957.. Can you kindly take a look? – Shuvayan Das Jul 18 '18 at 07:59
-2

You can use following code to remove indentation error

sql_data_sample = str("""
select * from %s
where dt = %s
and target in ('ACTIVE')
----------------------------------------------------
union all
----------------------------------------------------
(select * from %s
where dt = %s
and target in (%s));""" %(
    "'" + str(list(query_params.values())[0]) + "'" +
    "'" + list(query_params.values())[1] + "'" +
    "'" + list(query_params.values())[2] + "'" +
    "'" + list(query_params.values())[3] + "'"
))

But you need to pass one more argument as you used %s 5 times but params are only 4

  • This gives not enough arguments to format string after I implement:sql_data_sample = str(""" select * from %s where dt = %s and target in ('ACTIVE') ---------------------------------------------------- union all ---------------------------------------------------- (select * from %s where dt = %s and target in (%s));""" %( "'" + str(list(query_params.values())[0]) + "'" + "'" + list(query_params.values())[1] + "'" + "'"+ list(query_params.values())[0] + "'" + "'" + list(query_params.values())[2] + "'" + "'" + list(query_params.values())[3] + "'" )) – Shuvayan Das Jul 17 '18 at 05:58
  • That is what I mentioned, you need to give one argument because you used %s 5 time and you are passing only 4 arguments – nandudeore23 Jul 17 '18 at 06:12