0

A rather simple question but for which we surprisingly didn't found a solution.

Here is my current code, for executing a simple SQL query on a PostgreSQL database from Python 3.6.9 using psycopg2 ('2.9.1 (dt dec pq3 ext lo64)'):

import psycopg2

myid = 100
fields = ('p.id', 'p.name', 'p.type', 'p.price', 'p.warehouse', 'p.location', )
sql_query = ("SELECT " + ', '.join(fields) + " FROM product p "
              "INNER JOIN owner o ON p.id = o.product_id "
              "WHERE p.id = {} AND (o.dateof_purchase IS NOT NULL "
              "OR o.state = 'checked_out' );"
        ).format(myid)

try:
    with psycopg2.connect(**DB_PARAMS) as conn:
        with conn.cursor(cursor_factory=DictCursor) as curs:
            curs.execute(sql_query, )
            row = curs.fetchone()

except psycopg2.Error as error:
    raise ValueError(f"ERR: something went wrong with the query :\n{sql_query}") from None

We're more and more thinking that this is... not good. (awfully bad to be honest).

Therefore, we're trying to use a modern f-string notation:

sql_query = (f"""SELECT {fields} FROM product p
             INNER JOIN owner o ON p.id = o.product_id
             WHERE p.id = {myid} AND (o.dateof_purchase IS NOT NULL
             OR o.state = 'checked_out' );""")

But then, the query looks like:

SELECT  ('p.id', 'p.name', 'p.type', 'p.price', 'p.warehouse', 'p.location', ) FROM ...;

which is not valid in PSQL because 1. of the brackets, and 2. of the single quoted column names.

We'd like to figure out a way to get rid of these.

In between, we went back to the doc and remembered this:

never do that!
https://www.psycopg.org/docs/usage.html

Ooops! So we refactored it this way:

sql_query = (f"""SELECT %s FROM product p
             INNER JOIN owner o ON p.id = o.product_id
             WHERE p.id = %s AND (o.dateof_purchase IS NOT NULL
             OR o.state = 'checked_out' );""")  

try:
    with psycopg2.connect(**DB_PARAMS) as conn:
        with conn.cursor(cursor_factory=DictCursor) as curs:
            # passing a tuple as it only accept one more argument after the query!
            curs.execute(sql_query, (fields, myid))
            row = curs.fetchone()

and mogrify() says:

"SELECT ('p.id', 'p.name', 'p.type', 'p.price', 'p.warehouse', 'p.location', ) FROM ...;"

here again, the brackets and the single quotes are causing troubles, but no error is actually raised.
The only thing is that row evaluates to this strange result:

['('p.id', 'p.name', 'p.type', 'p.price', 'p.warehouse', 'p.location', )']

So, how could we cleverly and dynamically build a psycopg2 query using a list of parameters for column names without neglecting the security?

(A trick could be to fetch all columns and filter them out after... but there are too many columns, some with quiet large amount of data that we don't need, that's why we want to run a query using a precisely defined selection of columns, which may get dynamically extended by some function, otherwise we would have hard-coded these column names of course).

OS: Ubuntu 18.04
PostgreSQL: 13.3 (Debian 13.3-1.pgdg100+1)

swiss_knight
  • 5,787
  • 8
  • 50
  • 92
  • 1
    Use a full-fledged SQL interaction library like SQLAlchemy, which permits you to dynamically build any clause of the query, including the SELECT list. Or, go back to your original string manipulation. The not-even-at-gunpoint warning is about *variables* - typically, user-provided strings that you're testing against row/cell data. Actual table names, column names, etc - things that are defined by you, in your code, and not the user - can't be passed as parameters, you need to build that string yourself. Whether you use f-strings, concatenation, or whatever else is orthogonal. – AdamKG Jul 16 '21 at 20:35
  • 2
    @AdamKG that is wrong in the `psycopg2` case. See [sql](https://www.psycopg.org/docs/sql.html). That is a better option the SQLAlchemy if you don't want all it's baggage. – Adrian Klaver Jul 16 '21 at 21:19
  • Thanks for the pointer! I guess I'm out of date on psycopg - that looks like a very nice & useful addition. – AdamKG Jul 16 '21 at 21:54

2 Answers2

1

The '%s' insertion will try to turn every argument into an SQL string, as @AdamKG pointed out. Instead, you can use the psycopg2.sql module will allow you to insert identifiers into queries, not just strings:

from psycopg2 import sql

fields = ('id', 'name', 'type', 'price', 'warehouse', 'location', )

sql_query = sql.SQL(
          """SELECT {} FROM product p
             INNER JOIN owner o ON p.id = o.product_id
             WHERE p.id = %s AND (o.dateof_purchase IS NOT NULL
             OR o.state = 'checked_out' );""")

try:
    with psycopg2.connect(**DB_PARAMS) as conn:
        with conn.cursor(cursor_factory=DictCursor) as curs:
            # passing a tuple as it only accept one more argument after the query!
            curs.execute(sql_query.format(*[sql.Identifier(field) for field in fields]), (*fields, myid))
            row = curs.fetchone()
Phin Jensen
  • 411
  • 3
  • 13
  • -> `AttributeError: 'list' object has no attribute 'join'` I was more on something like: `{len(fields)*'%s '}` but I still have trouble with the single quotes. – swiss_knight Jul 16 '21 at 20:34
  • `', '.join(['%s'] * len(fields))` will fix the AttributeError, but this isn't what you're after anyway. This will let you select the *strings* `'p.id'`, `'p.name'`, etc. To get those *columns*, you need to have them be in the string that's passed as the first argument to execute, not as query parameters. – AdamKG Jul 16 '21 at 20:37
  • @AdamKG Ah, you're right. My bad. I updated the answer to use psycopg2.sql instead, which should work. – Phin Jensen Jul 16 '21 at 20:46
  • Actually, `{}` <- `SyntaxError: f-string: empty expression not allowed` – swiss_knight Jul 16 '21 at 20:59
  • Sorry, it shouldn't be an f-string anymore. Just remove the f from the beginning of the string. I'll update it reflect that. – Phin Jensen Jul 16 '21 at 21:17
  • Got a `TypeError: not all arguments converted during string formatting` with the current state of this code. – swiss_knight Sep 19 '21 at 09:39
1

I finally found a solution. It makes use of map to use a list or a tuple of column names and sql.Literal to use a given id, this is maybe cleaner:

conn = psycopg2.connect(**DB_PARAMS)

myid = 100
# using the simple column identifiers
fields_1 = ('id', 'name', 'type', 'price', 'warehouse', 'location',)
# using the dot notation with the table alias 'p' as the prefix:
fields_2 = ('p.id', 'p.name', 'p.type', 'p.price', 'p.warehouse', 'p.location',)

sql_query_1 = sql.SQL("""
    SELECT {f} FROM product p
    INNER JOIN owner o ON p.id = o.product_id
    WHERE p.id = {j} AND (o.dateof_purchase IS NOT NULL
    OR o.state = 'checked_out' );"""
).format(
    f = sql.SQL(',').join(map(sql.Identifier, fields_1)),
    j = sql.Literal(myid)
)

sql_query_2 = sql.SQL("""
    SELECT {f} FROM product p
    INNER JOIN owner o ON p.id = o.product_id
    WHERE p.id = {j} AND (o.dateof_purchase IS NOT NULL
    OR o.state = 'checked_out' );"""
).format(
    f = sql.SQL(',').join(map(sql.SQL, fields_2)), # use sql.SQL!
    j = sql.Literal(myid)
)

sql_query_2b = sql.SQL("""
    SELECT {f} FROM product p
    INNER JOIN owner o ON p.id = o.product_id
    WHERE p.id = {j} AND (o.dateof_purchase IS NOT NULL
    OR o.state = 'checked_out' );"""
).format(
    f = sql.SQL(',').join(map(sql.Identifier, fields_2)), # DON'T use sql.Identifier!
    j = sql.Literal(myid)
)

# VALID SQL QUERY:
print(sql_query_1.as_string(conn))
# will print:
# SELECT "id","name","type","price","warehouse","location" FROM product p
#    INNER JOIN owner o ON p.id = o.product_id
#    WHERE p.id = 100 AND (o.dateof_purchase IS NOT NULL
#    OR o.state = 'checked_out' );


# VALID SQL QUERY:
print(sql_query_2.as_string(conn))
# will print:
# SELECT p.id,p.name,p.type,p.price,p.warehouse,p.location FROM product p
#    INNER JOIN owner o ON p.id = o.product_id
#    WHERE p.id = 100 AND (o.dateof_purchase IS NOT NULL
#    OR o.state = 'checked_out' );

# /!\ INVALID SQL QUERY /!\:
print(sql_query_2b.as_string(conn))
# will print:
# SELECT "p.id","p.name","p.type","p.price","p.warehouse","p.location" FROM product p
#    INNER JOIN owner o ON p.id = o.product_id
#    WHERE p.id = 100 AND (o.dateof_purchase IS NOT NULL
#    OR o.state = 'checked_out' );

But because of that:

sql.Identifier

  • simple columns names are evaluated correctly when in double quotes, eg. id is equivalent to "id", name is equivalent to "name" for PostgreSQL,
  • column name, when prefixed with the dot notation using the table alias or identifier, e.g. p.id or product.id instead of just id or "id" will miserably fail with the following error:
UndefinedColumn: column "p.id" does not exist
LINE 1: SELECT "p.id","p.type","p.price","p.warehouse","p.location",...
               ^
HINT:  Perhaps you meant to reference the column "p.id".
swiss_knight
  • 5,787
  • 8
  • 50
  • 92