23

I am writing a class to be used as part of a much larger modeling algorithm. My part does spatial analysis to calculate distances from certain points to other points. There are a variety of conditions involving number of returned distances, cutoff distances, and etc.

Currently, the project specification only indicates hardcoded situations. i.e. "Function #1 needs to list all the distances from point set A to point set B within 500m. Function #2 needs to list all the distances from point set C to point set D..." and so on.

I don't want to hardcode these parameters, and neither does the person who is developing the next stage of the model, because obviously they would like to tweak the parameters or possibly re-use the algorithm in other projects where they will have different conditions.

Now the problem is that I am using psycopg2 to do this. This is the standard where I work so I do not have a choice of deviating from it. I have read that it is a very bad idea to expose parameters that will be put into the executed queries as parameters due to the obvious reason of SQL injection. However, I thought that psycopg2 automatically sanitized SQL input. I think that the issue is using the AsIs function.

The easy solution is just to hardcode it as specified in the project but this feels lazy and sloppy to me. I don't like doing lazy and sloppy work.

Is it at all safe to allow the user to input parameters that will be input into a psycopg2-executed query? Or is it just using AsIs that makes it unsafe? If I wanted to allow the user to be able to input these parameters, do I have to take the responsibility upon myself to santitize the inputs, and if so, is there a quick and easy way to do it, like with another python library or something?

wfgeo
  • 2,716
  • 4
  • 30
  • 51

3 Answers3

47

AsIs is unsafe, unless you really know what you are doing. You can use it for unit testing for example.

Passing parameters is not that unsafe, as long as you do not pre-format your sql query. Never do:

sql_query = 'SELECT * FROM {}'.format(user_input)
cur.execute(sql_query)

Since user_input could be ';DROP DATABASE;' for instance.

Instead, do:

sql_query = 'SELECT * FROM %s'
cur.execute(sql_query, (user_input,))

pyscopg2 will sanitize your query. Also, you can pre-sanitize the parameters in your code with your own logic, if you really do not trust your user's input.

Per psycopg2's documentation:

Warning Never, never, NEVER use Python string concatenation (+) or string parameters interpolation (%) to pass variables to a SQL query string. Not even at gunpoint.

Also, I would never, ever, let my users tell me which table I should query. Your app's logic (or routes) should tell you that.

Regarding AsIs(), per psycopg2's documentation :

Asis()... for objects whose string representation is already valid as SQL representation.

So, don't use it with user's input.

Diego Mora Cespedes
  • 3,605
  • 5
  • 26
  • 33
  • So the main takeaway is: Do not use `AsIs()`. Are there any restrictions for things like setting table names? I seem to recall reading somewhere that you can input values via the `execute()` command, but you shouldn't use it in table names. Does this sound right or am I completely off? And yes, I never use string concatenation, nor string parameters/formatting. But I was a little confused because I was starting to think that even using the `execute()` command was also unsafe, which was starting make me wonder what the whole point of the library was – wfgeo Jul 16 '17 at 13:14
  • So then what is the difference between `AsIs(input)` and just `input`? does using `AsIs()` bypass the sanitation? – wfgeo Jul 16 '17 at 13:20
  • Yes, I think `AsIs()` bypasses the sanitation. Don't use it with user's input. Personally I only use it for unit testing. https://stackoverflow.com/questions/13793399/passing-table-name-as-a-parameter-in-psycopg2 – Diego Mora Cespedes Jul 16 '17 at 13:22
  • @1saac I just edited my answer to point this: http://initd.org/psycopg/docs/extensions.html#psycopg2.extensions.AsIs – Diego Mora Cespedes Jul 16 '17 at 13:28
  • Ok thank you. So it is not unsafe to use user input in the SQL query, so long as you are not using `AsIs()`, and you are using the `cur.execute()` command, and not using string concatenation/parameters/formatting – wfgeo Jul 16 '17 at 13:42
  • Yes, and don't let your user tell you which table to query. – Diego Mora Cespedes Jul 16 '17 at 14:00
  • Well I'm looking deeper now and it seems that psycopg2 now uses something called `psycopg2.sql.Identifier()`, which can be used to pass table names as parameters. Do you know anything about this? Because it seems like there is very little written on it but it seems to allow the passing of table names as parameters. I need to be able to let the user to pick which tables to operate on otherwise it defeats the whole purpose. – wfgeo Jul 16 '17 at 14:10
7

You can use psycopg2.sql to compose dynamic queries. Unlike AsIs it will protect you from SQL injection.

piro
  • 13,378
  • 5
  • 34
  • 38
3

If you need to store your query in a variable you can use the SQL method (documentation) :

from psycopg2 import sql


query = sql.SQL("SELECT * FROM Client where id={clientId}").format(clientId=sql.Literal(clientId)
Dan
  • 1,159
  • 13
  • 8
  • 1
    How can similar be achieved if the variable name is not known (the `{clientId}` in example)? My team want me to write a universal function that could connect and exec query with parameters in one function, so I combined `psycopg2.connect` `cursor` and `execute` in one function. However, the parameter needs to be sanitised. The problem is, the number of parameters and the name of parameters is unknown. – jimmymcheung Dec 22 '22 at 13:29
  • I'm wondering the same as @jimmymcheung... – schiavuzzi Jan 31 '23 at 23:29
  • Follow up to this: is it possible to get the sanitized query string out as an actual string or does that defeat the purpose? In my case the only way I can execute an UPDATE query is through a text file containing the update - so I need a string. This `sql.SQL` still stores the composable query and parameters separately. However, if I want to get it out as a string, I need to do `as_string(connection)` which *requires access to a connection, which I do not have*. Is it possible to get this out as a "sanitized" query string, or would that defeat the purpose of what `psycopg2` is trying to do? – Marses Jul 03 '23 at 13:39