
I am trying to use an UPDATE statement in pymysql to update a column's value, where self.key, self.value and self.id are three variables.

cur.execute("UPDATE environment_history SET {key}=%s WHERE id=%s".format(key=self.key),
                    (self.value, self.id)
                    )

However, the code above leaves an opportunity for SQL injection. An attacker only has to post some data like this:

{"id":"23151","key":"uuid='ac211';#","value":"abcde"}

Because the # comments out the rest of the statement, including the WHERE clause, this would update all rows, and other payloads could be even more dangerous. I have to stop this from happening.

I've tried some impractical solutions. One of them is:

cur.execute("UPDATE environment_history SET %s=%s WHERE id=%s",
                    (self.key,self.value, self.id)
                    )

However, pymysql escapes the column name as a string value, producing something like \'uuid\'.

That is an SQL syntax error, since only SET col_name=3 and SET `col_name`=3 are valid.

Then, how to avoid this kind of SQL Injection?

Replacing and escaping self.key manually is the only option I have come up with, but that doesn't sound like an elegant way.

My database is MySQL.

Any ideas? Thanks!

  • Column names can't be parameters in your prepared statement. Even if the above could be made to work, you should not do it. Instead, find a way so that you can hard code the column and table names. – Tim Biegeleisen May 06 '19 at 10:23
  • Are you getting `self.key` directly from input? That by itself is pretty dangerous. Why not filter it against a list of specific keywords? – AdamGold May 06 '19 at 10:24
  • I voted to close as duplicate of "How to prepare SQL query dynamically (column names too) avoiding SQL injection" even though that question uses PHP for example code. But the principle is the same in any language. Restrict your `self.key` to a whitelist of values of known column names, and then interpolate that into your SQL string with plain string interpolation, not with `cur.execute()` magic that quotes values. – Bill Karwin May 06 '19 at 15:50

3 Answers


I would suggest filtering self.key down to a known value, and only then using it as a column name. For example:

keywords = ["uuid", "other", "keywords"]
if self.key in keywords:
    column_name = self.key
else:
    column_name = keywords[0]
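A minimal sketch of how that whitelist check could feed the query from the question. The table name environment_history comes from the question; ALLOWED_COLUMNS and build_update are hypothetical names, and the column list is a placeholder. Unlike the fallback above, this version raises ValueError for an unknown key instead of silently updating a different column, which is a design choice rather than the only option.

```python
# Hypothetical whitelist; replace with the real updatable columns.
ALLOWED_COLUMNS = frozenset({"uuid", "other", "keywords"})


def build_update(key):
    """Return the UPDATE statement for a whitelisted column name."""
    if key not in ALLOWED_COLUMNS:
        raise ValueError("unexpected column name: {!r}".format(key))
    # The column name is interpolated only after the whitelist check;
    # the values stay as %s placeholders for cur.execute() to escape.
    return "UPDATE environment_history SET `{}`=%s WHERE id=%s".format(key)
```

It would then be called as cur.execute(build_update(self.key), (self.value, self.id)), so the values still go through the driver's escaping.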
AdamGold

From my experience with dynamic queries like that, I'd suggest keeping the table/key definitions somewhere and checking the input against them.

Basic idea would be to keep a dict of valid keys for tables, for example:

valid_keys = {
   'table1': ['key1', 'key2', 'key3'],
   'table2': ['key1', 'key2'],
   ...
}

and do a simple check, for example:

keys = valid_keys.get(table)  # None if the table is not known
if keys is not None and key in keys:
   # do your query stuff

You can place that check in a method/function to make it even simpler, without the need to repeat yourself:

# checks table and key validity
def query_data_valid(table, key):
   # an unknown table falls back to an empty tuple, so the result is False
   return key in valid_keys.get(table, ())

...
# block in your query calling method
if query_data_valid(table, key):
   # do your query stuff
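As a rough sketch of what the "query stuff" could look like once both identifiers have been validated: VALID_KEYS below is a hypothetical stand-in for the definitions above (the table and column names are placeholders), and build_update is an assumed helper name, not part of pymysql.

```python
# Hypothetical table/column definitions; adjust to the real schema.
VALID_KEYS = {
    "environment_history": ("uuid", "other"),
    "table2": ("key1", "key2"),
}


def build_update(table, key):
    """Build an UPDATE statement from a validated table and column."""
    if key not in VALID_KEYS.get(table, ()):
        raise ValueError("invalid table/column: {!r}.{!r}".format(table, key))
    # Both identifiers are known entries of VALID_KEYS at this point, so
    # plain string formatting is safe; values remain %s placeholders.
    return "UPDATE `{}` SET `{}`=%s WHERE id=%s".format(table, key)
```

The resulting string still contains two %s placeholders, so the value and id are passed separately, for example cur.execute(build_update(table, key), (value, row_id)).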
icwebndev

This doesn't answer your question directly, but as a general rule, and as explained in this post, you need to use parameterized queries.

A parameterized query is a query in which placeholders are used for the parameters and the parameter values are supplied at execution time, so the values are never spliced into the SQL text by your own code. With server-side prepared statements, the query is also compiled only once.

Incorrect (with security issues)

c.execute("SELECT * FROM foo WHERE bar = %s AND baz = %s" % (param1, param2))

Correct (with escaping)

c.execute("SELECT * FROM foo WHERE bar = %s AND baz = %s", (param1, param2))
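To see why values should be passed separately rather than formatted into the SQL string (whether with % or with .format()), here is a small self-contained sketch. It uses the standard-library sqlite3 module purely as an assumption so that it runs without a MySQL server; sqlite3 uses ? placeholders where pymysql uses %s. The table foo and its contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (bar TEXT, baz TEXT)")
conn.executemany("INSERT INTO foo VALUES (?, ?)", [("a", "1"), ("b", "2")])

malicious = "x' OR '1'='1"

# String formatting: the input becomes part of the SQL text itself,
# so the injected OR clause matches every row.
unsafe_sql = "SELECT * FROM foo WHERE bar = '{0}'".format(malicious)
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Parameter binding: the input is treated purely as a value,
# so no row has a bar equal to the malicious string.
safe_rows = conn.execute(
    "SELECT * FROM foo WHERE bar = ?", (malicious,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))  # prints "2 0"
```

Note that this protects values only; column and table names still need the whitelist approach described in the other answers.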
prosti