4

I have some data stored in an array, among these there is a message field that can contain different kind of characters.. Among them the quotes characters.

    import MySQLdb
    for key, value in result_array.items(): 
            insert_array.append("`%s` = \"%s\"" %(key, value))
    insert_values = " , ".join(insert_array)
    query_insert = "INSERT into `%s` SET %s ON DUPLICATE KEY UPDATE %s" %(insert_table, insert_values, insert_values)
    cursor.execute(query_insert)

This code fails if the message contains a double quote character. How to escape it?

P.S.

I want to focus on the difference between connection.escape_string and cursor.execute(operation, params=None, multi=True) with params. There was an asnwer (now deleted) that was suggesting to use connection.escape_string and as some people who replied said that is not safe. This is because escape query manually could be dangerous and it is better use parameterise function. [someone might be better than me in explaining this concept, please welcome to edit this part]

salvob
  • 1,300
  • 3
  • 21
  • 41
  • Which is the python module that you are using to connect to MySQL? – Walter_Ritzel May 09 '16 at 16:16
  • `MySQLdb` updating my question – salvob May 09 '16 at 16:17
  • Why the *heck* are you using a formatted string for SQL? You should almost *never* do this. It invites SQL injection. – jpmc26 May 09 '16 at 18:14
  • @jpmc26 well, in this particular case is not a web app or some other things that users can use. It's a script that run in batch. But, please, enlight me, what should I use it? – salvob May 09 '16 at 22:42
  • @salvob Parameterized queries. Every language I know of has some form of them. glglgl's answer is using one (second parameter to `execute`). These escape your input by default to prevent SQL injection. It's harder to do this when you need to vary the actual table or column names, but you should think carefully about if there's some way to avoid doing that and just hard code the names instead. At a bare minimum, I would provide some kind of whitelisting on the names coming into my function. If for no other reason, then because I don't want some other dev to grab my code thinking it's safe. – jpmc26 May 09 '16 at 22:45
  • as I told you this is a batch script, that run for different tables. So it's better in this way, but Thank you for your advices. The chosen answer did suggest to use parameterized query, though. – salvob May 09 '16 at 22:49

1 Answers1

4

By using the execute() call properly:

data = []
for key, value in result_array.items():
    # only proper if key is NOT user defined!
    insert_array.append("%s = %%s" % key) # results in key = %s
    data.append(value)
insert_values = " , ".join(insert_array)
query_insert = "INSERT into `%s` SET %s ON DUPLICATE KEY UPDATE %s" %(insert_table, insert_values, insert_values)
cursor.execute(query_insert, data * 2)

Didn't test it, and hopfilly understood your code right.

The point is to pass the real data to the execute() call which is supposed to escape it for you. It does so by means of the database functions and thus does it better than you ever could.

glglgl
  • 89,107
  • 13
  • 149
  • 217
  • Thank you for your reply. but I have a few question: The query_insert has three `%s`, while in your code of `execute` you put as second parameter of the function, `data *2` . How can Python figured out where to substitute those paraameters? Did you forget the `value` variable on the loop? or it was on purpose? if so, why? – salvob May 09 '16 at 16:34
  • This does look like the right way to do it, but it could use some explanation. There's a lot going on in there. – tadman May 09 '16 at 16:43
  • Aww, I see what you did, you put in `insert_array` a nested/smart interpolation, then you put in the array `data` just the values, in this way `execute` can understand which is the data needed. great! thank you – salvob May 09 '16 at 16:44
  • The query_insert gets its `%s` replaced by `insert_table` and twice `insert_values`, as in your case. At the end, it looks like `INSERT INTO table SET foo=%s, bar=%s ON DUPLICATE KEY UPDATE foo=%s, bar=%s`. `data` is `[foo_value, bar_value]`, so the `execute()` call gets `[foo_value, bar_value, foo_value, bar_value]` which perfectly fits the four `%s`s in the query. – glglgl May 09 '16 at 16:45
  • A solution based on [escape_string()](http://stackoverflow.com/a/3617097/2050) is way less painful, especially with even more involved queries such as bulk inserts. – Eric Platon May 19 '17 at 06:02