
Consider that I have a list of customer names:

lst = ['John','Jack','Martin']

The desired output in a variable: 'John','Jack','Martin'

I can use something like ",".join(lst) and then add more string formatting to achieve this. Is there a clearer and more direct approach?

My plan is to pass these elements into a SQL WHERE clause.

maxshuty
Maverick
  • whatever python's equivalent of `"'" . implode("','", $array) . "'"` would be... but note that this leaves you open to sql injection attacks. – Marc B Sep 02 '15 at 14:20

6 Answers


Do not use this for SQL query generation. Use the database driver SQL parameters instead. You cannot hope to properly escape your way out of SQL injection attacks otherwise.

If you need to use a WHERE .. IN .. test, generate placeholders:

query = 'SELECT * FROM table WHERE column IN ({})'.format(','.join(['%s'] * len(lst)))
cursor.execute(query, lst)
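As an end-to-end illustration of the same pattern, here is a minimal sketch using the standard-library sqlite3 driver (which uses ? as its placeholder style instead of %s; the customers table is made up for the demo):

import sqlite3

lst = ['John', 'Jack', 'Martin']

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE customers (name TEXT)')
conn.executemany('INSERT INTO customers VALUES (?)', [(name,) for name in lst])

# One ? placeholder per value; the driver handles all quoting and escaping.
query = 'SELECT name FROM customers WHERE name IN ({})'.format(','.join(['?'] * len(lst)))
rows = conn.execute(query, lst).fetchall()   # the three matching rows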

For everything else, use a list comprehension to add the quotes to the values, then join the results with commas:

', '.join(['"{}"'.format(value) for value in lst])

Demo:

>>> lst = ['John','Jack','Martin']
>>> ', '.join(['"{}"'.format(value) for value in lst])
'"John", "Jack", "Martin"'
>>> print ', '.join(['"{}"'.format(value) for value in lst])
"John", "Jack", "Martin"

This will consistently use " double quotes; simply use "'{}'" as the template if you must have single quotes instead.
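For example, the single-quote variant that matches the output asked for in the question:

>>> ','.join(["'{}'".format(value) for value in lst])
"'John','Jack','Martin'"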

Martijn Pieters
  • Why the list comprehension? I've seen people argue that it's faster (which might be true), but that particular optimization is very specific to `str.join` and the speedups are almost certainly negligible for most applications... Unless I can prove there's a difference, I'd rather use a generator here for consistency with other functions where you pass generators (e.g. `sum`) – mgilson Sep 02 '15 at 14:29
  • @mgilson: it is faster, because the generator expression is *also* turned into a list *anyway*. The `str.join()` call has to iterate over the strings twice. See Raymond Hettinger's answer: [list comprehension without \[ \], Python](http://stackoverflow.com/a/9061024) – Martijn Pieters Sep 02 '15 at 14:32
  • Sure -- I wasn't arguing that it isn't faster. The numbers that Raymond quoted put the speedup somewhere around 10-15% which will likely not have any measurable effect on the total runtime of the program unless this is in a really tight loop. I suppose my question is whether you think the speedup is worth it compared to needing to remember a special case for `str.join` -- My gut says to avoid special casing, but I'm only one guy and my opinion might not matter for much :-). – mgilson Sep 02 '15 at 14:38
  • @mgilson I'm so used to remembering this now that the cognitive cost is close to nil. *Not* using a list comp would require more effort at this point. I no longer see this as a special case. But that's just me. – Martijn Pieters Sep 02 '15 at 14:42
  • FWIW, I just repeated Raymond's experiment with pypy and the discrepancy is even worse (~ a factor of 2 difference). Consequently though, the slower (generator) pypy version was about 3 times faster than the faster python version (list-comp). It does make me wonder if this special case is worth adopting. I'd love to get a community opinion on that one, but alas, a question like that would probably be closed within 2 minutes here (especially if I was the one to post it ;-). – mgilson Sep 02 '15 at 15:00

If you are passing this to SQL and it is user-supplied data, you should be looking at a module that will handle proper quoting / parameterization for you, e.g. psycopg2 for Postgres. I am sure there is an equivalent for MySQL.

Using simple techniques like str.join can leave you open to SQL injection attacks unless you are extremely careful about checking the result before handing to SQL.

An example of how to do this in psycopg2:

http://initd.org/psycopg/docs/usage.html#passing-parameters-to-sql-queries

As a side effect this also handles types for you, i.e. it knows to pass integers as numbers, strings as quoted strings, etc.
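As a minimal sketch of the pattern (the connection string and the customers table/name column are placeholders, not from the question): psycopg2 adapts a Python tuple into a parenthesised SQL value list, so a tuple can be bound straight into an IN clause.

import psycopg2

lst = ['John', 'Jack', 'Martin']

conn = psycopg2.connect("dbname=mydb user=me")   # placeholder connection details
cur = conn.cursor()

# The driver quotes and escapes each value; the tuple becomes ('John', 'Jack', 'Martin').
cur.execute("SELECT * FROM customers WHERE name IN %s", (tuple(lst),))
rows = cur.fetchall()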

Dan Lowe

You could use repr to make each string valid Python input:

lst = ['John','Jack','Martin']
output = ','.join(map(repr,lst))
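A quick check in the interactive shell; repr wraps each plain string in single quotes, which matches the format asked for (on Python 2, repr of a unicode string would add a u prefix, so this applies to byte strings there):

>>> lst = ['John','Jack','Martin']
>>> ','.join(map(repr, lst))
"'John','Jack','Martin'"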
DaveQ
  • Make sure that the unit tests cover all the plausible SQL injection cases - you probably want a SQL-specific function in place of `repr`, or (more likely) to use parametrized queries rather than text substitution. – Toby Speight Sep 02 '15 at 16:48

To get your desired output you can still use join as follows:

lst = ['John','Jack','Martin']
print "'"+"','".join(lst)+"'"

This would display:

'John','Jack','Martin'

Alternatively you could use format() to avoid string concatenation:

print "'{}'".format("','".join(lst))

Please do take note of the other answers and comments regarding its use with SQL.

Martin Evans

Step 1. Wrap each element of the list in quotes (converting it to a string first, in case it is not one already):

f = lambda x: "'"+str(x)+"'"
lst = map(f,lst)

Step 2. Join the elements with a comma:

','.join(lst)

Output from Python Shell

>>> f = lambda x: "'"+str(x)+"'"
>>> lst = [1,2,3,4,5]
>>> lst = map(f,lst)
>>> lst
["'1'", "'2'", "'3'", "'4'", "'5'"]
>>> ','.join(lst)
"'1','2','3','4','5'"
Ashwani Agarwal

If you're using an ODBC-compliant database, you can let the driver do the quoting by passing parameters separately. The following is taken from the pyodbc docs:

cursor.execute(
    """
    select user_id, user_name
      from users
     where last_logon < ?
           and bill_overdue = ?
    """, '2001-01-01', 1)

To pass the parameters as a single sequence instead of separate arguments (the sequence must supply one value per ? placeholder):

params = ['2001-01-01', 1]
cursor.execute(
    """
    select user_id, user_name
      from users
     where last_logon < ?
           and bill_overdue = ?
    """, params)
oxdeadbeef