4

I am using the sql package in a Jupyter notebook and I understand how to use variables in my query:

client = "Disney"
queryid = %sql SELECT * FROM mytable WHERE name = :client

What I don't understand is how to pass a list to my query, like:

clients = ["Disney", "Netflix", "Sky"]
queryid = %sql SELECT * FROM mytable WHERE name in (:clients)

This raises an error which states that my SQL is wrong. What is the way to handle lists in this setting?

user299791
  • 2,021
  • 3
  • 31
  • 57

4 Answers4

3

With a demo case for sqlite3:

In [1]: import sqlite3
In [2]: conn = sqlite3.connect('example.db')
In [3]: c = conn.cursor()
In [4]: c.execute('''CREATE TABLE stocks
   ...:              (date text, trans text, symbol text, qty real, price real)''')
   ...: 
   ...: # Insert a row of data
   ...: c.execute("INSERT INTO stocks VALUES ('2006-01-05','BUY','RHAT',100,35.1
   ...: 4)")
   ...: 
   ...: # Save (commit) the changes
   ...: conn.commit()
   ...: 
In [5]: # Larger example that inserts many records at a time
   ...: purchases = [('2006-03-28', 'BUY', 'IBM', 1000, 45.00),
   ...:              ('2006-04-05', 'BUY', 'MSFT', 1000, 72.00),
   ...:              ('2006-04-06', 'SELL', 'IBM', 500, 53.00),
   ...:             ]
   ...: c.executemany('INSERT INTO stocks VALUES (?,?,?,?,?)', purchases)

I can fetch values that match several strings with:

In [31]: c.execute('SELECT * FROM stocks WHERE symbol IN (?,?)',('IBM','RHAT'))
Out[31]: <sqlite3.Cursor at 0xaf703fa0>
In [32]: c.fetchall()
Out[32]: 
[('2006-01-05', 'BUY', 'RHAT', 100.0, 35.14),
 ('2006-03-28', 'BUY', 'IBM', 1000.0, 45.0),
 ('2006-04-06', 'SELL', 'IBM', 500.0, 53.0)]

Or with the generalized solution from Parameter substitution for a SQLite "IN" clause

In [33]: alist=['RHAT','IBM']
In [34]: c.execute('SELECT * FROM stocks WHERE symbol IN (%s)' %
    ...:                            ','.join('?'*len(alist)), 
    ...:                            alist)
    ...:                            
Out[34]: <sqlite3.Cursor at 0xaf703fa0>
In [35]: c.fetchall()
Out[35]: 
[('2006-01-05', 'BUY', 'RHAT', 100.0, 35.14),
 ('2006-03-28', 'BUY', 'IBM', 1000.0, 45.0),
 ('2006-04-06', 'SELL', 'IBM', 500.0, 53.0)]

c.execute('SELECT * FROM stocks WHERE symbol IN (:1,:2)',alist), and possibly other forms.

see also:

sqlite3 "IN" clause

I assume MYSQL and %sql iterface behaves the same; but I don't have those installed.


With proper quoting literals also work (again sqlite3)

c.execute('SELECT * FROM stocks WHERE symbol IN ("IBM","RHAT")')

or

In [80]: 'SELECT * FROM stocks WHERE symbol IN (%s)'%','.join('"%s"'%x for x in alist)
Out[80]: 'SELECT * FROM stocks WHERE symbol IN ("RHAT","IBM")'
In [81]: c.execute(_)

So I'm guessing that:

%sql SELECT * FROM stocks WHERE symbol IN ("IBM","RHAT")

would work even if some form of (:....) does not.


I installed %sql

In [5]: %%sql
   ...: sqlite:///example.db
   ...: 

Out[5]: 'Connected: None@example.db'
In [7]: %sql SELECT * from stocks
Done.
Out[7]: 
[('2006-01-05', 'BUY', 'RHAT', 100.0, 35.14),
 ('2006-03-28', 'BUY', 'IBM', 1000.0, 45.0),
 ('2006-04-05', 'BUY', 'MSFT', 1000.0, 72.0),
 ('2006-04-06', 'SELL', 'IBM', 500.0, 53.0)]

In [9]: %sql SELECT * from stocks where symbol in ('IBM')
Done.
Out[9]: 
[('2006-03-28', 'BUY', 'IBM', 1000.0, 45.0),
 ('2006-04-06', 'SELL', 'IBM', 500.0, 53.0)]

In [10]: %sql SELECT * from stocks where symbol in ('IBM','RHAT')
Done.
Out[10]: 
[('2006-01-05', 'BUY', 'RHAT', 100.0, 35.14),
 ('2006-03-28', 'BUY', 'IBM', 1000.0, 45.0),
 ('2006-04-06', 'SELL', 'IBM', 500.0, 53.0)]

The string formatting approach works:

In [11]: alist=['RHAT','IBM']
In [12]: cmd='SELECT * FROM stocks WHERE symbol IN (%s)'%','.join('"%s"'%x for x
    ...:  in alist)
In [13]: cmd
Out[13]: 'SELECT * FROM stocks WHERE symbol IN ("RHAT","IBM")'
In [14]: %sql $cmd
Done.
Out[14]: 
[('2006-01-05', 'BUY', 'RHAT', 100.0, 35.14),
 ('2006-03-28', 'BUY', 'IBM', 1000.0, 45.0),
 ('2006-04-06', 'SELL', 'IBM', 500.0, 53.0)]

The : syntax isn't well documented. It isn't clear who's implementing it. ($ is standard Ipython variable substitution).

In [18]: sym='IBM'
In [19]: %sql SELECT * from stocks where symbol in (:sym)
Done.
Out[19]: 
[('2006-03-28', 'BUY', 'IBM', 1000.0, 45.0),
 ('2006-04-06', 'SELL', 'IBM', 500.0, 53.0)]

symbol in (:sym1,:sym2) works

So far I don't see evidence that %sql works with the conventional SQL placeholder syntax.


Looks like you (?) submitted and closed a issue on github, https://github.com/catherinedevlin/ipython-sql/issues/92

Adapting that solution to quote strings:

In [74]: mystring = '({})'.format(','.join('"{}"'.format(e) for e in alist))
In [75]: mystring
Out[75]: '("RHAT","IBM")'
In [76]: %sql SELECT * from stocks where symbol in $mystring
Done.

In other words, use the ipython $ injection as opposed to the : form.


Looking at the ipython-sql source code:

ipython-sql/blob/master/src/sql/run.py
def run(conn, sql, config, user_namespace):
    ...
    txt = sqlalchemy.sql.text(statement)
    result = conn.session.execute(txt, user_namespace)

It looks like the :name syntax is a sqlalchemy bind parameter, and is handled with sqlalchemy.sql.text and sqlalchemy.sql.bindparam

(http://docs.sqlalchemy.org/en/latest/orm/tutorial.html#orm-tutorial-literal-sql)

This error indicates that each bindparameter is translated into a ? placeholder, plus as matching parameters entry:

In [96]: %sql SELECT * from stocks where symbol in :mystring
(sqlite3.OperationalError) near "?": syntax error [SQL: 'SELECT * from stocks where symbol in ?'] [parameters: ('("RHAT","IBM")',)]

So my original solution of generating IN (?,?,...) to match the length of the list is the right SQL, even though it does not work with sqlalchemy and %sql.

hpaulj
  • 221,503
  • 14
  • 230
  • 353
  • thanks for your time, but as I said before, either you have a solution for %sql or you don't... – user299791 Jul 18 '17 at 08:47
  • 1
    I've downloaded `%sql` and got a solution working with the `Ipython` `$` syntax. Conventional SQL placeholder syntax apparently does not work with `%sql` and `sqlalchemy` which has its own `bindparameters` syntax. – hpaulj Jul 19 '17 at 04:38
  • This is a fantastically thorough answer that really anticipates future visits to this question. The %sql version worked like a charm for me. – pyrocarm Nov 26 '19 at 21:51
1

Removing parentheses works for me.

clients = ["Disney", "Netflix", "Sky"]
queryid = %sql SELECT * FROM mytable WHERE name in :clients
cakraww
  • 2,493
  • 28
  • 30
0

This function saves me from this kind of problem:


> def splitParam(param): # this function splits multiple values of filter
    filteredParam = ""
    if param:
        for item in param:
            if filteredParam=="":
                filteredParam += "'" + item.upper() + "'"
            else:
                filteredParam += ", '" + item.upper() + "'"

    return filteredParam 
GeekyDad
  • 75
  • 1
  • 7
-1

Anecdotal note about applying the $ with python tuples in a SQL query:

It needs to be used in a %sql line, it cannot* be used in a %%sql code block. Use line continuation backslashes for readability on the query.

*as far as I can tell

Spencer
  • 131
  • 6