1

I want to read all of the tables contained in a database into pandas data frames. This answer does what I want to accomplish, but I'd like to use the DBAPI syntax with the ? instead of the %s, per the documentation. However, I ran into an error. I thought this answer may address the problem, but I'm now posting my own question because I can't figure it out.

Minimal example

import pandas as pd
import sqlite3

pd.__version__  # 0.19.1
sqlite3.version  # 2.6.0

excon = sqlite3.connect('example.db')
c = excon.cursor()
c.execute('''CREATE TABLE stocks
         (date text, trans text, symbol text, qty real, price real)''')
c.execute("INSERT INTO stocks VALUES ('2006-01-05', 'BUY', 'RHAT', 100, 35.14)")
c.execute('''CREATE TABLE bonds
         (date text, trans text, symbol text, qty real, price real)''')
c.execute("INSERT INTO bonds VALUES ('2015-01-01', 'BUY', 'RSOCK', 90, 23.11)")

data = pd.read_sql_query('SELECT * FROM stocks', excon)
# >>> data
#          date trans symbol    qty  price
# 0  2006-01-05   BUY   RHAT  100.0  35.14

But when I include a ? or a (?) as below, I get the error message pandas.io.sql.DatabaseError: Execution failed on sql 'SELECT * FROM (?)': near "?": syntax error.

Problem code

c.execute("SELECT name FROM sqlite_master WHERE type='table';")
tables = c.fetchall()
# >>> tables
# [('stocks',), ('bonds',)]
table = tables[0]

data = pd.read_sql_query("SELECT * FROM ?", excon, params=table)

It's probably something trivial that I'm missing, but I'm not seeing it!

3novak
  • 2,506
  • 1
  • 17
  • 28

2 Answers2

2

The problem is that you're trying to use parameter substitution for a table name, which is not possible. There's an issue on GitHub that discusses this. The relevant part is at the very end of the thread, in a comment by @jorisvandenbossche:

Parameter substitution is not possible for the table name AFAIK.

The thing is, in sql there is often a difference between string quoting, and variable quoting (see eg https://sqlite.org/lang_keywords.html the difference in quoting between string and identifier). So you are filling in a string, which is for sql something else as a variable name (in this case a table name).

Oriol Mirosa
  • 2,756
  • 1
  • 13
  • 15
0

Parameter substitution is essential to prevent SQL Injection from unsafe user-entered values.

In this particular example you are sourcing table names directly from the database's own metadata, which is already safe, so it's OK to just use normal string formatting to construct the query, but still good to wrap the table names in quotes.

If you are sourcing user-entered table names, you can also parameterize them first before using them in your normal python string formatting.

e.g.

# assume this is user-entered:
table = '; select * from members; DROP members --'

c.execute("SELECT name FROM sqlite_master WHERE type='table' and name = ?;", excon, params=table )
tables = c.fetchall()

In this case the user has entered some malicious input intended to cause havoc, and the parameterized query will cleanse it and the query will return no rows. If the user entered a clean table e.g. table = 'stocks' then the above query would return that same name back to you, through the wash, and it is now safe.

Then it is fine to continue with normal python string formatting, in this case using f-string style:

table = tables[0]

data = pd.read_sql_query(f"""SELECT * FROM "{table}" ;""", excon)

Referring back to your original example, my first step above is entirely unnecessary. I just provided it for context. It is unnecessary, because there is no user input so you could just do something like this to get a dictionary of dataframes for every table.

c.execute("SELECT name FROM sqlite_master WHERE type='table';")
tables = c.fetchall()
# >>> tables
# [('stocks',), ('bonds',)]


dfs = dict()
for t in tables:
    dfs[t] = pd.read_sql_query(f"""SELECT * FROM "{t}" ;""", excon)

Then you can fetch the dataframe from the dictionary using the tablename as the key.

Davos
  • 5,066
  • 42
  • 66