Using the peewee ORM, I have the following query:

query = DataModel.select().where(DataModel.field == "value")

Is there any way to convert query into a pandas DataFrame without iterating over all the values? I'm looking for a more "Pythonic" way of doing this.

MikeyE

Assuming query is of type peewee.SelectQuery, you could do:

df = pd.DataFrame(list(query.dicts()))

EDIT: As Nicola points out below, you're now able to do pd.DataFrame(query.dicts()) directly.

Greg Reda
  • Brilliantly simple, and works like a charm! I won't tell you how long I spent trying to figure that one out, it's embarrassing. lol – MikeyE Mar 06 '17 at 10:45
    for some reason `list(query.dicts())` is failing if there is a column that has the name of the table... has anybody experienced the same issue? – toto_tico May 09 '18 at 13:54
  • Don't use list(), the correct code is: `pd.DataFrame(query.dicts())` – Nicola Feb 08 '22 at 11:26

Just in case someone finds this useful, I was searching for the same conversion but in Python 3. Inspired by @toto_tico's previous answer, this is what I came up with:

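import pandas
import peewee


def data_frame_from_peewee_query(query: peewee.Query) -> pandas.DataFrame:
    # _database is a private attribute: the database the query is bound to.
    connection = query._database.connection()  # noqa
    # sql() returns the SQL string (with driver placeholders) and its parameters.
    sql, params = query.sql()
    # Passing params lets the DB driver fill in the placeholders safely.
    return pandas.read_sql_query(sql, connection, params=params)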

Checked with Python 3.9.6, pandas==1.3.2 and peewee==3.14.4, using peewee.SqliteDatabase.

franferrax

The following is a more efficient way, because it avoids building a list and then passing it to the pandas DataFrame. It also has the side benefit of preserving the order of the columns:

sql, params = query.sql()
df = pd.read_sql(sql, database.connection(), params=params)

You need direct access to the peewee database object; in the quickstart tutorial, for example, it corresponds to:

db = SqliteDatabase('people.db')

Of course, you can also create your own connection to the database.

Drawback: be careful if your query joins two tables that share column names; for example, both id columns would appear in the result. Rename or alias those columns before continuing.


If you are using a peewee proxy:

import peewee as pw
database_proxy = pw.Proxy()

then the connection is here:

database_proxy.obj.connection()
toto_tico
    How do you deal with the fact that `query.sql()[0]` returns a string with `%s` as placeholders? – LivingSilver94 Jan 14 '19 at 13:40
  • I would have to take a look at your exact query, but in general it could mean that there is Python format-string syntax, [check this](https://stackoverflow.com/questions/997797/what-does-s-mean-in-a-python-format-string) – toto_tico Jan 14 '19 at 16:47
    @LivingSilver94 It isn't a complete solution but getting a cursor and using mogrify works for postgres: `cursor = db.cursor()` then `pd.read_sql(cursor.mogrify(*query.sql()), ...)` – user5915738 Sep 06 '21 at 03:09