pandasql
https://pypi.python.org/pypi/pandasql/0.1.0
Here is an example from the following blog http://blog.yhat.com/posts/pandasql-sql-for-pandas-dataframes.html . The inputs are two DataFrame
s meat
and births
: and this approach gives the projections
, filtering
, aggregation
and sorting
expected from sql.
@maxpower did mention this package is buggy: so let's see.. At least the code from the blog and shown below works fine.
pysqldf = lambda q: sqldf(q, globals())
q = """
SELECT
m.date
, m.beef
, b.births
FROM
meat m
LEFT JOIN
births b
ON m.date = b.date
WHERE
m.date > '1974-12-31';
"""
meat = load_meat()
births = load_births()
df = pysqldf(q)
The output is a pandas DataFrame
as desired.
It is working great for my particular use case (evaluating us crimes)
odf = pysqldf("select %s from df where sweapons > 10 order by sweapons desc limit 10" %scols)
p('odf\n', odf)
odf
: SMURDER SRAPE SROBBERY SAGASSLT SOTHASLT SVANDLSM SWEAPONS
0 0 0 0 1 1 10 54
1 0 0 0 0 1 0 52
2 0 0 0 0 1 0 46
3 0 0 0 0 1 0 43
4 0 0 0 0 1 0 33
5 1 0 2 16 28 4 32
6 0 0 0 7 17 4 30
7 0 0 0 0 1 0 29
8 0 0 0 7 16 3 29
9 0 0 0 1 0 5 28
Update I have done a bunch of stuff with pandasql
now: calculated fields, limits, aliases, cascaded dataframes.. it is just so productive.
Another update (3 yrs later) This works but warning it is very slow (seconds vs milliseconds) –