19

How can I structure this sqlalchemy query so that it does the right thing?

I've given everything I can think of an alias, but I'm still getting:

ProgrammingError: (psycopg2.ProgrammingError) subquery in FROM must have an alias
LINE 4: FROM (SELECT foo.id AS foo_id, foo.version AS ...

Also, as IMSoP pointed out, it seems to be trying to turn it into a cross join, but I just want it to join a table with a group by subquery on that same table.

Here is the sqlalchemy:

(Note: I've rewritten it to be a standalone file that is as complete as possible and can be run from a python shell)

from sqlalchemy import create_engine, func, select
from sqlalchemy import Column, BigInteger, DateTime, Integer, String, SmallInteger
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

engine = create_engine('postgresql://postgres:#######@localhost:5435/foo1234')
session = sessionmaker()
session.configure(bind=engine)
session = session()

Base = declarative_base()

class Foo(Base):
     __tablename__ = 'foo'
     __table_args__ = {'schema': 'public'}
     id = Column('id', BigInteger, primary_key=True)
     time = Column('time', DateTime(timezone=True))
     version = Column('version', String)
     revision = Column('revision', SmallInteger)

foo_max_time_q = select([
     func.max(Foo.time).label('foo_max_time'),
     Foo.id.label('foo_id')
 ]).group_by(Foo.id
 ).alias('foo_max_time_q')

foo_q = select([
    Foo.id.label('foo_id'),
    Foo.version.label('foo_version'),
    Foo.revision.label('foo_revision'),
    foo_max_time_q.c.foo_max_time.label('foo_max_time')
]).join(foo_max_time_q, foo_max_time_q.c.foo_id == Foo.id
).alias('foo_q')

thing = session.query(foo_q).all()
print thing

generated sql:

SELECT foo_id AS foo_id,
    foo_version AS foo_version,
    foo_revision AS foo_revision,
    foo_max_time AS foo_max_time,
    foo_max_time_q.foo_max_time AS foo_max_time_q_foo_max_time,
    foo_max_time_q.foo_id AS foo_max_time_q_foo_id
    FROM (SELECT id AS foo_id,
        version AS foo_version,
        revision AS foo_revision,
        foo_max_time_q.foo_max_time AS foo_max_time
        FROM (SELECT max(time) AS foo_max_time,
            id AS foo_id GROUP BY id
        ) AS foo_max_time_q)
    JOIN (SELECT max(time) AS foo_max_time,
            id AS foo_id GROUP BY id
    ) AS foo_max_time_q
    ON foo_max_time_q.foo_id = id

and here is the toy table:

CREATE TABLE foo (
id bigint ,
time timestamp with time zone,
version character varying(32),
revision smallint
);

The SQL was I expecting to get (desired SQL) would be something like this:

SELECT foo.id AS foo_id,
       foo.version AS foo_version,
       foo.revision AS foo_revision,
       foo_max_time_q.foo_max_time AS foo_max_time
       FROM foo
       JOIN (SELECT max(time) AS foo_max_time,
            id AS foo_id GROUP BY id
            ) AS foo_max_time_q
        ON foo_max_time_q.foo_id = foo.id

Final note: I'm hoping to get an answer using select() instead of session.query() if possible. Thank you

slashdottir
  • 7,835
  • 7
  • 55
  • 71
  • That SQL seems to be incomplete / incorrect in some way - it has more `)` than `(`. However, the sub-query I can see missing an alias begins on line 7 - `FROM (SELECT foo.id AS foo_id,` - and ends on line 17 - a single `)`. – IMSoP Jan 15 '16 at 15:35
  • Looking at it, I think that sub-query has been created by SQLAlchemy because it's interpreted your query as having an implied cross-join between `foo` and `foo_max_time_q` (`FROM foo, (...) as foo_max_time_q`) *as well as* your explicit join specification (`JOIN (...) AS foo_max_time_q ON foo_max_time_q.foo_id = foo.id`). – IMSoP Jan 15 '16 at 15:42
  • @IMSoP: That is how it is generated though. That is the whole problem – slashdottir Jan 15 '16 at 15:44
  • Well, it's not the *whole* problem. "Getting rid of the error" should not be your aim - if you manage to add an alias on line 17, you will get an error on the final line instead (`) AS foo_q,`); fix that, and you may well find the query gives incorrect results due to the issue in my second comment. Instead, "generating the required SQL" should be your aim. If you write the SQL by hand, how does the generated SQL differ? Does this give you any clues? Can you produce any simpler queries that either exhibit the same problem, or look like the components of your required SQL? – IMSoP Jan 15 '16 at 15:56
  • If I write SQL by hand, there's nothing to generate. The SQL is generated by SQLAlchemy as part of its magic. – slashdottir Jan 15 '16 at 17:49
  • This is the simplest toy query I could write that generates the problem. – slashdottir Jan 15 '16 at 17:50
  • I'm interested in a SQLAlchemy solution - not a SQL solution. Thank you – slashdottir Jan 15 '16 at 17:50
  • @alecxe Added to the end. Thank you! – slashdottir Jan 17 '16 at 23:54

2 Answers2

31

You are almost there. Make a "selectable" subquery and join it with the main query via join():

foo_max_time_q = select([func.max(Foo.time).label('foo_max_time'),
                         Foo.id.label('foo_id')
                        ]).group_by(Foo.id
                         ).alias("foo_max_time_q")

foo_q = session.query(
          Foo.id.label('foo_id'),
          Foo.version.label('foo_version'),
          Foo.revision.label('foo_revision'),
          foo_max_time_q.c.foo_max_time.label('foo_max_time')
                ).join(foo_max_time_q, 
                       foo_max_time_q.c.foo_id == Foo.id)

print(foo_q.__str__())

Prints (prettified manually):

SELECT 
    foo.id AS foo_id, 
    foo.version AS foo_version, 
    foo.revision AS foo_revision, 
    foo_max_time_q.foo_max_time AS foo_max_time 
FROM 
    foo 
JOIN 
    (SELECT 
         max(foo.time) AS foo_max_time, 
         foo.id AS foo_id 
     FROM 
         foo 
     GROUP BY foo.id) AS foo_max_time_q 
ON 
    foo_max_time_q.foo_id = foo.id

The complete working code is available in this gist.

slashdottir
  • 7,835
  • 7
  • 55
  • 71
alecxe
  • 462,703
  • 120
  • 1,088
  • 1,195
  • This is so cool. Thank you. It still relies on session.query, but I'll take it. I'm having trouble understanding why it's so different. What I'd like to do is declare the whole foo_max_time column as a separate object that I can just drop in. – slashdottir Jan 18 '16 at 18:14
  • So the only difference is that the foo_q is done with session.query instead of select? That seems to me a very subtle change that isn't intuitive to me right now. – slashdottir Jan 18 '16 at 18:53
  • Okay, I've tested it and it's working. This is going to improve performance for me. Thanks so much – slashdottir Jan 18 '16 at 18:55
  • @slashdottir yeah, I've tried so many different things, but eventually came back very close to your initial version. I'm pretty sure there are several ways to produce the same query with SQLAlchemy. Glad to help! – alecxe Jan 18 '16 at 19:29
  • This works for me also. Albeit this is in no way intuitive and could be "hidden" by sqlalchemy by accepting and understanding query objects in join clause in a transparent manner. (Or at very least not create malformed SQL but print out a helpful message). Also note that you have to use **[]** notation in select, while not in query. – Ric Hard Jul 23 '18 at 10:19
1

Cause

subquery in FROM must have an alias

This error means the subquery (on which we're trying to perform a join) has no alias.
Even if we .alias('t') it just to satisfy this requirement, we will then get the next error:

missing FROM-clause entry for table "foo"

That's because the join on clause (... == Foo.id) is not familiar with Foo.
It only knows the "left" and "right" tables: t (the subquery) and foo_max_time_q.


Solution

Instead, select_from a join of Foo and foo_max_time_q.

Method 1

Replace .join(B, on_clause) with .select_from(B.join(A, on_clause):

]).join(foo_max_time_q, foo_max_time_q.c.foo_id == Foo.id
]).select_from(foo_max_time_q.join(Foo, foo_max_time_q.c.foo_id == Foo.id)

This works here because A INNER JOIN B is equivalent to B INNER JOIN A.

Method 2

To preserve the order of joined tables:

from sqlalchemy import join

and replace .join(B, on_clause) with .select_from(join(A, B, on_clause)):

]).join(foo_max_time_q, foo_max_time_q.c.foo_id == Foo.id
]).select_from(join(Foo, foo_max_time_q, foo_max_time_q.c.foo_id == Foo.id)

Alternatives to session.query() can be found here.

EliadL
  • 6,230
  • 2
  • 26
  • 43