1

I am trying to join 3 tables, (Movie, role, Actor). My code is executing without errors but when I try to print the data I can only access information on the first table.

role = db.Table('role',
                 db.Column('movie_id', db.Integer, db.ForeignKey('movie.id')),
                 db.Column('actor_id', db.Integer, db.ForeignKey('actor.id')))


class Movie(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    title = db.Column(db.String(20), nullable=False)
    genre = db.Column(db.Text)
    year = db.Column(db.Integer)
    characters = db.relationship('Actor',lazy='dynamic',
                         secondary=role,
                         backref=db.backref('movies', lazy='dynamic')) #  updated


class Actor(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    first_name = db.Column(db.String(200), nullable=False)
    picture = db.Column(db.String(200), nullable=False)
    age = db.Column(db.Integer)


query = Movie.query.join(role).join(Actor).filter_by(Movie.year==1974)

With the query shown I wouldn't be able to access information about the actors.

Edit: I want to be able to filter, limit, and sort the joined actors table. Example:

query = Movie.query.join(role).join(Actor).order_by(Actor.age.desc()).limit(4).filter_by(Movie.year==1974)

update: I am trying to translate the raw sql query provided by snakecharmerb into sqlalchemy and I am running into roadblocks. This is what I have so far:

query = select(select(Movie.title, Actor.first_name, Actor.age, func.row_number().over(partition_by=role.c.movie_id, order_by=Actor.age.desc())).label("rn")
                   {???????}.label("m")
                   .join(role, (role.c.movie_id == movie.id))
                   .join(Actor, (role.c.actor_id == actor.id)))
                   .where("rn <= 4")

I've tried using .select_from() in place of {???????} but nothing I've tried putting in the .select_from() has worked.

Update 2:

I think I did it, I used this query:

sub_query = db.session.query(Movie.title, Actor.first_name, Actor.age,
func.row_number().over(partition_by=role.c.movie_id).label('rn'))\
    .join(role, (role.c.movie_id == movie.id))
    .join(Actor, (role.c.actor_id == actor.id)).subquery()
query = db.session.query(sub_query).filter(text('rn <= 4'))

But I have (hopefully) one last problem. The data that is returned is a flat list

[
    ('movie name 1', "actor name 1", 1),
    ('movie name 1', 'actor name 2', 2),
    ('movie name 1', 'actor name 3', 3),
    ('movie name 1', 'actor name 4', 4),
    ('movie name 2', 'actor name 1', 1),
    ('movie name 2', 'actor name 7', 2),
    ('movie name 2', 'actor name 4', 3),
    ('movie name 2', 'actor name 2', 4),
    ('movie name 3', 'actor name 8', 1),
    ('movie name 3', 'actor name 9', 2)
]

And I was hoping I would be able to get something like this

[
    ('movie name 1', ["actor name 1",
                      'actor name 2',
                      'actor name 3',
                      'actor name 4']),
    ('movie name 2', ['actor name 1',
                      'actor name 7',
                      'actor name 4',
                      'actor name 2']),
    ('movie name 3', ['actor name 8',
                      'actor name 9']),
]

I think I can do this using a .group_by() but I haven't been able to get it to work.

Parzival
  • 2,051
  • 4
  • 14
  • 32
person
  • 87
  • 1
  • 10
  • [This answer](https://stackoverflow.com/a/44817668/5320906) almost gets you there, except for the `LIMIT` clause. – snakecharmerb Jul 05 '20 at 17:34
  • @snakecharmerb Is it not possible to do the filtering, limiting, and sorting in the query (rather than the model like I tried to do in the example)? Performance wise, is it worth it to continue trying to limit the joined table to 4 results (when it can theoretically come up with 1000s)? – person Jul 06 '20 at 00:51
  • @snakecharmerb I am using postgres. – person Jul 06 '20 at 20:17

1 Answers1

0

The realtionship between Movie and Actor needs to be redefined so that it loads dynamically on the Movie side (I also renamed the backref, so you can have Actor.movies):

characters = db.relationship('Actor',
                             lazy='dynamic',
                             secondary=role,
                             backref=db.backref('movies', lazy='dynamic'))

Once this is done, Movie.characters can be queried for each movie (this example is pure sqlalchemy, but converting to flask-sqlalchemy should not be difficult):

from sqlalchemy import orm

query = session.query(Movie).filter(Movie.year == 1974)
for movie in query:
    print(movie.title)
    for actor in movie.characters.order_by(Actor.age.desc().limit(4):
        print(actor.first_name, actor.age)
    print()

In terms of efficiency, limiting the number of actors returned per movie is good, but using dynamic loading means that we generate one query to get the movies and then one query to get each movie's characters (this is the N + 1 problem). This isn't very efficient either.

Using pure SQL, the required data could be retrieved with this query:

SELECT * FROM (SELECT m.title, a.first_name, a.age, ROW_NUMBER() OVER (PARTITION BY r.movie_id ORDER BY a.age DESC) AS rn
FROM movies m 
  JOIN role r ON r.movie_id = m.id
  JOIN actors a ON r.actor_id = a.id 
WHERE m.year = 1974) 
dummy WHERE rn <= 4;
snakecharmerb
  • 47,570
  • 11
  • 100
  • 153
  • Is it possible to limit the amount of actors that are returned using this method? I seem to only be able to limit the amount of movies. – person Jul 05 '20 at 15:07
  • I've edited my answer with something that works. I haven't managed to convert that SQL query into sqlalchemy yet, but may try again later. – snakecharmerb Jul 07 '20 at 08:00