2

Can anyone help with this? I have a query (see below)

dets = config.Se.query(db.A).filter(db.B.clientref != None, 
                                    db.A.id == db.B.clientref, 
                                    db.A.ignored == False).\
            order_by(desc(db.B.date_lts))

if _querya != ():
    tmp_dets = dets.filter(*_querya)
    dets = tmp_dets

d_results2 = dets.limit(300).all()
d_results = dets.all()
print len(d_results2), len(d_results)

With my test query, the print statement returns 4 5.

What am I doing wrong? I am only selecting from table A (even though I use table B with many to one relation to A to limit and sort the result set). Let us assume table A contains clients and table B contains orders including date_lts (last seen), and I want to order the result set so that those clients with most recent orders are displayed first.

_querya we can ignore. It allows setting additional filters to query and this works fine.

I would like to limit the result set to max 300 lines from table A to avoid the GUI choking in case the user sets too wide query parameters. I noticed, however, that a test query that should return 5 rows (confirmed from database) returned only four.

To debug this I renamed the original query result set to d_results2 and created d_results without limit(300). And I can see a difference. d_results2 gets limited to four.

What is going on? I assumed limit(300) would not do anything as the result set in its entirety is just five rows.

Hannu

Hannu
  • 11,685
  • 4
  • 35
  • 51
  • 1
    Try printing the queries for comparison. Add `print dets.limit(300)` and `print dets`. Add the results to the question. It'd seem there should be no difference as you've said, but just to be sure and all. – Ilja Everilä Jun 30 '17 at 10:48
  • Thanks. Did that and I already found the error. The query lacks distinct(). – Hannu Jun 30 '17 at 11:00
  • Self answer then? :) Still cannot see why adding limit would result in different result set in this case, so honestly interested. – Ilja Everilä Jun 30 '17 at 11:08
  • Will self answer as soon as I get it right. Will need some time. The problem is, this query now returns as many rows from table A for each entry as there are rows in table B pointing to that particular row. Simple distinct() did not do the trick but working on it. Why exactly this happens as the result set is of correct size nevertheless is beyond me but I can probably fix it by distinct or group_by. Stay on this channel... – Hannu Jun 30 '17 at 11:13
  • That's starting to sound like an [tag:greatest-n-per-group] type of problem, at least partly. I guess in the end you'd like to join the latest `B` of each `A` only for ordering. What DB are you using? – Ilja Everilä Jun 30 '17 at 11:20
  • This is postgresql. I am starting to consider splitting this to nested queries instead of combining it to a mammoth one. – Hannu Jun 30 '17 at 11:23
  • Which is what I did. Thanks for your help in pointing this out. It works now. Not elegant but in this case I prefer functionality over style. – Hannu Jun 30 '17 at 11:31
  • 1
    [This](https://stackoverflow.com/questions/44069023/sqlalchemy-show-only-latest-result-if-a-join-returns-multiple-results) might be helpful. You'd still need to sort the inner query according to date after the distinct on trick, as would seem that you're doing. – Ilja Everilä Jun 30 '17 at 11:32
  • Almost... Almost.... by following that the query as such works, but the sort order is wrong as it now sorts by id instead of date. I tried adding dets.order_by(db.B.date_lts.desc()).limit(300).all() but it didn't do anything. – Hannu Jun 30 '17 at 11:52
  • 1
    Yeah, meant that you'd then make a subquery out of the DISTINCT ON hack query and in the outer query fix the order by date. The order has to be id, date desc in the inner query so that `DISTINCT ON id` would then pick a single row per id etc. – Ilja Everilä Jun 30 '17 at 11:58
  • Finally got it right and posted an answer. Thanks for your help! – Hannu Jun 30 '17 at 12:45

1 Answers1

3

Thanks to Ilja in comments, this was a bit more complicated issue than I thought but finally managed to get it right. This is the query now if anyone is interested:

dets = config.Se.query(db.A, db.B).join(db.B).distinct(db.A.id).\
          filter(db.B.clientref != None, 
                 db.A.id == db.B.clientref, 
                 db.A.ignored == False).order_by(db.A.id)

if _querya != ():
    tmp_dets = dets.filter(*_querya)
    dets = tmp_dets

d_results = dets.from_self().order_by(db.B.date_lts.desc()).limit(300).all()

and this seems to work.

Hannu

Hannu
  • 11,685
  • 4
  • 35
  • 51
  • A couple notes, if you want the latest B of each A instead of unspecified, do `order_by(db.A.id, db.B.date_lts.desc())` in the inner query so that distinct on picks a controlled row. Also are the predicates `db.B.clientref != None,` still necessary after `join(db.B)`? Asking since unfamiliar with the models, but you used to have an implicit join with `db.A.id == db.B.clientref`, so perhaps the explicit join covers that. – Ilja Everilä Jun 30 '17 at 12:52
  • != None is there for historical reasons. I didn't pay much attention to it now as the problem was obviously elsewhere. It is a hard reference now and B.clientref is not nullable anymore, which makes the whole condition obsolete and it can be removed. – Hannu Jun 30 '17 at 12:55
  • Ok. Just remember to add the ordering by date as well in the inner query, or else you'll be picking a random A, B pair with the distinct on trickery. I'd like to thank you for pointing out `from_self()`. Had not really noticed it before and could've used it many times before. – Ilja Everilä Jun 30 '17 at 13:13
  • In this case it doesn't actually matter, which row there is from B. I just need one for certain columns, and the table B in general to provide sort order. from_self() is one of those things in SQLAlchemy you don't necessarily need but that make code more readable than their typical workarounds. – Hannu Jun 30 '17 at 13:18