How to join a table with a row in another table having the highest id?

Question

I have two Postgres tables: 'user' with unique rows, and 'address' where a single user can have multiple addresses. I want to join specific 'user' row with one row from 'address' where the address.id is the highest max(id).

I wrote the following query which works OK but I suspect that when there are many addresses for one user, this query will produce a really large intermediate aggregate before it returns the the final results. Is there a better way to write this query?

select "user".id, 
       "user".username,     
       "user".trader_id, 
       address.id,                                             
       address.street, 
       address.trader_id 
    from "user", address 
  where "user".id = 6 
      and  address.trader_id = "user".trader_id                         
 order by address.id desc
 limit 1;

You could try to join with a sub query in the from clause which would give the latest address. Something like this below: (select trader_id, max(id) from address group by trader_id) — user2773289, Aug 03 '23 at 02:43
@user2773289 That would drop the `street` column that the query needs — Patrick, Aug 03 '23 at 03:58
@Patrick I am not suggesting that you remove the address table from the from-clause, just use the query I suggested to find the latest address. — user2773289, Aug 03 '23 at 19:18

Erwin Brandstetter · Accepted Answer · 2023-08-03T06:09:15.370

Your query is pretty good already for getting a single user. Definitely much faster than running a window function over the whole address table in a subquery.
Your query, only slightly simplified with a USING clause:

SELECT trader_id, u.id AS user_id, u.username, a.id AS adr_id, a.street
FROM   "user"  u
JOIN   address a USING (trader_id)
WHERE  u.id = 6
ORDER  BY a.id DESC
LIMIT  1;

A multicolumn index on address(trader_id, id) will give you optimal performance. Plus, obviously, an index on "user"(id). (No "intermediate aggregate" like you apprehended with LIMIT 1!) See:

Alternatively, the same technique in a LATERAL subquery. Works for retrieving one or more users:

SELECT u.trader_id, u.id AS user_id, u.username, a.*
FROM   "user"  u
LEFT   JOIN LATERAL (
   SELECT a.id AS adr_id, a.street
   FROM   address a
   WHERE  a.trader_id = u.trader_id
   ORDER  BY a.id DESC
   LIMIT  1
   ) a ON true
WHERE  u.id = 6;  -- or for more than just the one

About LATERAL subqueries:

What is the difference between a LATERAL JOIN and a subquery in PostgreSQL?

Also using LEFT JOIN to preserve users without any address.

If you are going to use a window function, use row_number() rather than rank(). Do it in a LATERAL subquery while only retrieving a single (or few) user(s), to only involve relevant rows. And, unless you run Postgres 16 or later, add the frame clause ROWS UNBOUNDED PRECEDING for performance:

SELECT u.trader_id, u.id AS user_id, u.username, a.*
FROM   "user"  u
LEFT   JOIN LATERAL (
   SELECT id AS adr_id, street
        , row_number() OVER (ORDER BY id DESC ROWS UNBOUNDED PRECEDING) AS rn
   FROM   address a
   WHERE  a.trader_id = u.trader_id
   ) a ON a.rn = 1
WHERE  u.id = 6;  -- or for more than one user

Why ROWS UNBOUNDED PRECEDING? See:

Or use DISTINCT ON:

SELECT DISTINCT ON (traider_id)
       trader_id, u.id AS user_id, u.username, a.id AS adr_id, a.street
FROM   "user"  u
JOIN   address a USING (trader_id)  -- or LEFT JOIN?
WHERE  u.id = 6
ORDER  BY trader_id, a.id DESC;

See:

Select first row in each GROUP BY group?

__

Aside: Rather don't use reserved words like "user" as identifier.

score 1 · Answer 2 · answered Aug 03 '23 at 03:57

Your query should work just fine and be performant, especially if you use appropriate indexes, such as on trader_id for both tables. (There are certain improvements that can be made, such as dropping the second trader_id column and using an alias for one or both id columns.)

The query will fail, however, if you want to get the information for multiple users, because only a single row will be returned due to the LIMIT 1 clause. A more general solution then, where you can get the desired information for multiple "user".ids at a time would use a window function:

SELECT u.id AS user_id, 
       u.username,     
       trader_id, 
       a.id AS address_id,                                             
       a.street
FROM "user" u
JOIN (
    SELECT id, street, trader_id,
           rank() OVER (PARTITION BY trader_id ORDER BY id) AS rank
    FROM address) a USING (trader_id)
WHERE a.rank = 1
ORDER BY u.id;

In terms of performance, you should focus on indexes rather than query structure - the query planner will deal with the latter.

score 1 · Answer 3 · answered Aug 03 '23 at 04:54

You can skip using order by and limit in your query by using multicolumn subquery in order to get max id from address table of all/any user.

step 1: Write a get max address id of all user from address table .

select sa.trader_id,max(ss.id) as id from address as sa  group by 1

Step 2: put the above query inside another query to fetch necessary column of address table.

        select 
            ss.* 
        from address as ss 
        where (ss.trader_id ,ss.id) in (select sa.trader_id,max(ss.id) as id from address as sa  group by 1)

Step 3: Join the above query with "user" table. You can get any users latest address id from this using where condition.

select "user".id, 
           "user".username,     
           "user".trader_id, 
           t1.id,                                             
           t1.street, 
           t1.trader_id 
        from "user" 
        join (
                select 
                    ss.* 
                from address as ss 
                where (ss.trader_id ,ss.id) in (select sa.trader_id,max(ss.id) as id from address as sa  group by 1)
            )t1 on "user".trader_id =t1.trader_id
              where "user".id = 6;

score 1 · Answer 4 · answered Aug 03 '23 at 07:22

To explore performance questions, it's useful to create a test bed capable of exposing differences. The following commands establish an environment with a sufficient number of rows to demonstrate the relative performance of different queries (adjust the parameters in p to model different data characteristics):

CREATE TABLE users (
  id integer PRIMARY KEY,
  username text,
  trader_id integer);

CREATE TABLE addresses (
  id integer PRIMARY KEY,
  street text,
  trader_id integer);

WITH p(num_users, num_traders, mean_trader_addresses) AS (
  VALUES (20000, 10000, 100)
),
new_users AS (
  INSERT INTO users (id, username, trader_id)
  SELECT s.n, gen_random_uuid()::text, (random() * p.num_traders)::integer + 1
    FROM p
    CROSS JOIN LATERAL generate_series(1, p.num_users) s(n))
INSERT INTO addresses(id, street, trader_id)
SELECT s.n, gen_random_uuid()::text, (random() * p.num_traders)::integer % p.num_traders + 1
  FROM p
  CROSS JOIN LATERAL generate_series(1, p.num_traders * p.mean_trader_addresses) s(n);

CREATE INDEX users_by_trader_id ON users(trader_id);

CREATE INDEX addresses_by_trader_id ON addresses(trader_id);

Several queries, each of which returns the OP's desired results, are listed below along with their mean completion times when run on my laptop.

As Erwin Brandstetter pointed out, the following performs poorly when querying for one user because the subquery has to process all addresses even though only those associated with a single trader are of interest. This query performs better than the others when querying for all users.

SELECT users.*, address.id, address.street, address.trader_id
  FROM users
  LEFT JOIN (SELECT addresses.id, addresses.street, addresses.trader_id,
               rank() OVER (PARTITION BY trader_id ORDER BY addresses.id DESC) AS rn
          FROM addresses) address
    ON users.trader_id = address.trader_id
       AND address.rn = 1
  WHERE users.id = 42;

3923.033 ms (3654.100 ms for all users)

This next version of the query performs significantly better than the prior one when querying for a single user; however, its performance when querying for all users is demonstrably worse.

EXPLAIN (ANALYZE, buffers, VERBOSE)
SELECT DISTINCT ON (users.id) users.*, addresses.*
  FROM users
  LEFT JOIN addresses
    ON users.id = addresses.trader_id
  WHERE users.id = 42
  ORDER BY users.id, addresses.id DESC;

0.361 ms (5072.365 ms for all users)

This third version's performance is statistically identical to the previous one for single user queries and is only slightly worse than the first version's when querying all users.

SELECT users.*, address.*
  FROM users
  LEFT JOIN LATERAL (SELECT DISTINCT ON (addresses.trader_id) addresses.id, addresses.street, addresses.trader_id
                        FROM addresses
                        WHERE addresses.trader_id = users.id
                        ORDER BY addresses.trader_id, addresses.id DESC) address
    ON TRUE
  WHERE users.id = 42;

0.356 ms (3963.354 ms for all users)

Additional insite might be gained with a deeper analysis of the output from EXPLAIN (ANALYZE, BUFFERS, VERBOSE).

How to join a table with a row in another table having the highest id?

4 Answers4