SQL: Getting the “first” item of a joined table

Question

Let’s say I want to show a list of companies, and also a first/random employee of that company.

I could do something like:

SELECT
  company.id,
  company.name,
  MIN(person.id) AS employee_person_id,
  MIN(person.name) AS employee_person_name
FROM company
LEFT OUTER JOIN person ON (person.company_id = company.id)
GROUP BY company.id;

But I think with my code above, MIN(person.id) and MIN(person.name) could give info about two different people, right?

Is there a better way of retrieving just a “first” (or random) employee and showing that person’s ID and name?

Does this answer your question? [Fetch the rows which have the Max value for a column for each distinct value of another column](https://stackoverflow.com/questions/121387/fetch-the-rows-which-have-the-max-value-for-a-column-for-each-distinct-value-of) — philipxy, Mar 15 '23 at 11:04

Mureinik · Answer 1 · 2023-03-15T11:04:05.320

2

I'd use the row_number window function to assign a numbering within each company, and then use that to query the first person:

SELECT c_id, c_name, p_id, p_name
FROM   (SELECT    c.id AS c_id,
                  c.name AS c_name,
                  p.id AS p_id,
                  p.name AS p_name,
                  ROW_NUMBER() OVER (PARTITION BY c.id ORDER BY p.id ASC) AS rn
        FROM      company c
        LEFT JOIN person p ON p.company_id = c.id) t
WHERE  rn = 1

edited Mar 15 '23 at 11:04

answered Mar 15 '23 at 09:36

Mureinik

297,002
52
306
350

1

The `DBMS_RANDOM.VALUE` [addition](https://stackoverflow.com/revisions/75742859/2) from [@UltraCommit](https://stackoverflow.com/users/297267/ultracommit) is from [Oracle](https://docs.oracle.com/database/121/TTPLP/d_random.htm#TTPLP040) - won't work in PostgreSQL. In Postgres, use [`random()`](https://www.postgresql.org/docs/current/functions-math.html#FUNCTIONS-MATH-RANDOM-TABLE). Either only makes sense if you require semi-reliable randomness. If you just don't care which one you get, it's faster, simpler and more convenient to *not use `order by`*, at all. – Zegarek Mar 15 '23 at 10:28
Looks like *not* ordering can save you some (sometimes half) the execution time, but it's still slowed down by collecting all matches before picking the one to return. I've updated [the performance tests](https://stackoverflow.com/a/75743855/5298879) – Zegarek Mar 15 '23 at 12:38

Zegarek · Accepted Answer · 2023-03-15T13:11:28.693

This is correct:

MIN(person.id) and MIN(person.name) could give info about two different people, right?

You can see that happening in the demo linked below.

GMB's distinct on is typically the most recommended, obvious choice, but it requires ordering, same as Mureinik's. Meanwhile, you can let each company just get any single person, without having to order them first: demo

SELECT company.id, company.name, person.id, person.name
FROM company LEFT JOIN LATERAL
(SELECT id,name FROM person WHERE company_id=company.id LIMIT 1) person
             ON true;

Even simpler if you don't have companies with no persons, or if you ignore such companies:

SELECT company.id, company.name, person.id,  person.name
FROM company, 
LATERAL (SELECT id,name FROM person WHERE company_id=company.id LIMIT 1) person;

It's "retrieving just a “first” (or random) employee" the convenient way: it takes whatever it happens to find first, without having to find and order all possible matches before picking one. Each company just fetches any one of their people, which seemed to be the idea.

Thanks to not doing the extra work, it's faster (16'000x on 300'000 row sample) and it scales in proportion to only the company, pretty much disregarding the person table's size and growth.

Locally I bumped up the `person` test sample to 10M rows and it didn't care, still taking below 0.1milliseconds, while the others jumped to 16 and 25 seconds. — Zegarek, Mar 15 '23 at 12:10
Yes this does match what the OP asked for (get *any* person from each company), and sure performs better than my deterministic query, +1. — GMB, Mar 15 '23 at 15:05

score 1 · Answer 3 · answered Mar 15 '23 at 09:36

This is how to get the correct name :

You will have to join your result with the person table by person.id = s.employee_person_id

select s.company_id, s.company_name, s.employee_person_id, p.name as employee_person_name
from person p
inner join (
    SELECT
      company.id as company_id,
      company.name as company_name,
      MIN(person.id) AS employee_person_id
    FROM company
    LEFT OUTER JOIN person ON (person.company_id = company.id)
    GROUP BY company.id
) as s on s.employee_person_id = p.id

score 1 · Answer 4 · answered Mar 15 '23 at 09:49

1

In Postgres, I would recommend distinct on:

SELECT distinct on (c.id)
  c.id,
  c.name,
  p.id AS employee_person_id,
  p.name AS employee_person_name
FROM company c
LEFT OUTER JOIN person p ON p.company_id = c.id
ORDER BY c.id, p.id

For each company, this brings the person with the smallest id; you control which person is picked with the ORDER BY clause (if you wanted the person with the greatest id, you would use ORDER BY c.id, p.id DESC).

answered Mar 15 '23 at 09:49

GMB

216,147
25
84
135

IMHO this approach is by far the best. – Stefanov.sm Mar 15 '23 at 10:40
@Stefanov.sm That would normally be my first choice, too. Still, I think that *retrieving just a “first” (or random) employee* means you can [skip ordering and gain some performance.](https://stackoverflow.com/a/75743855/5298879) – Zegarek Mar 15 '23 at 12:26
@Zegarek I cannot but agree. – Stefanov.sm Mar 15 '23 at 13:50

SQL: Getting the “first” item of a joined table

4 Answers4