0

I'm building a language learning platform, here are some tables:

create table users
(  
    id                  bigserial primary key,
    ...
)

create table users_languages
(
    id                   bigserial primary key,
    user_id              bigint    not null
        constraint users_languages_user_id_foreign references users,
    level                varchar(255)         not null,
    lang_code            varchar(10)          not null,
    ...
)

So the users_languages keeps all the langs a user knows, level could be NATIVE or LEARN, lang_code is the ISO code.

I'm building a recommended search feature for a current user, depending on the languages he knows, so the result must be ordered by the following rules:

  1. other user NATIVE lang = current user LEARN lang AND other user LEARN lang = current user NATIVE lang
  2. other user NATIVE lang = current user LEARN lang
  3. other user has at least one of current user langs

Here is what I figured out so far:

           nativeCode := current user native lang_code
           langCodes := current user learn lang_codes


            WITH user_lang_priority AS NOT MATERIALIZED (
                SELECT l.user_id, MIN(CASE 
                    WHEN l.level = 'NATIVE' AND l.lang_code IN(:langCodes) THEN
                      CASE
                        WHEN EXISTS (
                           SELECT ll.id FROM users_languages ll
                           WHERE ll.level != 'NATIVE'
                           AND ll.lang_code = :nativeCode
                           AND ll.user_id = l.user_id
                        ) THEN 1 ELSE 2
                      END
                    WHEN l.lang_code IN(:langCodes) THEN 3
                END) AS priority
                FROM users_languages l
                GROUP BY l.user_id
            )
            SELECT u.*
            FROM users u
            INNER JOIN user_lang_priority lp ON u.id = lp.user_id
            GROUP BY u.id
            ORDER BY lp.priority ASC, u.id DESC

The query seems to be returning the correct results, but it takes up to 2 seconds on a table with about 30k rows. Is there a way to speed it up?

UPD: the query plan

Sort  (cost=777435.38..777508.24 rows=29145 width=762)
"  Sort Key: (min(CASE WHEN (((l.level)::text = 'NATIVE'::text) AND ((l.lang_code)::text = ANY ('{fin,fre,ger}'::text[]))) THEN CASE WHEN (hashed SubPlan 2) THEN 1 ELSE 2 END WHEN ((l.lang_code)::text = ANY ('{rus,fre,ger}'::text[])) THEN 3 ELSE NULL::integer END)), u.id DESC"
  ->  Hash Join  (cost=1677.14..775274.14 rows=29145 width=762)
        Hash Cond: (l.user_id = u.id)
        ->  GroupAggregate  (cost=0.29..773229.32 rows=29145 width=12)
              Group Key: l.user_id
              ->  Index Scan using users_languages_user_id_index on users_languages l  (cost=0.29..3005.29 rows=84970 width=19)
              SubPlan 2
                ->  Bitmap Heap Scan on users_languages ll  (cost=299.15..1487.36 rows=14990 width=8)
                      Recheck Cond: ((lang_code)::text = 'eng'::text)
                      Filter: ((level)::text <> 'NATIVE'::text)
                      ->  Bitmap Index Scan on users_languages_lang_code_index  (cost=0.00..295.40 rows=22814 width=0)
                            Index Cond: ((lang_code)::text = 'eng'::text)
        ->  Hash  (cost=1253.60..1253.60 rows=33860 width=758)
              ->  Seq Scan on users u  (cost=0.00..1253.60 rows=33860 width=758)
Majesty
  • 2,097
  • 5
  • 24
  • 55
  • Did you try indexing on table `users_languages `? – Heinz Siahaan Sep 01 '23 at 07:07
  • 2
    don't group the final query, appears redundant - but the first tool of any select query optimisation is the explain plan - have you got that? add it to the question – Paul Maxwell Sep 01 '23 at 07:21
  • @HeinzSiahaan There are no indexes as there are not so many records and it and the selectivity would be low, I don't think postgres will ever decide to make use of this index – Majesty Sep 01 '23 at 07:21

0 Answers0