I'm having serious performance issues when using a LEFT OUTER JOIN
and trying to use a column in the right table in Postgres. I have a table of users and a table with online_users that lists user ids that are online in my website. Both tables have indexes in the user ids. I need to run a select on the users table and list first users that are online, followed by users who aren't online. So my select is:
SELECT *
FROM users
LEFT JOIN online_users ON (users.id = online_users.usr_id)
ORDER BY online_users.online_date
I have indexes on users.id
, online_users.usr_id
and online_users.online_date
, but for some reason, when I run ANALYZE
on the query, the index for online_users.online_date
isn't used by Postgres and the full scan ruins the query's performance.
Is there any way to optimize this query without changing the tables' structure (these tables are replicated, so changing the structure will require a major refactoring of our project).
Postgre version is 9.3
Below is the Explain Analyze:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=2589440.94..2595456.84 rows=2406361 width=506) (actual time=18635.686..25775.334 rows=2239030 loops=1)
Sort Key: usuarios_online.datamessenger
Sort Method: external merge Disk: 512424kB
-> Hash Left Join (cost=219.73..130113.66 rows=2406361 width=506) (actual time=0.723..12388.266 rows=2239030 loops=1)
Hash Cond: (usuarios.id = usuarios_online.id_usr)
-> Seq Scan on usuarios (cost=0.00..108832.61 rows=2406361 width=494) (actual time=0.009..7328.191 rows=2238984 loops=1)
-> Hash (cost=212.66..212.66 rows=566 width=12) (actual time=0.704..0.704 rows=572 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 27kB
-> Seq Scan on usuarios_online (cost=0.00..212.66 rows=566 width=12) (actual time=0.079..0.555 rows=572 loops=1)
Total runtime: 28519.611 ms
(10 rows)