0

Consider I have a table T with fields a,b,c,d with two indices: first on a,b,c fields and second on a,b,d fields. Types of a,b,c and d are integer. Both indices are almost the same (on production they both have about 2Gb size, they have the same creation time and the same statistics of usage, table overall have about 60 millions rows). I make two queries:

  1. select * from T where a=... and b=... and c=...;
  2. select * from T where a=... and b=... and d=...;

I expect that for the first query index on a,b,c fields is used and for the second index on a,b,d fields is used. However it's not the case and for both queries first index is used, but in second case with "filter"(I used expect analyze to gain this knowledge). For me such behavior is unacceptable, because in some circumstances number of entries in filter grows very fast and autovacuum/analyze (which actually helps the planner to use the right index) works too slow to prevent the unexpected latencies and downtime.

So my question is: how can I force postgresql not to use wrong index with filtering, but rather use the right index when all fields in query's 'where' and in that index match?

Kivan
  • 1,712
  • 1
  • 14
  • 30
  • Have took a look at: http://stackoverflow.com/questions/309786/how-do-i-force-postgres-to-use-a-particular-index ? – McNets Jan 19 '17 at 10:55
  • @McNets I read this, but didn't found anything which can help me (or maybe I missed something). For example disabling sequential or index scan don't help me. https://www.postgresql.org/docs/current/static/performance-tips.html this page didn't help either, since my use case is not described there and my problem is solely in index selection by planner. I'm thinking about cost tuning https://www.postgresql.org/docs/9.3/static/runtime-config-query.html but I don't know how to make it right in my case – Kivan Jan 19 '17 at 11:05
  • AFAIK there is none `use this index` in postgres. – McNets Jan 19 '17 at 11:14
  • The planner chooses the `a,b,c` index, because of its collected statistics it assumes that it will generate the results faster with that index. It is also possible, that for other `a,b,d` values the second index will be chosen. -- If you really don't have other columns in that table, you can try & drop your existing indexes, and add instead an `a,b,c,d` and an `a,b,d,c` index. That way, the planner will probably choose an index only scan, so it won't touch the table at all (which lowers the chance of filtered index-scans), but this way your indexes will be larger. – pozs Jan 19 '17 at 11:15
  • 3
    Please **[EDIT]** your question and add the `create table` statements for the tables in question (including all indexes), the complete query you are using and most importantly the execution plan generated using **`explain (analyze, verbose)`**. [**Formatted text**](http://stackoverflow.com/help/formatting) please, [no screen shots](http://meta.stackoverflow.com/questions/285551/why-may-i-not-upload-images-of-code-on-so-when-asking-a-question/285557#285557) –  Jan 19 '17 at 11:23
  • Also: which Postgres version are you using? –  Jan 19 '17 at 11:24
  • @a_horse_with_no_name hello, actually it's related problem to http://stackoverflow.com/questions/41212678/postgresql-becomes-unresponsible-when-new-index-value-is-added . If you need more details, just proceed there, and here I want to leave everything as theoretical as possible – Kivan Jan 19 '17 at 11:29
  • 2
    You might want to read: http://use-the-index-luke.com/ –  Jan 19 '17 at 11:36
  • I read almost everything about postgresql indices on that resource and tried almost everything and nothing solved my problem. But I can miss something, that's why I'm asking for help here. So if you see something, which can help me, just let me know. Again I suppose that (as a last solution) tuning optimizer costs could help, however I haven't experience with it yet and do not know which parameter should be tuned and how in my case. – Kivan Jan 19 '17 at 13:06
  • @a_horse_with_no_name I use AWS RDS postgresql 9.3 – Kivan Jan 19 '17 at 13:12
  • Maybe the cardinality (how many different values exist) for `d` is too few to warrant using a different index than one that's already cached in memory (from query 1)? – Bohemian Jan 22 '17 at 21:39
  • @Bohemian actually no, there are millions different values for both c and d if you mean it – Kivan Jan 22 '17 at 22:05
  • Just d alone. Ie `select count(distinct d)` – Bohemian Jan 22 '17 at 22:33
  • @Bohemian 2.3 millions distinct values for d – Kivan Jan 23 '17 at 07:17

1 Answers1

0

Finally I found the solution. It's not perfect and I will not mark it as the best one, however it works and could help someone.

What I have actually done is I have changed the indices, instead of a,b,c and a,b,d indices now I'm having c,a,b and d,a,b.

One problem appeared: I needed an index on 'a', because some queries rely on it. However when I add index solely on 'a', the problem from the first post appears again (the index on 'a' is used when planner thinks that its cost is lower than cost of c,a,b or d,a,b). So I decided to add new field to the table which is a copy (has the same data) of a field 'a', let's call it 'a1', and I added the index on this field. Now when I need to filter contents of 'a' somehow instead I'm filtering on 'a1' field. It's weird, but I couldn't find another solution.

Kivan
  • 1,712
  • 1
  • 14
  • 30