0

I inserted 10k rows into my test DB. I'm doing some perf testing with some sample indexes. Currently, I created a clustered index on the primary key to see if exact-match filter would make a difference and it still takes > 800ms to retrieve this result:

Note: This is just high-level sample test, so all varchar columns are padded with > 100.

CREATE UNIQUE INDEX people_idx ON people (id);
CLUSTER people USING people_idx;

select * from people
where id = 2;
CREATE TABLE "people" (
  id SERIAL PRIMARY KEY,
  first_name varchar(255) default NULL,
  last_name varchar(255) default NULL,
  email varchar(255) default NULL,
  created_date varchar(255),
  phone varchar(100) default NULL,
  customer_id integer NULL
);

Am I thinking about this correctly?

I updated cluster index to add to last_name:

CREATE INDEX people_idx ON people (last_name);

EXPLAIN ANALYZE select * from people
where first_name = 'Sophia';

EXPLAIN ANALYZE select * from people
where last_name ILIKE 'd%';

The analyze output is as follows:

"Seq Scan on people  (cost=0.00..170.00 rows=416 width=76) (actual time=0.042..6.807 rows=414 loops=1)"
"  Filter: ((last_name)::text ~~* 'd%'::text)"
"  Rows Removed by Filter: 9586"
"Planning Time: 0.191 ms"
"Execution Time: 6.857 ms"
Ryan
  • 57
  • 1
  • 9
  • Please **edit your question** using the `edit` button just below the tags and include the layout of the table - a `CREATE TABLE` statement would be best. Also - you should execute the `CREATE INDEX` *before* the `CLUSTER` statement. Thanks. – Bob Jarvis - Слава Україні Jul 12 '20 at 20:31
  • ~Hi, thanks. I am adding.~ added – Ryan Jul 12 '20 at 20:32
  • @BobJarvis-ReinstateMonica I updated. I did indeed run the create before the cluster. – Ryan Jul 12 '20 at 20:37
  • Did you ANALYZE the table after adding the index? – Bob Jarvis - Слава Україні Jul 12 '20 at 20:38
  • @BobJarvis-ReinstateMonica I've added some explain output and updates. For an ILIKE on the indexed column, I should be filtering out less than 9586 rows, right? – Ryan Jul 12 '20 at 20:50
  • 3
    "*I created a clustered index*" - no, you didn't Postgres does not have "clustered indexes" –  Jul 12 '20 at 21:08
  • I don't know what data you have in your database, and I don't understand what you mean by "...should be filtering out less than 9586 rows...". The plan is what the plan is. It seems to me that the execution time shown in the plan of approx. 6.8 ms seems reasonable. If you're getting 800 ms round-trip time it seems that something else is affecting the overall time. [db<>fiddle here](https://dbfiddle.uk/?rdbms=postgres_12&fiddle=b827160e5bbd2e6713e2a3238e45e35d) – Bob Jarvis - Слава Україні Jul 12 '20 at 21:08
  • I guess my question is.. if I created a non-clustered index on last name, shouldn't there be less rows filtered out? With and without the index, I get the same explain plan. – Ryan Jul 12 '20 at 21:12
  • 2
    `ILIKE` will never use a B-Tree index, you would need [trigram](https://www.postgresql.org/docs/current/pgtrgm.html) index for that. For `LIKE` you would need to create the index using the [text_pattern_ops](https://stackoverflow.com/a/13452528) –  Jul 12 '20 at 21:13
  • 1
    Note your query only takes **6** milliseconds, not 800 as you claim –  Jul 12 '20 at 21:14
  • @a_horse_with_no_name do I need trigram if there is no leading wildcard and I do normal casing? – Ryan Jul 12 '20 at 21:16
  • EXPLAIN ANALYZE select * from people where last_name LIKE 'D%'; takes 1.55ms with and without the index. – Ryan Jul 12 '20 at 21:17
  • CREATE INDEX speed_test ON people USING btree (last_name text_pattern_ops) This seemed to enhance performance a bit. – Ryan Jul 12 '20 at 21:19
  • An index is basically another table and using one adds a lookup and overhead. What you have proven is that 10K rows is not enough for the planner to justify the overhead and that a sequential scan is equally fast. If you want this exercise to have any meaning increase the data size. I would start with an order of magnitude, 100K records. – Adrian Klaver Jul 12 '20 at 21:52

0 Answers0