2

I'm using the Kaminari gem to paginate a query on a large table (~1.5MM rows). While fetching the actual results pages is quite quick (~20ms), kaminari's added SELECT COUNT(*) WHERE .... is excruciatingly slow, and adds several extra seconds to the execution time.

Is there a way to approximate the number of results instead?

Avishai
  • 4,512
  • 4
  • 41
  • 67
  • 1
    Do you want to know both how to speed it up and how to combine that with kaminari? – Shadwell Feb 17 '14 at 21:28
  • I don't care about the _exact_ number of results, I just don't want to have to do `COUNT(*)` on a potentially very large result set. – Avishai Feb 17 '14 at 21:40
  • 1
    I don't see how the purported duplicate is actually a duplicate. As @Shadwell writes, the question is how to speed this up **with Kaminari**. The linked "duplicate" is one piece of the answer, but the second half remains: how can Kaminari's behavior be changed to use it? – Dogweather Jun 17 '15 at 06:08

1 Answers1

4

Quick estimate for whole table

For a very quick estimate for the whole table:

Your example hints at addresses. Say we have a table called adr in the schema public:

SELECT reltuples FROM pg_class WHERE oid = 'public.adr'::regclass;

More details in this related answer:
How do I speed up counting rows in a PostgreSQL table?

Count with condition(s)

For a count with a condition, Postgres can use indexes to make it faster. This has been improved with "covering indexes" in Postgres 9.2, but certain requirements have to be met to profit from that. More in the Postgres Wiki about Index-only scans.

For queries with conditions on city and state, this multicolumn index would help a lot, if the conditions are selective (only a small percentage of the rows meet the condition):

CREATE INDEX adr_foo_idx ON adr (city, state);

If you have a small set of typical conditions, you might even use partial indexes:

CREATE INDEX adr_ny_ny_idx ON adr(adr_id)
WHERE  city = 'New York'
AND    state = 'NY';

... one for every set of (state, city)

Or a combination of both:

CREATE INDEX adr_ny_idx ON adr (city)
WHERE  state = 'NY';

... one per state

Normalize

Of course, everything to make your big table (and indexes) smaller helps. Lookup tables for cities and cities would go a long way to cut down on redundant storage. The key word here is normalization.

Instead of:

CREATE TABLE adr (
  adr_id serial PRIMARY KEY
 ,state text
 ,city text
 ...
 );

SELECT count(*)
FROM   adr
WHERE  city = 'New York'
AND    state = 'NY';

Normalize your database design and use proper indexes:

CREATE TABLE state (
  state_id serial PRIMARY KEY
 ,state text UNIQUE
 );

CREATE TABLE city (
  city_id serial PRIMARY KEY
 ,state_id int REFERENCES state
 ,city text
 ,UNIQUE (state_id, city)
 );

CREATE TABLE adr (
  adr_id serial PRIMARY KEY
  city_id int REFERENCES city
  ...
 );

CREATE INDEX adr_city_idx ON adr (city_id);

SELECT count(*)
FROM   state s
JOIN   city  c USING (state_id)
JOIN   adr   a USING (city_id)
WHERE  s.state = 'NY'
AND    c.city  = 'New York'

Table and index become smaller. Integer handling is faster than text. Everything becomes faster.

Materialized view

On top of that, if performance is crucial, and since you do not need exact counts, you could use a materialized view with counts for relevant conditions. Refresh the view at events or times of your choosing to keep numbers up to date. Follow the link to the manual for details. Requires Postgres 9.3, but you can easily implement it manually in any version.

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • That seems to return just how many rows are in the table. What about if I wanted to limit it by the query conditions, like `WHERE city='New York' AND state='NY'`? – Avishai Feb 17 '14 at 21:36
  • @Avishai: Are your conditions stable or arbitrary? – Erwin Brandstetter Feb 17 '14 at 21:53
  • Arbitrary-ish. Most searches will filter by [City, State] at least, plus arbitrary values for a couple of other columns. – Avishai Feb 17 '14 at 22:00
  • @Avishai: I added some more pointers. It really depends on your exact environment, requirements - and on your version of Postgres. Neither of them are in your question. – Erwin Brandstetter Feb 17 '14 at 22:24