
I'd like to join IP routing table information to IP whois information. I'm using Amazon RDS, which means I can't use the PostgreSQL ip4r extension, so I'm instead using int8range types to represent the IP address ranges, with GiST indexes.
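To illustrate the representation, here is one possible way to map a CIDR block to an int8range (a sketch; not necessarily the exact conversion used for my data):

SELECT int8range(
         host(network('192.0.2.0/24'::cidr))::inet   - '0.0.0.0'::inet,
         host(broadcast('192.0.2.0/24'::cidr))::inet - '0.0.0.0'::inet,
         '[]') AS range;   -- '[]' makes both bounds inclusive: 192.0.2.0 .. 192.0.2.255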

My tables look like this:

=> \d routing_details
     Table "public.routing_details"
  Column  |   Type    | Modifiers
----------+-----------+-----------
 asn      | text      |
 netblock | text      |
 range    | int8range |
Indexes:
    "idx_routing_details_netblock" btree (netblock)
    "idx_routing_details_range" gist (range)


=> \d netblock_details
    Table "public.netblock_details"
   Column   |   Type    | Modifiers
------------+-----------+-----------
 range      | int8range |
 name       | text      |
 country    | text      |
 source     | text      |
Indexes:
    "idx_netblock_details_range" gist (range)

The full routing_details table contains just under 600K rows, and netblock_details contains around 8.25M rows. There are overlapping ranges in both tables, but for each range in the routing_details table I want to get the single best (smallest) match from the netblock_details table.

I came up with two different queries that I think return the correct data, one using window functions and one using DISTINCT ON:

EXPLAIN VERBOSE SELECT DISTINCT ON (r.netblock) *
FROM routing_details r JOIN netblock_details n ON r.range <@ n.range
ORDER BY r.netblock, upper(n.range) - lower(n.range);
                                                          QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=118452809778.47..118477166326.22 rows=581300 width=91)
   Output: r.asn, r.netblock, r.range, n.range, n.name, n.country, r.netblock, ((upper(n.range) - lower(n.range)))
   ->  Sort  (cost=118452809778.47..118464988052.34 rows=4871309551 width=91)
         Output: r.asn, r.netblock, r.range, n.range, n.name, n.country, r.netblock, ((upper(n.range) - lower(n.range)))
         Sort Key: r.netblock, ((upper(n.range) - lower(n.range)))
         ->  Nested Loop  (cost=0.00..115920727265.53 rows=4871309551 width=91)
               Output: r.asn, r.netblock, r.range, n.range, n.name, n.country, r.netblock, (upper(n.range) - lower(n.range))
               Join Filter: (r.range <@ n.range)
               ->  Seq Scan on public.routing_details r  (cost=0.00..11458.96 rows=592496 width=43)
                     Output: r.asn, r.netblock, r.range
               ->  Materialize  (cost=0.00..277082.12 rows=8221675 width=48)
                     Output: n.range, n.name, n.country
                     ->  Seq Scan on public.netblock_details n  (cost=0.00..163712.75 rows=8221675 width=48)
                           Output: n.range, n.name, n.country
(14 rows)


EXPLAIN VERBOSE SELECT * FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY r.range ORDER BY UPPER(n.range) - LOWER(n.range)) AS rank
FROM routing_details r JOIN netblock_details n ON r.range <@ n.range
) a WHERE rank = 1 ORDER BY netblock;

                                                                    QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=118620775630.16..118620836521.53 rows=24356548 width=99)
   Output: a.asn, a.netblock, a.range, a.range_1, a.name, a.country, a.rank
   Sort Key: a.netblock
   ->  Subquery Scan on a  (cost=118416274956.83..118611127338.87 rows=24356548 width=99)
         Output: a.asn, a.netblock, a.range, a.range_1, a.name, a.country, a.rank
         Filter: (a.rank = 1)
         ->  WindowAgg  (cost=118416274956.83..118550235969.49 rows=4871309551 width=91)
               Output: r.asn, r.netblock, r.range, n.range, n.name, n.country, row_number() OVER (?), ((upper(n.range) - lower(n.range))), r.range
               ->  Sort  (cost=118416274956.83..118428453230.71 rows=4871309551 width=91)
                     Output: ((upper(n.range) - lower(n.range))), r.range, r.asn, r.netblock, n.range, n.name, n.country
                     Sort Key: r.range, ((upper(n.range) - lower(n.range)))
                     ->  Nested Loop  (cost=0.00..115884192443.90 rows=4871309551 width=91)
                           Output: (upper(n.range) - lower(n.range)), r.range, r.asn, r.netblock, n.range, n.name, n.country
                           Join Filter: (r.range <@ n.range)
                           ->  Seq Scan on public.routing_details r  (cost=0.00..11458.96 rows=592496 width=43)
                                 Output: r.asn, r.netblock, r.range
                           ->  Materialize  (cost=0.00..277082.12 rows=8221675 width=48)
                                 Output: n.range, n.name, n.country
                                 ->  Seq Scan on public.netblock_details n  (cost=0.00..163712.75 rows=8221675 width=48)
                                       Output: n.range, n.name, n.country
(20 rows)

The DISTINCT ON query seems slightly more efficient, so I've proceeded with that one. When I run it against the full dataset I get an out-of-disk-space error after around a 24h wait. I've created a routing_details_small table with a subset of N rows of the full routing_details table to try to understand what's going on.
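For reference, a sketch of one way such a subset table can be built (the index name is taken from the plans below):

CREATE TABLE routing_details_small AS
SELECT * FROM routing_details LIMIT 1000;   -- N = 1000

CREATE INDEX idx_routing_details_small_range
    ON routing_details_small USING gist (range);

ANALYZE routing_details_small;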

With N=1000

=> EXPLAIN ANALYZE SELECT DISTINCT ON (r.netblock) *
-> FROM routing_details_small r JOIN netblock_details n ON r.range <@ n.range
-> ORDER BY r.netblock, upper(n.range) - lower(n.range);
                                                                                 QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=4411888.68..4453012.20 rows=999 width=90) (actual time=124.094..133.720 rows=999 loops=1)
   ->  Sort  (cost=4411888.68..4432450.44 rows=8224705 width=90) (actual time=124.091..128.560 rows=4172 loops=1)
         Sort Key: r.netblock, ((upper(n.range) - lower(n.range)))
         Sort Method: external sort  Disk: 608kB
         ->  Nested Loop  (cost=0.41..1780498.29 rows=8224705 width=90) (actual time=0.080..101.518 rows=4172 loops=1)
               ->  Seq Scan on routing_details_small r  (cost=0.00..20.00 rows=1000 width=42) (actual time=0.007..1.037 rows=1000 loops=1)
               ->  Index Scan using idx_netblock_details_range on netblock_details n  (cost=0.41..1307.55 rows=41124 width=48) (actual time=0.063..0.089 rows=4 loops=1000)
                     Index Cond: (r.range <@ range)
 Total runtime: 134.999 ms
(9 rows)

With N=100000

=> EXPLAIN ANALYZE SELECT DISTINCT ON (r.netblock) *
-> FROM routing_details_small r JOIN netblock_details n ON r.range <@ n.range
-> ORDER BY r.netblock, upper(n.range) - lower(n.range);
                                                                                 QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=654922588.98..659034941.48 rows=200 width=144) (actual time=28252.677..29487.380 rows=98992 loops=1)
   ->  Sort  (cost=654922588.98..656978765.23 rows=822470500 width=144) (actual time=28252.673..28926.703 rows=454856 loops=1)
         Sort Key: r.netblock, ((upper(n.range) - lower(n.range)))
         Sort Method: external merge  Disk: 64488kB
         ->  Nested Loop  (cost=0.41..119890431.75 rows=822470500 width=144) (actual time=0.079..24951.038 rows=454856 loops=1)
               ->  Seq Scan on routing_details_small r  (cost=0.00..1935.00 rows=100000 width=96) (actual time=0.007..110.457 rows=100000 loops=1)
               ->  Index Scan using idx_netblock_details_range on netblock_details n  (cost=0.41..725.96 rows=41124 width=48) (actual time=0.067..0.235 rows=5 loops=100000)
                     Index Cond: (r.range <@ range)
 Total runtime: 29596.667 ms
(9 rows)

With N=250000

=> EXPLAIN ANALYZE SELECT DISTINCT ON (r.netblock) *
-> FROM routing_details_small r JOIN netblock_details n ON r.range <@ n.range
-> ORDER BY r.netblock, upper(n.range) - lower(n.range);
                                                                                      QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=1651822953.55..1662103834.80 rows=200 width=144) (actual time=185835.443..190143.266 rows=247655 loops=1)
   ->  Sort  (cost=1651822953.55..1656963394.18 rows=2056176250 width=144) (actual time=185835.439..188779.279 rows=1103850 loops=1)
         Sort Key: r.netblock, ((upper(n.range) - lower(n.range)))
         Sort Method: external merge  Disk: 155288kB
         ->  Nested Loop  (cost=0.28..300651962.46 rows=2056176250 width=144) (actual time=19.325..177403.913 rows=1103850 loops=1)
               ->  Seq Scan on netblock_details n  (cost=0.00..163743.05 rows=8224705 width=48) (actual time=0.007..8160.346 rows=8224705 loops=1)
               ->  Index Scan using idx_routing_details_small_range on routing_details_small r  (cost=0.28..22.16 rows=1250 width=96) (actual time=0.018..0.018 rows=0 loops=8224705)
                     Index Cond: (range <@ n.range)
 Total runtime: 190413.912 ms
(9 rows)

Against the full table with 600K rows the query fails after around 24h with an error about running out of disk space, presumably caused by the external merge step. So the query works well and quickly with a small routing_details table, but scales very poorly.

Any suggestions for how to improve my query, or perhaps even schema changes I could make, so that this query works efficiently on the full dataset?

Ben Dowling
  • Did you run `ANALYZE routing_details; ANALYZE netblock_details` after populating these tables? Also: what happens to the query plan when you (temporarily) omit the `ORDER BY`? – joop Jul 23 '15 at 10:31
  • "external sort Disk: 608kB", what's your setting for work_mem? You need more memory to avoid disk sorts. SET work_mem TO '....'; – Frank Heikens Jul 23 '15 at 10:55
  • @joop I didn't, but just tried again and same results. When I omit the order by here's the query plan https://gist.github.com/coderholic/92ae27c405cd25e45fa6 - it's quick, but results incorrect – Ben Dowling Jul 23 '15 at 15:42
  • @FrankHeikens was using RDS default of 1MB. Changed to 100MB which makes the first few example queries faster but still not enough memory for the full dataset – Ben Dowling Jul 23 '15 at 15:42
  • My guess is that `, upper(n.range) - lower(n.range);` is treated as a set of functions, making it non-indexable to the outer query. Trying to compute it in the inner query might help/avoid the sort. (avoiding the `SELECT *` in the outer query could help, too) – joop Jul 23 '15 at 15:56
  • @joop moved it to an inner query and got the same query plan https://gist.github.com/coderholic/72d5784a038b0d948c3d - also added a size column to the table so it doesn't need to be computed and got the same result – Ben Dowling Jul 23 '15 at 16:10
  • You could move the sorting on size to the inner loop, since you are only interested in the largest match. `row_number OVER (PARTITION BY r.netblock ORDER BY (upper(n.range) - lower(n.range)) DESC) AS rn` , and filter `WHERE rn=1` in the outer query. – joop Jul 23 '15 at 16:24
  • @joop yeah that's exactly what I'm doing in the other example query I give at the start, but it performs worse than DISTINCT ON. – Ben Dowling Jul 25 '15 at 09:08
  • Could there be primary key constraints for the two "tables", possibly involving the ranges? I suspect that the ranges could be unique (but possibly overlapping) for both tables. – wildplasser Jul 26 '15 at 19:47
  • @BenDowling BTW, from a relational perspective, tables without a primary key are essentially meaningless. – wildplasser Jul 26 '15 at 20:54
  • @wildplasser yes the ranges are really the primary keys, but not sure that you can use gist indexes as primary key. – Ben Dowling Jul 27 '15 at 07:31
  • If the ranges are non-overlapping you could use an EXCLUDE constraint to enforce a *kind* of uniqueness http://www.postgresql.org/docs/9.3/static/rangetypes.html (8.17.10) – joop Jul 27 '15 at 09:16
  • @joop They do overlap – Ben Dowling Jul 27 '15 at 10:26
  • Also, did you consider `GROUP BY`? At times it can be faster than `DISTINCT`. – Parfait Jul 29 '15 at 03:08
  • @GordonLinoff they're recommended for range queries in the official docs: http://www.postgresql.org/docs/9.2/static/rangetypes.html#RANGETYPES-GIST – Ben Dowling Jul 29 '15 at 09:10

4 Answers


I don't have a really good answer for you, because I'm not familiar with GiST indexes, but I'm kind of interested so I took a little look at your explain plans. A couple of things stood out:

1) Your plan is using a nested loop join, even in the 250K example. It is seq scanning the larger table and doing index lookups into the smaller one, which means roughly 8 million index lookups taking over 148 seconds. It's strange to me that this slows down so significantly as the routing_details_small table grows. Like I said, I'm unfamiliar with GiST indexes, but I would experiment with SET enable_nestloop TO false; to see if you can get it to do some kind of sorted merge/hash join.
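For example, a session-level experiment might look like this (a sketch; note that the range containment operator may not support hash or merge joins at all, in which case the planner will still pick a nested loop, just at a higher cost):

SET enable_nestloop TO false;    -- discourage nested loop joins for this session

EXPLAIN ANALYZE
SELECT DISTINCT ON (r.netblock) *
FROM   routing_details_small r
JOIN   netblock_details n ON r.range <@ n.range
ORDER  BY r.netblock, upper(n.range) - lower(n.range);

RESET enable_nestloop;           -- restore the default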

2) The DISTINCT is being executed at the end. It takes a fairly small portion of the time (~11 seconds), but it also means you may be doing some extra work. It looks like the DISTINCT brings the resulting number of records down from over 1 million to 250K, so I would experiment with applying it earlier. I'm not sure whether you are getting duplicates because there are multiple entries in routing_details_small for a netblock, or because netblock_details has multiple matches for a given netblock. If the former, you could join on a subquery with only unique routing details. If the latter, try the thing I'm about to mention:

3) Somewhat combining the previous two observations, you might try doing a partial join (joining on a subquery) from a seq scan on routing_details_small, which should result in only 600K index scans. Something like this (assuming Postgres 9.4):

SELECT *
FROM   routing_details_small r
     , LATERAL (
   SELECT *
   FROM   netblock_details n
   WHERE  r.range <@ n.range
   LIMIT  1
   ) nb;
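One caveat: with a bare LIMIT 1 and no ORDER BY, the subquery returns an arbitrary covering range, not necessarily the narrowest one. A sketch of the same query ordered so the smallest range wins:

SELECT *
FROM   routing_details_small r
     , LATERAL (
   SELECT *
   FROM   netblock_details n
   WHERE  r.range <@ n.range
   ORDER  BY upper(n.range) - lower(n.range)   -- narrowest covering range first
   LIMIT  1
   ) nb;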

Ramfjord
  • There are multiple entries in netblock_details for each entry in routing_details, anything from 3 to around 10. Your proposed query in (3) seems to work once I add a small modification to ensure that I get the narrowest match by adding an order clause. Will run some additional tests to confirm. – Ben Dowling Jul 23 '15 at 16:21
  • Also, I'm using postgres 9.3, yet the query still seems to work. Anything I should be aware of if running this against 9.3? – Ben Dowling Jul 23 '15 at 16:23
  • No - I just thought that it was 9.4 that added LATERAL, but I might be mistaken. Glad to hear it helped! – Ramfjord Jul 23 '15 at 16:27
  • The query took around 14h to complete on the full dataset. Are any further optimizations available? https://gist.github.com/coderholic/9e90311f9323b543aef2 – Ben Dowling Jul 24 '15 at 07:16

I was thinking originally of a lateral join, as in other suggested approaches (for example, the last query in Erwin Brandstetter's answer, where he uses the simple int8 datatype and simple indexes), but I didn't want to write it in the answer because I thought it was not really efficient. When you try to find all netblock ranges that cover the given range, an index doesn't help much.

I'll repeat Erwin Brandstetter's query here:

SELECT *  -- only select columns you need to make it faster
FROM   routing_details r
     , LATERAL (
   SELECT *
   FROM   netblock_details n
   WHERE  n.ip_min <= r.ip_min
   AND    n.ip_max >= r.ip_max
   ORDER  BY n.ip_max - n.ip_min
   LIMIT  1
   ) n;

When you have an index on netblock_details, like this:

CREATE INDEX netblock_details_ip_min_max_idx ON netblock_details 
(ip_min, ip_max DESC NULLS LAST);

you can quickly (in O(logN)) find the starting point of the scan in the netblock_details table - either the maximum n.ip_min that is less than r.ip_min, or the minimum n.ip_max that is more than r.ip_max. But then you have to scan/read the rest of the index/table and for each row do the second part of the check and filter out most rows.

In other words, this index helps to quickly find the starting row that satisfies the first search criterion, n.ip_min <= r.ip_min, but then you'll continue reading all rows that satisfy that criterion and perform the second check, n.ip_max >= r.ip_max, for each of them. On average (if the data is evenly distributed) you'll have to read half of the rows of the netblock_details table. The optimizer may choose to use the index to search n.ip_max >= r.ip_max first and then apply the second filter n.ip_min <= r.ip_min, but you can't use this index to apply both filters together.

End result: for each row from routing_details we'll read through half of the rows of netblock_details. 600K * 4M = 2,400,000,000,000 rows, which is only 2 times better than the Cartesian product. You can see the Cartesian product in the question's EXPLAIN output (the planner's estimate of rows=4,871,309,551 appears to be this product times the default 0.001 selectivity of the containment operator):

592,496 * 8,221,675 = 4,871,309,550,800

No wonder your queries run out of disk space when trying to materialize and sort this.


The overall high-level process to get to the final result:

  • join the two tables (without finding the best match yet). In the worst case it is the Cartesian product; in the best case it is all covering ranges from netblock_details for each range from routing_details. You said that there are multiple entries in netblock_details for each entry in routing_details, anything from 3 to around 10, so the result of this join could have ~6M rows (not too many)

  • order/partition the result of the join by the routing_details ranges and for each such range find the best (smallest) covering range from netblock_details.


My idea is to reverse the query. Instead of finding all covering ranges from the larger netblock_details for each row of the smaller routing_details table, I suggest finding all smaller ranges from the smaller routing_details for each row of the larger netblock_details.

Two-step process

For each row from the larger netblock_details, find all ranges from routing_details that are inside the netblock range.

I would use the following schema in the queries. I've added a primary key ID to both tables.

CREATE TABLE routing_details (
ID        int
,ip_min   int8
,ip_max   int8
,asn      text
,netblock text
);

CREATE TABLE netblock_details (
ID        int
,ip_min   int8
,ip_max   int8
,name     text
,country  text
,source   text
);

SELECT
    netblock_details.ID AS n_ID
    ,netblock_details.ip_max - netblock_details.ip_min AS n_length
    ,r.ID AS r_ID
FROM
    netblock_details
    INNER JOIN LATERAL
    (
        SELECT routing_details.ID
        FROM routing_details
        WHERE
            routing_details.ip_min >= netblock_details.ip_min
            AND routing_details.ip_min <= netblock_details.ip_max
            -- note how routing_details.ip_min is limited from both sides
            -- this would make it possible to scan only (hopefully) small
            -- portion of the table instead of full or half table
            AND routing_details.ip_max <= netblock_details.ip_max
            -- this clause ensures that the whole routing range
            -- is inside the netblock range
    ) AS r ON true

We need an index on routing_details (ip_min, ip_max). The main thing here is the index on ip_min. Having the second column in the index helps by eliminating the need to look up the value of ip_max; it doesn't help in the tree search.

Note that the lateral subquery doesn't have a LIMIT. It is not the final result yet. This query should return ~6M rows. Save this result in a temporary table. Add an index to the temporary table on (r_ID, n_length, n_ID). n_ID is again just to remove extra lookups. We need the index to avoid sorting the data for each r_ID.
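A sketch of this first step end to end, materialized into a temporary table (the index names are mine):

CREATE INDEX routing_details_ip_min_max_idx
    ON routing_details (ip_min, ip_max);

CREATE TEMPORARY TABLE temp AS
SELECT
    netblock_details.ID AS n_ID
    ,netblock_details.ip_max - netblock_details.ip_min AS n_length
    ,r.ID AS r_ID
FROM
    netblock_details
    INNER JOIN LATERAL
    (
        SELECT routing_details.ID
        FROM routing_details
        WHERE
            routing_details.ip_min >= netblock_details.ip_min
            AND routing_details.ip_min <= netblock_details.ip_max
            AND routing_details.ip_max <= netblock_details.ip_max
    ) AS r ON true;

CREATE INDEX temp_r_id_n_length_n_id_idx ON temp (r_ID, n_length, n_ID);
ANALYZE temp;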

Final step

For each row from routing_details, find the n_ID with the smallest n_length. Now we can use the lateral join in the "proper" order. Here temp is the table produced in the previous step, with its index.

SELECT
    routing_details.*
    ,t.n_ID
    ,netblock_details.*
FROM
    routing_details
    INNER JOIN LATERAL
    (
        SELECT temp.n_ID
        FROM temp
        WHERE temp.r_ID = routing_details.ID
        ORDER BY temp.n_length
        LIMIT 1
    ) AS t ON true
    INNER JOIN netblock_details ON netblock_details.ID = t.n_ID

Here the subquery should be an index seek, not a scan. The optimizer should be smart enough to do just the seek and return the first result found because of LIMIT 1, so you'll have 600K index seeks against the 6M-row temp table.


Original answer (I'll keep it just for the diagram of ranges):

Can you "cheat"?

If I understood your query correctly, for each routing_details.range you want to find the smallest netblock_details.range that covers (is larger than) routing_details.range.

Given the following example, where r is the routing range and n1, ..., n8 are netblock ranges, the correct answer is n5.

   |---|
   n1

     |------------------|
     n2

                           |---------------|
                           n3

                                          |-----|
                                          n4

                  |------------------|
                  n5

                     |--------------------------------------|
                     n6

        |---------------------------|
        n7

                      |-----|
                      n8

                      |------------|
                      r
                     start       end

n.start <= r.start AND n.end >= r.end
order by n.length
limit 1 

Your query that took 14 hours shows that the index scan took 6ms, but sorting by range length took 80ms.

With this kind of search there is no simple 1D ordering of the data. You are using n.start together with n.end and together with n.length. But maybe your data is not that generic, or it is OK to return a somewhat different result.

For example, if it were OK to return n6 as the result, it could work much faster. n6 is the covering range that has the largest start:

n.start <= r.start AND n.end >= r.end
order by n.start desc
limit 1 

Or, you could go for n7, which has the smallest end:

n.start <= r.start AND n.end >= r.end
order by n.end
limit 1 

This kind of search would use a simple index on n.start (or n.end) without extra sorting.
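Using the simple int8 schema from above, the "largest start" variant could be written as (a sketch):

SELECT *
FROM   routing_details r
     , LATERAL (
   SELECT *
   FROM   netblock_details n
   WHERE  n.ip_min <= r.ip_min
   AND    n.ip_max >= r.ip_max
   ORDER  BY n.ip_min DESC   -- covering range with the largest start: n6
   LIMIT  1
   ) n;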


A second, completely different approach. How big/long are the ranges? If they are relatively short (a few numbers), then you could try to store them as an explicit list of integers instead of a range. For example, the range [5-8] would be stored as 4 rows: (5, 6, 7, 8). With this storage model it may be easier to find intersections of ranges.
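A sketch of that storage model (netblock_points is a hypothetical table; this is only practical when the ranges are short):

CREATE TABLE netblock_points AS
SELECT n.ID
     , generate_series(n.ip_min, n.ip_max) AS ip   -- one row per address in the range
FROM   netblock_details n;

CREATE INDEX ON netblock_points (ip);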

Vladimir Baranov
  • No it has to be the smallest range. See https://en.wikipedia.org/wiki/Longest_prefix_match for why. The ranges can also be very large. – Ben Dowling Jul 28 '15 at 10:45
  • Is the data distribution skewed significantly in any way? For example, `netblock` ranges are evenly distributed between, say, 1 and 100,000,000, but 99% of `routing` ranges are within 100 and 1000. Or, distribution of lengths is very skewed. – Vladimir Baranov Jul 28 '15 at 11:53
  • This is an interesting read, worth an upvote in any case. The proof of the pudding is in the eating. Can you make it work? And how does it perform? I see a couple of obstacles. You could take my test setup from the fiddle for a quick start. Either way, thinking further about this, it came back to me: we have solved this problem before: http://dba.stackexchange.com/a/22500/3684 – Erwin Brandstetter Jul 30 '15 at 14:12
  • @ErwinBrandstetter, sorry, I can't fully understand the question and your answer on dba at the moment, but the problem seems to be not quite the same as here. Anyway, the performance of my suggested solution depends on the data distribution. If you have millions of very large `netblock` ranges each of which cover (almost) all `routing` ranges, then the first step would read through the (almost) whole `routing` table millions of times. If most `netblock` ranges are narrow, then the inner loop will be tight. So, it would be interesting to see if the OP could try the first step on real data. – Vladimir Baranov Jul 30 '15 at 23:32
  • Whoa, the 2 queries provided in the updated answer give the correct result in under 90 seconds, down from a previous best answer of 14 hours! Amazing! – Ben Dowling Aug 02 '15 at 03:35
  • @BenDowling, I'm glad it helped. I tried to write down the reasoning for my approach and hope it is clear enough. Bottom line: **correct** indexes are powerful stuff, but an index itself is not magic and it helps to know how it works, what it can do, and what it can't. – Vladimir Baranov Aug 02 '15 at 23:21

Use a LATERAL join (find the smallest match per row in routing_details):

Current DB design

The only relevant index for these queries is:

"idx_netblock_details_range" gist (range)

The other indexes are irrelevant here.

Query

SELECT *  -- only select columns you need to make it faster
FROM   routing_details r
     , LATERAL (
   SELECT *
   FROM   netblock_details n
   WHERE  n.range @> r.range
   ORDER  BY upper(n.range) - lower(n.range)
   LIMIT  1
   ) n

SQL Fiddle with more realistic test data.

Like in your original queries, rows from routing_details without any match in netblock_details are dropped from the result.

Performance depends on data distribution. This should be superior with many matches. DISTINCT ON might win with only very few matches per row in routing_details, but it needs a lot of work_mem for the big sort: make it around 200 MB for the big query, and use SET LOCAL in the same transaction.
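A sketch of that, using the DISTINCT ON query from the question as the big one:

BEGIN;
SET LOCAL work_mem = '200MB';   -- only affects the current transaction

SELECT DISTINCT ON (r.netblock) *
FROM   routing_details r
JOIN   netblock_details n ON r.range <@ n.range
ORDER  BY r.netblock, upper(n.range) - lower(n.range);

COMMIT;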

The LATERAL query itself will not need as much sort memory. Unlike with DISTINCT ON, you should not see Postgres spilling to disk for the sort with a halfway decent setting for work_mem. So no line like this in the EXPLAIN ANALYZE output:

Sort Method: external merge Disk: 155288kB

Simpler DB design

On a second look, I tested a simplified design with plain int8 columns for lower and upper bound instead of range types and a simple btree index:

CREATE TABLE routing_details (    -- SMALL table
   ip_min   int8
 , ip_max   int8
 , asn      text
 , netblock text
 );

CREATE  TABLE netblock_details (  -- BIG table
   ip_min   int8
 , ip_max   int8
 , name     text
 , country  text
 , source   text
 );

CREATE INDEX netblock_details_ip_min_max_idx ON netblock_details
(ip_min, ip_max DESC NULLS LAST);

Sorting the second index column DESC NULLS LAST is essential!

Query

SELECT *  -- only select columns you need to make it faster
FROM   routing_details r
     , LATERAL (
   SELECT *
   FROM   netblock_details n
   WHERE  n.ip_min <= r.ip_min
   AND    n.ip_max >= r.ip_max
   ORDER  BY n.ip_max - n.ip_min
   LIMIT  1
   ) n;

Same basic technique. In my tests this was ~3 times faster than the first approach, but still not fast enough for millions of rows.

SQL Fiddle.


Detailed explanations of this technique can be found in related answers; the examples there use b-tree indexes, but the query principle is similar for the GiST index. The same goes for the DISTINCT ON variant.

Advanced solution

The above solutions scale linearly with the number of rows in routing_details, but deteriorate with the number of matches in netblock_details. It finally came back to me: we have solved this before on dba.SE with a more complex approach yielding largely superior performance: http://dba.stackexchange.com/a/22500/3684

frequency in the linked answer plays the role of n.ip_max - n.ip_min / upper(range) - lower(range) here.

Erwin Brandstetter
  • I would be surprised if adding the range length to the index would eliminate the sort step. It would be interesting to see the output of `EXPLAIN ANALYZE` with this multicolumn index. I think it would be the same as when index had just `range`: https://gist.github.com/coderholic/9e90311f9323b543aef2 Would be glad to be proved wrong. – Vladimir Baranov Jul 29 '15 at 23:33
  • @VladimirBaranov: I tested with both indexes on joop's setup, the multicolumn index was faster. However, I cannot reproduce the result with a more realistic test setup. Maybe I had used pg 9.5 by mistake (which introduces index-only scans for GiST indexes). Consequently I am dropping the suggestion for a multicolumn GiST index. (I never said it would "eliminate" the sort step, btw.) The `LATERAL` query is fastest in all tests either way, by a long shot. Adding a simpler alternative without ranges and GiST ... and fiddles where you can see the EXPLAIN output. – Erwin Brandstetter Jul 30 '15 at 02:56
  • I would appreciate if you could have a look at my revised answer, where I suggested a two-step approach and write what you think about it. The main point - I don't think a single index can help much, but having two steps with two different indexes for different purposes should do the trick. – Vladimir Baranov Jul 30 '15 at 13:10

I don't know how this performs on real data. The candidate selection is squeezed into the inner loop, which seems good to me. On testing it gave two index scans (plus one for the anti-join), avoiding the final sort/unique, and it does seem to give equivalent results.

-- EXPLAIN ANALYZE
SELECT *
FROM routing_details r
JOIN netblock_details n ON r.range <@ n.range
        -- We want the smallest overlapping range
        -- Use "Not exists" to suppress overlapping ranges 
        -- that are larger than n
        -- (this should cause an antijoin)
WHERE NOT EXISTS(
        SELECT * FROM netblock_details nx
        WHERE  r.range <@ nx.range      -- should enclose r
        AND n.range <> nx.range         -- but differ from n
        AND (nx.range <@ n.range        -- and be contained in n (i.e. narrower)
                OR upper(nx.range) - lower(nx.range) < upper(n.range) - lower(n.range)
                OR (upper(nx.range) - lower(nx.range) = upper(n.range) - lower(n.range) AND lower(nx.range) > lower(n.range) )
                )
        )
ORDER BY r.netblock
        -- not needed any more
        -- , upper(n.range) - lower(n.range)
        ;

UPDATE: (FWIW) as a bonus, my test dataset:

CREATE TABLE routing_details
 ( asn          text
 , netblock     text
 , range        int8range
 );
-- Indexes:
CREATE INDEX idx_routing_details_netblock ON routing_details (netblock);
CREATE INDEX idx_routing_details_range ON routing_details USING gist(range) ;


CREATE TABLE netblock_details
 ( range        int8range
 , name         text
 , country      text
 , source       text
 );
-- Indexes:
CREATE INDEX idx_netblock_details_range ON netblock_details USING gist(range);
        -- the smaller table
INSERT INTO routing_details(range,netblock)
SELECT int8range(gs, gs+13), 'block_' || gs::text
FROM generate_series(0,1000000, 11) gs
        ;

        -- the larger table
INSERT INTO netblock_details(range,name)
SELECT int8range(gs, gs+17), 'name_' || gs::text
FROM generate_series(0,1000000, 17) gs
        ;

INSERT INTO netblock_details(range,name)
SELECT int8range(gs, gs+19), 'name_' || gs::text
FROM generate_series(0,1000000, 19) gs
        ;

INSERT INTO netblock_details(range,name)
SELECT int8range(gs, gs+23), 'name_' || gs::text
FROM generate_series(0,1000000, 23) gs
        ;

VACUUM ANALYZE routing_details;
VACUUM ANALYZE netblock_details;
joop
  • Doesn't appear to give the correct results. In my routing_details_small table I have 250k rows, but the output of this query has over 260k rows. There should only be one row in the output for every row in the routing_details table. – Ben Dowling Jul 27 '15 at 20:20
  • Maybe you have duplicate ranges in your routing_details (small) table ? – joop Jul 28 '15 at 08:43
  • Yeah, turns out I do. – Ben Dowling Jul 28 '15 at 13:55
  • Upvote for providing the test case alone! Interesting query, too. `DISTINCT ON` or `LATERAL` were faster in my local tests, though. – Erwin Brandstetter Jul 29 '15 at 03:43