23

I'm looking for something that, given a table like:

| id | number |
|  1 |     .7 |
|  2 |   1.25 |
|  3 |   1.01 |
|  4 |    3.0 |

the query SELECT * FROM my_table WHEREnumberCLOSEST(1) would return row 3. I only care about numbers. Right now I've got a procedure that just loops over every row and does a comparison, but I figure the information should be available from a b-tree index, so this might be possible as a builtin, but I can't find any documentation suggesting that it does.

quodlibetor
  • 8,185
  • 4
  • 35
  • 48

4 Answers4

28

I may be a little off on the syntax, but this parameterized query (all the ? take the '1' of the original question) should run fast, basically 2 B-Tree lookups [assuming number is indexed].

SELECT * FROM
(
  (SELECT id, number FROM t WHERE number >= ? ORDER BY number LIMIT 1) AS above
  UNION ALL
  (SELECT id, number FROM t WHERE number < ? ORDER BY number DESC LIMIT 1) as below
) 
ORDER BY abs(?-number) LIMIT 1;

The query plan for this with a table of ~5e5 rows (with an index on number) looks like this:

psql => explain select * from (
        (SELECT id, number FROM t WHERE number >= 1 order by number limit 1) 
        union all
        (select id, number from t where number < 1 order by number desc limit 1)
) as make_postgresql_happy 
order by abs (1 - number) 
limit 1;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.24..0.24 rows=1 width=12)
   ->  Sort  (cost=0.24..0.24 rows=2 width=12)
         Sort Key: (abs((1::double precision - public.t.number)))
         ->  Result  (cost=0.00..0.23 rows=2 width=12)
               ->  Append  (cost=0.00..0.22 rows=2 width=12)
                     ->  Limit  (cost=0.00..0.06 rows=1 width=12)
                           ->  Index Scan using idx_t on t  (cost=0.00..15046.74 rows=255683 width=12)
                                 Index Cond: (number >= 1::double precision)
                     ->  Limit  (cost=0.00..0.14 rows=1 width=12)
                           ->  Index Scan Backward using idx_t on t  (cost=0.00..9053.67 rows=66136 width=12)
                                 Index Cond: (number < 1::double precision)
(11 rows)
mu is too short
  • 426,620
  • 70
  • 833
  • 800
Andrew Lazarus
  • 18,205
  • 3
  • 35
  • 53
  • That's doing two "Seq Scan on t" rather than just one (at least for me). – mu is too short May 23 '11 at 21:47
  • @mu is too short: Did you `ANALYZE` after adding an index to number? PG is supposed to be smart enough to do an index scan for `ORDER BY/LIMIT`, but I guess you never know. – Andrew Lazarus May 23 '11 at 21:49
  • I just did an ANALYZE to be sure and I'm still getting two scans. OTOH, I don't have a million rows in my test table so the optimizer could be using a scan because scanning a small table is cheap. – mu is too short May 23 '11 at 21:52
  • I think it was the small table size that was fooling me. I stuffed 1e5 rows in and it is doing index scans now. Nice work. I can add the explain output to your answer if you want to have a look at it (saves you from building your own pile of test data). – mu is too short May 23 '11 at 22:00
  • @mu, would you? We use MySQL here at the office :( ; Postgresql is on the home laptop. – Andrew Lazarus May 23 '11 at 22:03
  • I added the explain output for you. And, I can eyeball the speed boost of yours over mine with half a million rows so you deserve more than three upvotes. – mu is too short May 23 '11 at 22:12
  • Not sure I follow @Tocchetto's comment. If there are only 2 numbers greater than X, the WHERE clause will limit the return from that subquery to 2. – Andrew Lazarus Nov 14 '18 at 19:45
  • @AndrewLazarus After my comment I fiddled with a little more, and notice that the whole secret for this query is the last line abs (1 - number) , that gives an absolute difference between the numbers, so the ones with the smalles difference are the closest ones. I hadn't noticed it the first time I reads the post, my bad – Tocchetto Nov 15 '18 at 16:32
7

You could try something like this:

select *
from my_table
where abs(1 - number) = (select min(abs(1 - number)) from t)

This isn't that much different than manually looping through the table but at least it lets the database do the looping inside "database space" rather than having to jump back and forth between your function and the database internals. Also, pushing it all into a single query lets the query engine know what you're trying to do and then it can try to do it in a sensible way.

mu is too short
  • 426,620
  • 70
  • 833
  • 800
  • This is a correct query, similar to the one I'm currently using, but I don't think it would be helped by indexes. My problem is that the table I'm querying has hundreds of millions of rows, and takes a long, long time. – quodlibetor May 23 '11 at 21:22
  • 3
    @quodlibetor: Did you try to create an index on `abs(1-number)` to speed up that query? –  May 23 '11 at 21:24
  • E.g. Running an EXPLAIN on this query returns that postgres will have to do a sequential scan on 281,610,907,722 rows... which is "irritating".) – quodlibetor May 23 '11 at 21:29
  • @a_horse_with_no_name: Thanks! That will help my common case, but in general the searched-for number could be any arbitrary thing. – quodlibetor May 23 '11 at 21:30
  • 1
    The index helps if the argument of closest is always 1. If it's arbitrary, the problem is harder. – Andrew Lazarus May 23 '11 at 21:32
  • @quodlibetor: I think you're stuck using an aggregate somewhere and that will lead to a table scan and that scan is slow. A variant is `select id, number from t order by abs(1 - number) limit 1` but that also leads to a scan. – mu is too short May 23 '11 at 21:33
  • @Andrew Lazarus: any tips for where I should look to solve the harder problem? – quodlibetor May 23 '11 at 21:36
  • @Quodlibetor, I have posted an answer for you – Andrew Lazarus May 23 '11 at 21:43
6

The 2nd answer is correct, but I encountered error on "UNION ALL":

DBD::Pg::st execute failed: ERROR: syntax error at or near "UNION"

I fixed it with this code:

SELECT * FROM
  (
    (SELECT * FROM table WHERE num >= ? ORDER BY num LIMIT 1)
        UNION ALL
    (SELECT * FROM table WHERE num < ?  ORDER BY num DESC LIMIT 1)
  ) as foo
ORDER BY abs(?-num) LIMIT 1;

the trick is to remove the AS from the inner tables and use it only on the UNION.

luca76
  • 813
  • 1
  • 10
  • 20
0

This code is helpful if you wish to find the closest value within groups. Here,I split my table tb by column_you_wish_to_group_by based on how close my column val is close to my target value 0.5.

SELECT *
FROM (
  SELECT
    ROW_NUMBER() OVER (PARTITION BY t.column_you_wish_to_group_by ORDER BY abs(t.val - 0.5) ASC) AS r,
    t.*
  FROM
    tb t) x 
WHERE x.r = 1;
Mtap1
  • 167
  • 2
  • 4