5

I am new to PostgreSQL and my question is similar to the one here: link

For e.g I have the following table:

| id |       vector         |
|  1 |  { 1.43, 3.54, 5.55} |
|  2 |  { 4.46, 5.56, 4.55} |
|  3 |  { 7.48, 2.54, 4.55} |
|  4 |  { 2.44, 2.34, 4.55} |

A sample query which goes something like

SELECT * FROM my_table WHERE vector CLOSEST('{1.23, 4.43, 4.63}') 

Should return resulting rows in a sorted fashion, where I determine the "closest" vector using a custom Distance-function e.g. calc_l2norm( double precision[], double precision[] ) which returns the Euclidean Distance.

Community
  • 1
  • 1
Lyman Zerga
  • 1,415
  • 3
  • 19
  • 40

2 Answers2

6

Generally speaking you can solve this type of problems by using a stored function, written in Java or Scala (some might prefer PL/SQL, C or C++).

PostgreSql supports (Java based) stored functions, so let the SQL query fetch the data, and pass it to a stored function. The stored function returns the distance, so you can filter/sort etc. on it.

Based on a table like this

create table point(vector float8[]);
insert into point values('{0.0, 0.0, 0.0}');
insert into point values('{0.5, 0.5, 0.5}');

with a Java function like this:

public class PlJava {
    public final static double distance2(double[] v1, double[] v2) {
        return Math.sqrt(Math.pow(v2[0] - v1[0], 2)
          + Math.pow(v2[1] - v1[1], 2) + Math.pow(v2[2] - v1[2], 2));
    }
}

and the function declaration in SQL:

CREATE FUNCTION pljava.distance2(float8[], float8[])
  RETURNS float8
  AS 'PlJava.distance2'
  IMMUTABLE
  LANGUAGE java;

your query could look like this:

select
    point.*, 
    pljava.distance2(vector, '{1.0, 1.0, 1.0}') as dist
  from
    point 
  order by
    dist;    

which results in

    vector     |       dist  
---------------+-------------------  
 {0.5,0.5,0.5} | 0.866025403784439  
 {0,0,0}       |  1.73205080756888  

Update

Stored functions can be written in C and C++ as well. C++ requires more effort, because the interface to PostgreSql is using the C calling convention. See Using C++ for Extensibility

Beryllium
  • 12,808
  • 10
  • 56
  • 86
  • ha, the Java thing is interesting (can you do the same using C++?). I know you could do it with C as you can choose 'C' as the language when writing the function definition in the pgAdmin tool. But this is useful as I plan to use more complex Distance functions. – Lyman Zerga Jun 19 '13 at 10:12
  • Yes, it's possible in C and C++ as well; I've updated the answer. – Beryllium Jun 19 '13 at 13:33
1

PostgresQL has a nearest neighbor index feature

http://wiki.postgresql.org/wiki/What%27s_new_in_PostgreSQL_9.1#K-Nearest-Neighbor_Indexing

It can be used with PostgreSQL or PostGIS a GIS extensuion to PostgreSQL. See

K-Nearest Neighbor Query in PostGIS

Community
  • 1
  • 1
Tim Child
  • 2,994
  • 1
  • 26
  • 25