2

Assume that I have the following model:

class Person:
     id       = models.BigAutoField(primary_key=True)
     name     = models.CharField(max_length=150)
     location = models.PointField()

How would I go by obtaining the k nearest neighbors (KNN) by location using geodjango?
Would I have to write custom SQL for that?
I am using PostgreSQL with PostGIS.

S-Man
  • 22,521
  • 7
  • 40
  • 63
AlanSTACK
  • 5,525
  • 3
  • 40
  • 99

1 Answers1

3

You can use a raw() sql query to utilize postgis order_by operators:

  1. <-> which gets the nearest neighbor using the centers of the bounding boxes to calculate the inter-object distances.

  2. <#> which gets the nearest neighbor using the bounding boxes themselves to calculate the inter-object distances.

In your case the one you want seems to be the <-> operator, thus the raw query:

knn = Person.objects.raw(
    'SELECT * FROM myapp_person 
    ORDER BY location <-> ST_SetSRID(ST_MakePoint(%s, %s),4326)',
    [location.x, location.y]
)[:k]

EDIT due to own derpiness: You can omit the [:k] to add LIMIT 1 on the raw SQL query. (Don't use both as I did!)


In the process of answering your other question: How efficient is it to order by distance (entire table) in geodjango ,another solution maybe possible:

By enabling spatial indexing and narrowing down your query through logical constrains (as explained in my answer of the above -linked question) you can achieve a pretty fast KNN query as follows:

current_location = me.location
people = People.objects.filter(
    location__dwithin=(current_location, D(km=50))
).annotate(
    distance=Distance('location', current_location)
).order_by('distance')[:k]
Harry Moreno
  • 10,231
  • 7
  • 64
  • 116
John Moutafis
  • 22,254
  • 11
  • 68
  • 112
  • For this scenario (obtaining knn), would it be still helpful to use a geography column? Or would it be pointless - since I am assuming the calculations involving `<->` would be different – AlanSTACK Aug 02 '17 at 08:20
  • 1
    You can use a geography column or a geometry one. What matters on speeding up your query is to use `spatial_idex`. For further reading on the subject, have a look here: https://boundlessgeo.com/2011/09/indexed-nearest-neighbour-search-in-postgis/ Good luck @Alan :) – John Moutafis Aug 02 '17 at 08:24
  • Hello, looking back at your answer, I am confused by the purpose of `LIMIT 1` in `knn = Person.objects.raw('SELECT * FROM myapp_person...`, why do we need that? – AlanSTACK Aug 31 '17 at 05:36
  • @Alan That will return the first nearest neighbor. You can change it at will! I will edit this in my answer as well. – John Moutafis Aug 31 '17 at 08:06
  • I thought `[:k]` took care of that? I am confused since `LIMIT K` and `[:k]` seem to be serving the same purpose here – AlanSTACK Aug 31 '17 at 15:13
  • @Alan that is correct... I totally derped on that! I will correct it :) – John Moutafis Sep 01 '17 at 08:36
  • I would advise using an f string to avoid hardcoding the table name: knn = Person.objects.raw( f'SELECT * FROM {People._meta.db_table} ORDER BY location <-> ST_SetSRID(ST_MakePoint(%s, %s),4326)', [location.x, location.y] )[:k] – CitizenFish Aug 18 '23 at 07:54