0

I have an sql table that has some data on retail locations with lat, long coordinates. My application queries the table to pull the data using python. Here's some sample data:

table1

id  name   lat       long
1   post   37.788151 -122.407570
2   sutter 37.789551 -122.408302
3   oak    37.815730 -122.288810

I have a point coordinate 37.785980 and -122.410851, and I'd like to find rows from the table that are nearby, say within 0.5 mile radius from the coordinate.

Performance matters. I cannot query all rows due to poor performance, long time sending data to client application and storage / memory issues. Therefore, I need a SQL based solution.

Using SQL in python, how do I implement an efficient solution that does proximity search and returns rows that meet the distance condition? Lat and Long columns are stored as TEXT. Here's how I pull data from sql in python.

# mysql connection
import pymysql
from sqlalchemy import create_engine
user = 'user1'
pwd = 'pwd'
host =  'xxxx.1.rds.amazonaws.com'
port = 3306
database = 'db1'
engine = create_engine("mysql+pymysql://{}:{}@{}/{}".format(user,pwd,host,database))


# Readdata
con = engine.connect()

query = '''
        SELECT x
        FROM table1
        GROUP BY key
        '''

df = pd.read_sql(query, con)

con.close()

Happy to share more info if required for context.

kms
  • 1,810
  • 1
  • 41
  • 92
  • Have you tried using a [spatial index](https://dev.mysql.com/doc/refman/8.0/en/creating-spatial-indexes.html) on a geometry type? – Nick ODell Apr 11 '22 at 23:09
  • Alternatively, as a simpler alternative to a full spatial index, it sounds like you're currently loading every row from the database and doing all filtering on the client side. You could try finding a bounding box to contain your 0.5 radius search area, and excluding all longitude/latitude pairs which are outside that bounding box. Then you can do a more exact filter on the client side. – Nick ODell Apr 11 '22 at 23:14
  • @NickODell yep, that make sense. Could you share how to implement that? I can modify the column type in SQL table if necessary. – kms Apr 11 '22 at 23:21
  • I'd prefer to not load every row and do filtering in SQL query / server-side. – kms Apr 11 '22 at 23:28
  • The SQL query to do this can be found [here](https://stackoverflow.com/a/5756248/530160). PS: What version of MySQL are you using? You'll need version 8 or better for this to work well. – Nick ODell Apr 11 '22 at 23:47
  • @NickODell `v8.0.27`. I am not sure if I need a Spatial Index? Although, I have created a `coords` column of `POINT` type using `Lat`,`Long` fields. – kms Apr 11 '22 at 23:52
  • 1
    If you want to avoid a [full table scan](https://en.wikipedia.org/wiki/Full_table_scan), you need a spatial index. If you don't care, then you don't need one. – Nick ODell Apr 12 '22 at 00:07
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/243803/discussion-between-kms-and-nick-odell). – kms Apr 12 '22 at 00:24

1 Answers1

0

Brute force requires a full table scan. That does not scale well.

Starting with some indexes and using a "bounding box" gives an order of magnitude improvement.

So does using a SPATIAL index.

More discussion on those, plus even faster ways: Find Nearest

PS. Don't store numeric values as text strings. (The link above talks about the 'right' datatype to use.

Rick James
  • 135,179
  • 13
  • 127
  • 222