1

I have access to a list of lat/long coordinates, and I want to know (roughly) the US State these coordinates are located in. I can do with loss of precision, but I can't rely on external libraries or API. I can also add a database of locations in my code.

What is a reasonable way to do this? I thought about 3 possibilities:

  1. Represent each state by a single point at its center, then do a nearest-neighbour search
  2. Represent each state by points located at cities in the state, then do a nearest-neighbour search (with much more points)
  3. Represent each state by a simple bounding box, then use some algorithm to query which bounding box my point belongs to

What do you think is best? I would tend to think about solution 3, but I can't find a list of coarse "bounding boxes" for US states

lezebulon
  • 7,607
  • 11
  • 42
  • 73

2 Answers2

1
  1. Will not work, consider

Proof 1

  1. Has a high likelihood to not work for at least some states. Consider states with towns/cities more clustered to the middle, against states with towns/cities clustered to the edge.

enter image description here

  1. Will not work (these were supposed to be 90 degree angles, perfect squares, but drawing with a mouse is hard :) )

enter image description here

If you want to do this even vaguely accurately you will need some shape data which defines the boundaries between states. You will then need an algorithm which can determine whether a point is within an irregular polygon

See List of the United States (US) state boundaries / borders as latitude/longitude pairs for geofence?

Michael
  • 41,989
  • 11
  • 82
  • 128
  • Thanks ! Indeed the first 2 methods are very inaccurate but they are also quite fast (at least method 1) I will try to focus on method 3 – lezebulon Jun 30 '20 at 15:59
1

I made a little search and find out a proper solution for what you are looking for with a dataset of bounding box.

Answer on StackOverflow: LINK

Dataset: LINK

Algorithm to use(implement): LINK

So yes, the proper way to implement it's using the solution 3 with the given dataset.

Hope it helps :)

Carlo Corradini
  • 2,927
  • 2
  • 18
  • 24