
So currently I am using geohashes to do location-based queries, as such (following this Stack Overflow post: Finding geohashes of certain length within radius from a point):

import CoreLocation

public extension CLLocationCoordinate2D {

    func boundingBox(radius: CLLocationDistance) -> (max: CLLocationCoordinate2D, min: CLLocationCoordinate2D) {
        // 1 meter ~= 0.0000089982311916 degrees of latitude
        let offset = 0.0000089982311916 * radius
        let latMax = self.latitude + offset
        let latMin = self.latitude - offset

        // 1 degree of longitude = 111km only at the equator
        // (it gradually shrinks to zero at the poles), so the
        // longitude offset must grow with latitude: divide by
        // cos(latitude) rather than multiply
        let lngOffset = offset / cos(self.latitude * .pi / 180.0)
        let lngMax = self.longitude + lngOffset
        let lngMin = self.longitude - lngOffset

        let max = CLLocationCoordinate2D(latitude: latMax, longitude: lngMax)
        let min = CLLocationCoordinate2D(latitude: latMin, longitude: lngMin)

        return (max, min)
    }

    func isWithin(min: CLLocationCoordinate2D, max: CLLocationCoordinate2D) -> Bool {
        return self.latitude > min.latitude &&
            self.latitude < max.latitude &&
            self.longitude > min.longitude &&
            self.longitude < max.longitude
    }
}
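
For context, a quick usage sketch of these two helpers (the coordinates are arbitrary):

let center = CLLocationCoordinate2D(latitude: 37.7749, longitude: -122.4194)
let box = center.boundingBox(radius: 16093.4) // 10 miles
let candidate = CLLocationCoordinate2D(latitude: 37.80, longitude: -122.40)
let inBox = candidate.isWithin(min: box.min, max: box.max) // true when inside the rectangle
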
func getGeohashPrefix() {
    // lat, long, MBR, and geohashes_prefix are properties of the enclosing class
    let loc = CLLocationCoordinate2D(latitude: lat!, longitude: long!)
    MBR = loc.boundingBox(radius: 16093.4) // 16093.4 meters = 10 miles
    // corners = [NorthWest, SouthWest, SouthEast, NorthEast];
    // MBR.0 is the max (north-east) corner, MBR.1 the min (south-west) corner
    let corners = [CLLocationCoordinate2D(latitude: MBR.0.latitude, longitude: MBR.1.longitude),
                   MBR.1,
                   CLLocationCoordinate2D(latitude: MBR.1.latitude, longitude: MBR.0.longitude),
                   MBR.0]
    var geohashes_of_corners: [String] = []
    for corner in corners {
        geohashes_of_corners.append(corner.geohash(length: 12))
    }
    geohashes_prefix = geohashes_of_corners.longestCommonPrefix()
}

// "~" sorts after every base32 geohash character, so the range
// [prefix, prefix + "~"] covers all geohashes starting with the prefix
var query: Query = db.collection("Users")
    .whereField("geohash", isGreaterThanOrEqualTo: geohashes_prefix)
    .whereField("geohash", isLessThanOrEqualTo: geohashes_prefix + "~")
    .order(by: "geohash", descending: false)

query.getDocuments { (querySnapshot, err) in
    if err != nil {
        print("error getting documents")
    } else {
        if querySnapshot!.isEmpty {
            return completion(arr_of_people)
        }
        for document in querySnapshot!.documents {
            let d = document.data()
            let geoPoint = d["loc"] as! GeoPoint
            let location = CLLocationCoordinate2D(latitude: geoPoint.latitude,
                                                  longitude: geoPoint.longitude)
            // second, client-side filter: drop documents whose geohash matched
            // the prefix but whose location falls outside the bounding box
            let isPersonWithin = location.isWithin(min: self.MBR.1, max: self.MBR.0)
            if !isPersonWithin {
                continue
            }
            arr_of_people.append([d["firstName"] as! String, d["lastName"] as! String])
        }
        return completion(arr_of_people)
    }
}

As you can see, I am querying for documents with a specific prefix and then filtering those documents AGAIN on the client. Is that safe? If not, what is the workaround: Cloud Functions, a different algorithm (suggest one if you have it), or something else?


1 Answer

A query on geohashes returns points within a certain range of geohashes, which are (somewhat) rectangular regions.

A geoquery on a central point and a distance returns points that are in a circle.

Since the two shapes are different, your code uses a client-side check to cut off the points that are outside the circle but inside the rectangles. This is a normal step when using geohashes to find all points within a maximum distance of a center.
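
For reference, here is a minimal sketch of that client-side cut (the center and radius are whatever the app queried for); it uses CLLocation's built-in great-circle distance, so it trims to the actual circle rather than a bounding box:

import CoreLocation

// Keep only candidates within `radius` meters of `center`.
func isInsideCircle(_ candidate: CLLocationCoordinate2D,
                    center: CLLocationCoordinate2D,
                    radius: CLLocationDistance) -> Bool {
    let from = CLLocation(latitude: center.latitude, longitude: center.longitude)
    let to = CLLocation(latitude: candidate.latitude, longitude: candidate.longitude)
    // distance(from:) returns meters along the great circle
    return to.distance(from: from) <= radius
}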


Here's an example of this on a map:

[Image: map of the San Francisco area showing green pins inside a 250km circle, and red pins outside the circle but inside the queried geohash ranges]

The green pins are in a 250km circle around San Francisco, which is what I queried for. The red pins are outside of that circle, but within the set of geohash ranges ([["9q0","9qh"],["9nh","9n~"],["9r0","9rh"],["9ph","9p~"]] here) that needed to be queried to be sure we had all points in range.

As said: this so-called oversampling is inherent to using geohashes to perform point-and-distance queries. In my experience you'll end up reading between 2x and 8x too many documents.

It might be possible to reduce the number of false positives by querying for more-but-smaller ranges, but I don't know of any geo-libraries for Firestore that do that.
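
To illustrate the idea, here is a rough sketch of running several smaller range queries against Firestore and merging the results. The ranges below are the San Francisco ones from above; computing suitable ranges for an arbitrary circle is exactly the part a geo-library would have to provide:

import FirebaseFirestore

let db = Firestore.firestore()
// Placeholder geohash ranges covering the search area
let ranges = [("9q0", "9qh"), ("9nh", "9n~"), ("9r0", "9rh"), ("9ph", "9p~")]

for (start, end) in ranges {
    db.collection("Users")
        .order(by: "geohash")
        .whereField("geohash", isGreaterThanOrEqualTo: start)
        .whereField("geohash", isLessThanOrEqualTo: end)
        .getDocuments { snapshot, error in
            guard let documents = snapshot?.documents else { return }
            // Merge the documents from all ranges, then apply the
            // client-side distance check to drop the false positives.
        }
}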

So my mental model for this cost: finding the documents within a certain distance of a point takes 2x to 8x more document reads than a query that returned exactly the matching documents would.


Moving the operation to Cloud Functions on the server makes no difference to the number of documents that need to be read; it just changes where they are read. So you can perform the operation on a server to reduce the bandwidth spent transferring the documents from the database to the client, but it won't reduce the number of documents that have to be read (and billed).

As discussed in the comments: performing the query on the server does allow you to remove client-side access to the data, so you can ensure the application code never sees documents that are not within the requested range. So if you're worried about document access, performing the query in a trusted environment (such as a server you control or Cloud Functions) is a good option.
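
If you take that route from a Swift app, a callable Cloud Function is one convenient shape for it. A sketch of the client side (the function name usersNearby and its payload are invented for illustration; the function itself would run the geohash query and circle filter server-side):

import FirebaseFunctions

let functions = Functions.functions()
// Hypothetical callable "usersNearby": runs the geoquery in a trusted
// environment and returns only the users inside the requested circle.
functions.httpsCallable("usersNearby")
    .call(["lat": 37.7749, "lng": -122.4194, "radiusMeters": 16093.4]) { result, error in
        if let error = error {
            print("usersNearby failed: \(error.localizedDescription)")
            return
        }
        // Only in-range documents ever reach the client here.
        if let people = result?.data as? [[String: Any]] {
            print("received \(people.count) nearby users")
        }
    }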


If you don't want to pay for the extra document reads, consider finding a service that natively supports geoqueries and whose pricing is based on the number of results it returns, instead of the number of documents it has to consider. Such a service will (very likely) still consider too many points, but if the pricing model matches what you want, that might be worth it.

  • Firstly, hi Frank! I only got into geohashing after watching your presentation on it: https://www.youtube.com/watch?v=mx1mMdHBi5Q&t=2280s. Secondly, if you watch this video from 29:50 to 30:35 (https://www.youtube.com/watch?v=lW7DWV2jST0&t=1800s), is this not the same problem I would have if I filter on the client? – Kunal M Aug 15 '20 at 17:06
  • To perform a geoquery from the client, the user needs to be able to read every document in the collection. There is no way to limit access to geo-ranges with security rules, which are all that stand between your application code and access to all the data. If you're worried about people having access to documents outside of range, you should always perform the query in a secure environment (a server or Cloud Functions). But in that case the problem is not oversampling (which your focus on the client-side `isWithin` seemed to suggest), but document access to begin with. – Frank van Puffelen Aug 15 '20 at 17:13
  • I also added a bit more about this to my answer. Sorry for it being a bit long, we've been working on some geo-stuff this week, so it's top of mind for me now. :) – Frank van Puffelen Aug 15 '20 at 17:18
  • I am not concerned with overquerying; I am concerned with the potential vulnerability that comes from filtering on the client. So your suggestion is to do that in Cloud Functions? I am not very experienced with Cloud Functions, so is this task possible there? – Kunal M Aug 15 '20 at 17:19
  • I tried explaining the security considerations in my answer too. There's no way to secure querying on a range (or even geohash ranges) in security rules. If you need to ensure that users can never see documents except the ones in the range, you'll need to query in a trusted environment instead of directly from the client. – Frank van Puffelen Aug 15 '20 at 17:25