7

I am building REST APIs that return data (lets say events ) in particular area. The REST URL is a simple GET

/api/v1/events?lat=<lat>&lng=<lng>&radius=<radius>.

with parameters lat, lng and radius (10 miles by default), the latitude and longitude are what the device or browser APIs return. Now needless to say that the lat and lng change continuously as the user moves and also two users can be same vicinity with different lat / lng. What is the best way to cache such kind of requests on the server so that I don't have to dip into business logic everytime. The URL is not going to unique since lat/lng change.

Thanks

ngrashia
  • 9,869
  • 5
  • 43
  • 58
rOrlig
  • 2,489
  • 4
  • 35
  • 48
  • Do you mean caching in your application code, or are you using a cache server that will allow caching the requests with querystring parameters? – Pedro Werneck Jun 03 '14 at 20:14
  • So let me clarify if two people make a request to get information of restaurants at particular lat and lng and radius on yelp. Lets the absolute distance between them is <0.1 mile or some threshold. How do you code the system so as to return results from cache and not dip into the backend database etc... – rOrlig Jun 03 '14 at 20:44
  • 1
    OK, you mean caching the results of your geolocation algorithm and database, so that when you get another location nearby, it uses the cache, not the http requests. Let me think about it. – Pedro Werneck Jun 03 '14 at 20:54
  • exactly ... i think it is general enough problem.. didn't seem to find any discussion or thought about it anywhere. – rOrlig Jun 03 '14 at 21:05
  • Take a look on r-trees. Not the fastest structures on earth, but may solve your problem – Plínio Pantaleão Jun 10 '14 at 18:34

3 Answers3

3

I'm assuming you have some sort of "grid", and when a user requests a specific coordinate, you return the grid tile(s) around the location. So you have an infinite URL space (coordinates) that is mapped to a finite number of tiles. One solution is to redirect every request to the "canonical", cache friendly URL for that tile, e.g.

GET /api/v1/events?lat=123&lng=456
=>
302 Found
Location: /api/v1/events?tile=abc

Or, if you want to retain the lat/long info in the URL, you could use the location of the center of the tile.

trendels
  • 4,747
  • 3
  • 26
  • 16
2

I think the best approach is for you to store the results in a cache with the center coordinates as a key, and later query the points within the circle for the new request.

I'm not aware of any cache engines that would allow you to perform spatial queries, so I think you'll have to use a database that allows easy querying and indexing of spatial data. You may use that database for caching your results, or at least store a key to that result in a cache engine somewhere else, and later you can query them with spatial coordinates, asking for all points with a threshold distance to your new request.

There's PostGis for PostgreSQL, which should be quite straightforward since it has full support for latitude/longitude distance computations. Once you have it setup with proper indexes, it should be as easy as:

SELECT * FROM your_cache_table 
WHERE ST_Distance_Sphere(the_geom, ST_MakePoint(new_lon, new_lat)) <= 160.934

MySQL has some support for the OpenGis extensions, however it doesn't have support for latitude/longitude distance computations. Maybe you'll need to do some calculations by yourself, maybe the simple cartesian distance works for you. Check the documentation here, and this answer should also help.

I also believe even MySQL 5.6 still has support for spatial indexes only in MyISAM tables, but that shouldn't be an issue since you're using them only for cache.

Managing the cache may be a little more complicated than usual. If you need expiration, you should probably store only keys in the database and set an expire parameter on the cache server. When you hit a database point for which there's no longer a valid key, you clean it from the database. You'll probably need a way to invalidate cache when the primary data changes, removing from both the database and the cache server.

Community
  • 1
  • 1
Pedro Werneck
  • 40,902
  • 7
  • 64
  • 85
-1

I have been data modelling a hobby application that also needs to deal with geolocation data and have had to try to solve similar problems. The solution will of course depend on the constraints that you have and the actual use cases that are crucial to your application's purpose. I will assume that you have design flexibility to change all aspects of your application. i.e. the technology stack.

note:

... so that I don't have to dip into business logic everytime. The URL is not going to unique since lat/lng change.

The above is an ambiguous statement, since not dipping into business logic can mean many things. I will assume it means you don't want to make any database queries to retrieve new data that might be similar to the data you already have in the cache. Also assuming that you only want to cache on the server, the following approaches come to mind:

  1. Cache the data between the database and the application.

    Database ---> Cache --> App ---> User

In this approach, your application processes all the rest api calls and then decide whether the results in the cache can be used or another database access is required.

  1. Cache the data between the application and the user.

    Database --> App ---> Cache ---> User

This approach is a little tricky considering that the url is always changing. So you might need a 'smart caching mechanism' that will process the incoming url and then decide whether the cached data is relevant. The smart caching can be done in a number of ways depending on how your application is implemented. Something like mongodb could be used as a json cache and then each incoming request can be preprocessed to see if the data can just be returned from this cache or redirect to the main application. So the structure might look like this:

Database --> App ---> (Some logic(e.g NodeJS app) + Mongodb) ---> User

conclusion:

Without knowing the architecture of your solution, the critical use cases and the full design constraints, one cannot really suggest a complete solution to this problem. You might have to rethink certain features you are trying to provide and make tough compromises to get things working. Hopefully, the suggestions provided by different people here will be helpful.

Alappin
  • 674
  • 8
  • 9