
I first have to say that I really am a rookie in caching, so please do elaborate on any explanation and bear with me if my question is stupid.

I have a server with pretty limited resources, so I'm really interested in caching db-queries as effectively as I can. My issue is this:

I have a MySQL DB with a table for geolocations; it has lat and lng columns. I only indexed lat, since a query will always include both lat and lng, and to my understanding only one index can be used effectively per query (?).

The queries vary a lot in their coordinates, like:

select lat, lng
from geolocations
where lat BETWEEN 123123123 AND 312412312 AND lng BETWEEN 34123124 AND 235124231

where the long numbers (the boundaries of the BETWEEN) are constantly changing. So is there a smart way to cache this, where a cache hit doesn't require an exact query match, but the bounds of previous BETWEEN queries can be compared against a new one to save some DB resources?

I hope you get my question - if not please ask.

Thank you so much

Update 24/01/2011

Now that I've gotten some response I want to know what the most efficient way of querying would be.

  1. Would the BETWEEN query with int values execute faster, or
  2. would the radius calculation with POINT values execute faster?

If 1., what would the optimal index look like?
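
Roughly, the two variants I'm comparing look like this (the table names and the actual coordinates are just placeholders):

-- Variant 1: integer lat/lng columns, plain BETWEEN on both axes
SELECT lat, lng
FROM   geolocations
WHERE  lat BETWEEN 55000000 AND 56000000
  AND  lng BETWEEN 12000000 AND 13000000;

-- Variant 2: a POINT column with a SPATIAL index, filtered by bounding box
-- (an actual radius check would need an extra distance calculation on top)
SELECT AsText(coords)
FROM   geolocations_spatial
WHERE  MBRContains(
         GeomFromText('POLYGON((55 12, 56 12, 56 13, 55 13, 55 12))'),
         coords);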

  • @Carl Manaster, @Quassnoi what would you consider the best datatype for my lat/lon columns? Meaning, what would be faster: the query where I search with BETWEEN and an index on lat, or the POINT calculation? – Jakob Jan 20 '11 at 23:13
  • @Carl Manaster, @Quassnoi I would really appreciate your input as to what would be most efficient, int indexing or the point calculation? – Jakob Jan 23 '11 at 01:06

2 Answers


If your table is MyISAM, you can use the Point datatype (see this answer for more details).
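
A minimal sketch of what that could look like (table and column names here are only for illustration):

-- MyISAM table with a POINT column and a spatial (R-tree) index
CREATE TABLE mytable (
    id     INT NOT NULL PRIMARY KEY,
    coords POINT NOT NULL,
    SPATIAL INDEX sx_mytable_coords (coords)
) ENGINE=MyISAM;

-- Bounding-box search that can use the spatial index
SELECT id, AsText(coords)
FROM   mytable
WHERE  MBRContains(
         GeomFromText('POLYGON((30 50, 31 50, 31 51, 30 51, 30 50))'),
         coords);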

If you are not willing or are not able to use spatial indexes, you should create two separate indexes:

CREATE INDEX ix_mytable_lat_lon ON mytable (lat, lon)
CREATE INDEX ix_mytable_lon_lat ON mytable (lon, lat)

In this case, MySQL can use an index intersection (an index_merge plan) over these indexes, which is sometimes faster than mere filtering with a single index.

Even if it does not, the optimizer can pick whichever of the two indexes is more selective for a given query.
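
To see which plan the optimizer actually picks, you can run EXPLAIN (a sketch; whether the intersection is used depends on your data distribution and MySQL version):

EXPLAIN
SELECT lat, lon
FROM   mytable
WHERE  lat BETWEEN 55.0 AND 56.0
  AND  lon BETWEEN 12.0 AND 13.0;

-- If the intersection is used, the type column shows index_merge and Extra shows
-- something like Using intersect(ix_mytable_lat_lon, ix_mytable_lon_lat);
-- otherwise you will see a plain range scan on one of the two indexes.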

As for caching, all pages read from the indexes are cached and stay in memory until they are overwritten by hotter data (if not all of the database fits into the cache).

This saves MySQL from having to read the data from disk.

MySQL is also able to cache whole result sets in memory (the query cache); however, this requires the query to be repeated verbatim, with all parameters exactly the same.
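
For example, with the query cache of the MySQL versions of that era, two overlapping queries that differ only in their bounds are stored as completely separate entries, so it will not help with ever-changing BETWEEN ranges. Whether it is enabled, and whether it gets any hits, can be checked with:

SHOW VARIABLES LIKE 'query_cache%';
SHOW STATUS LIKE 'Qcache%';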

Quassnoi
  • that is an excellent answer! I actually didn't know that spatial indexes existed, so thank you for informing me - right now I get the boundaries of the map and call the "between" to get any markers within the visual range of the map. Is this less efficient than the spatial approach? It seems that the radius would require some kind of circle calculation, and I'm really keen on not putting too much load on the db. What would be your verdict on the method that would put the least strain on the db? – Jakob Jan 18 '11 at 18:53
  • @Jakob: `BETWEEN` can only use an index to filter on `lat` or `lon`, not both. If it's `OK` for your query to return the object within the square but out of its incircle, then you may use just a pair of betweens. Circle calculations look bulky but they are not actually that hard for the modern processors. Reading and transmitting extra data may imply more overhead than a pair of floating point instructions. – Quassnoi Jan 18 '11 at 19:47
  • The map is square, and the points are filtered by rating and not by how close they are to the mapCenter, so the square is functionally OK; my consideration is mostly server load and speed, respectively. Can I get you to elaborate on the indexes that you've exemplified? I can't get my head around why I would need two; I'm probably not getting something about the combined order. – Jakob Jan 18 '11 at 20:01

I think to do significantly better you'll need to characterize your data better. If you've got data that's uniformly distributed across longitude and latitude, with no correlation, and if your queries are similarly distributed and independent - you're stuck. But if your data or your queries cluster in interesting ways, you may find that you can introduce new columns that make at least some queries quicker. If most queries happen within some hard range, maybe you can set that data aside - add a flag, link it to some other table, even put the frequently-requested data into its own table. Can you tell us any more about the data?
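
Purely as a hypothetical sketch of the "set frequently-requested data aside" idea (all column, index and table names here are invented):

-- Flag rows that fall inside a region that gets queried a lot
ALTER TABLE geolocations ADD COLUMN in_hot_region TINYINT(1) NOT NULL DEFAULT 0;

UPDATE geolocations
SET    in_hot_region = 1
WHERE  lat BETWEEN 55 AND 56
  AND  lng BETWEEN 12 AND 13;

CREATE INDEX ix_geo_hot_lat_lng ON geolocations (in_hot_region, lat, lng);

-- Queries against the hot region then scan a much smaller index slice
SELECT lat, lng
FROM   geolocations
WHERE  in_hot_region = 1
  AND  lat BETWEEN 55.2 AND 55.8
  AND  lng BETWEEN 12.1 AND 12.9;

Whether something like this pays off depends entirely on whether such a hot region actually exists in your query pattern.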

Carl Manaster
  • thank you for participating, I really love to hear these thoughts. Unfortunately the markers are pretty much scattered across the whole world, and the queries are very dissimilar. I should probably look more into indexing, since I take it from your response that caching is really only a concern after the queries have been optimized and the data structure is as well formed as it can be? – Jakob Jan 18 '11 at 18:56
  • @Jakob, if the queries are very dissimilar, I think it's unlikely that caching will be able to help you significantly. The likelihood of the data of interest being in the cache is small in that situation, so even if the performance advantage of cached data was very high, the likelihood of reaping that benefit would be very low. Sorry! – Carl Manaster Jan 18 '11 at 20:00
  • thank you for your input - sometimes the right answer is "ask another question", and this seems like one of those times. Thank you for pointing this out to me. – Jakob Jan 18 '11 at 20:31