I have a system which will return all users from the database and order the results by lowest distance from a reference zip code.
For example: User will come on the site, enter zip code and it will return him all other users who are nearest to his zip (ascending order)
How am i doing this now and why is it a problem ?
The system contains more than 30 million users and their zipcodes. I am retreiving all the users in a particular state and city (narrows the dataset down to about 10,000).
This is where the problem actually happens. Now, all the result sent by mysql (10,000) rows to PHP are sent to a zipcode calculator library which calculates this distance between the base zip code and user's zip code - 10,000 times. Then orders the result by the zip code nearest.
As you can see, this is very badly optimized code. And the 10,000 records are looped through twice. Not to mention the amount of RAM each httpd process takes just transferring data to and fro mysql.
What I would like to ask the gurus in here that is there anyway to optimize this ?
I have a few ideas of my own, but i'm not sure how efficient they are.
Try to do all the zipcode calculation and ordering in mysql itself and return the paginated number of rows. For this, i will need to move the distance between zipcode calculation logic to a stored procedure. This way I am preventing the processing of 10,000 records in PHP. However, there is still a problem. I would not need to calculate distance for zip codes which have already been calculated (for 2 users having the same zip code).
Secondly, how do i order rows in mysql using a stored procedure ?
What do you guys think ? Is this a good way ? Can i expect a performance boost using this ? Do you have any other suggestions ?
I know this question is huge, and i really appreciate the time you have taken to read till the end. I would really like to hear your thoughts about this.