
In our app we use these 3 tables:

  1. categories
  2. cities (category_id)
  3. city_data (distance, city_id)

$q = "SELECT a.id as aid,a.distance as adistance, 
           b.id as bid,b.distance as bdistance 
        FROM city_data as a 
        JOIN city_data as b on a.id != b.id 
        JOIN cities AS a_cities ON a.city_id = a_cities.id
        JOIN cities AS b_cities ON b.city_id = b_cities.id      
        WHERE (a_cities.category_id='".$_GET["c"]."' AND b_cities.category_id='".$_GET["c"]."')
        AND abs(a.distance - b.distance) < 100 ORDER BY RAND() LIMIT 1";

The cities and city_data tables have the same number of rows, almost 5,000. The query above takes about 45 seconds, which is terrible. Even worse, the tables should eventually hold another 5,000 rows, which gives 10,000 rows total...

Is there any way to reduce the execution time of the query above? 45 seconds is not acceptable...

Do I have any option to solve this issue?

EDIT: Thank you for your advice. I removed the ORDER BY RAND() part and the time is indeed lower, about 22 seconds, but this is still too high for regular use.

user984621
  • It's `ORDER BY RAND()` which causes that. – John Woo Sep 06 '12 at 23:13
  • Should `b_citiess.category_id` be `b_cities.category_id`? – Kermit Sep 06 '12 at 23:15
  • see this [*link: Alternatives to ORDER BY RAND()*](http://stackoverflow.com/questions/1823306/mysql-alternatives-to-order-by-rand) for its alternative – John Woo Sep 06 '12 at 23:15
  • Also see [this article](http://www.titov.net/2005/09/21/do-not-use-order-by-rand-or-how-to-get-random-rows-from-table/) – Basic Sep 06 '12 at 23:16
  • Do you have indexes on all of the joined fields? Also, on a side note, you are vulnerable to SQL injection with your $_GET – Kris Sep 06 '12 at 23:18
  • @Kris - I know about SQL injection and will check it later. The indexes on all of the joined fields are not done, I have to take a look at how to do it. Thank you for the tip – user984621 Sep 06 '12 at 23:24
  • Indexes will greatly improve performance. I have had plenty of queries (reports) that took 5 minutes because I forgot to index a column. Once indexed properly it ran in 5 seconds. – Kris Sep 06 '12 at 23:25
  • "I know about SQL injection and will check it later." This is the sort of thing that is famously said before your site is cracked wide open with an [automatic exploit tool](http://sqlmap.org/) and you have to explain to your boss, client or customers why your user database was posted to the internet. Please be **extremely** careful when using SQL and **always** use [SQL placeholders](http://bobby-tables.com/php) to escape your data. Not having enough time is not an acceptable excuse. – tadman Sep 07 '12 at 00:35
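
The two fixes suggested in these comments can be sketched together: bind the category as a placeholder instead of interpolating `$_GET["c"]`, and replace `ORDER BY RAND() LIMIT 1` with a count plus a random `OFFSET`. A minimal, self-contained sketch using Python's sqlite3 module (SQLite standing in for MySQL; the toy rows are illustrative, the table and column names follow the question):

```python
import random
import sqlite3

# Self-contained sketch: SQLite stands in for MySQL, with a few toy rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cities (id INTEGER PRIMARY KEY, category_id INTEGER);
    CREATE TABLE city_data (id INTEGER PRIMARY KEY, city_id INTEGER, distance REAL);
    INSERT INTO cities VALUES (1, 7), (2, 7), (3, 7);
    INSERT INTO city_data VALUES (1, 1, 10), (2, 2, 50), (3, 3, 500);
""")

category = 7  # would come from $_GET["c"], but is now passed as a bound parameter

# Shared body of the query; ? placeholders instead of string interpolation.
body = """
    FROM city_data AS a
    JOIN city_data AS b ON a.id != b.id
    JOIN cities AS a_cities ON a.city_id = a_cities.id
    JOIN cities AS b_cities ON b.city_id = b_cities.id
    WHERE a_cities.category_id = ? AND b_cities.category_id = ?
      AND abs(a.distance - b.distance) < 100
"""

# Instead of ORDER BY RAND() LIMIT 1 (which must evaluate a random value
# for every row and sort them all), count the matches once and jump
# straight to one random row.
(n_rows,) = conn.execute("SELECT COUNT(*)" + body, (category, category)).fetchone()
offset = random.randrange(n_rows)
row = conn.execute(
    "SELECT a.id, a.distance, b.id, b.distance" + body + "LIMIT 1 OFFSET ?",
    (category, category, offset),
).fetchone()
print(row)
```

In the PHP version the same two changes would mean a prepared statement (mysqli or PDO) and two queries: one `SELECT COUNT(*)`, one `SELECT ... LIMIT 1 OFFSET n`.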

5 Answers

2

Make sure you have created indexes on the joined columns (city_data.city_id, cities.id, cities.category_id).
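
In MySQL this is a couple of `CREATE INDEX` statements (the `idx_...` names below are just illustrative; `cities.id` and `city_data.id` are already covered by their primary keys). A quick way to check the planner picks them up, sketched here with Python's sqlite3 since the same DDL runs in both engines:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cities (id INTEGER PRIMARY KEY, category_id INTEGER);
    CREATE TABLE city_data (id INTEGER PRIMARY KEY, city_id INTEGER, distance REAL);

    -- Index the join/filter columns; the same statements work in MySQL.
    CREATE INDEX idx_city_data_city ON city_data (city_id);
    CREATE INDEX idx_cities_category ON cities (category_id);
""")

# Ask the planner how it would execute a join on the indexed columns.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT cd.distance
    FROM city_data AS cd
    JOIN cities AS c ON cd.city_id = c.id
    WHERE c.category_id = 7
""").fetchall()
for step in plan:
    print(step)  # the SEARCH steps should mention the new indexes
```

In MySQL the equivalent check is `EXPLAIN SELECT ...`; without the indexes each of the three joins in the original query is a full scan of ~5,000 rows.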

Kris
  • Well, to be honest, I am a bit confused about indexing. Could you please give me an example of how to create indexes for these tables? I have never worked with indexes before – user984621 Sep 06 '12 at 23:38
  • `CREATE INDEX cdata_city ON city_data( city_id )` is going to help a lot if it doesn't already exist. – Fred Sobotka Sep 07 '12 at 00:08
  • @user984621 are you using phpmyadmin or just the default mysql client or? You can run the query that fred suggested – Kris Sep 07 '12 at 01:25
  • I use phpmyadmin. Would it be better for this purpose to use `CREATE INDEX cdata_city ON city_data(city_id)` or `CREATE INDEX cdata_city ON city_data(cities.city_id, city_data.id, cities.category_id)`? Also, do I just run this query and that's all? – user984621 Sep 07 '12 at 08:05
  • Ok, in phpmyadmin go to table structure. There will be an indexes option underneath the structure. Add one to each column. For type select "index", don't enter anything in for the name or size. – Kris Sep 07 '12 at 12:59
2

I don't know how your tables are designed, but you should keep the city1-to-city2 distance information in one (maybe separate) table, get rid of the last 2 joins, and fetch the category information for the results in another query.

As in the previously given sample ( Calculating distance between 400 cities and optimizing MySQL for it? ):

SELECT c1.name, c2.name, cd.dist 
FROM cities_dist cd
  INNER JOIN cities c1 ON cd.city1 = c1.id
  INNER JOIN cities c2 ON cd.city2 = c2.id
WHERE cd.city1 = your_id
   OR cd.city2 = your_id
ORDER BY cd.dist ASC

And be sure you have the right index and field type definitions.
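
A runnable sketch of this layout, using Python's sqlite3 for a self-contained example (the `cities_dist` table, its columns, and the sample rows are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT);
    -- One precomputed row per city pair instead of a self-join at query time.
    CREATE TABLE cities_dist (city1 INTEGER, city2 INTEGER, dist REAL);
    INSERT INTO cities VALUES (1, 'Prague'), (2, 'Brno'), (3, 'Ostrava');
    INSERT INTO cities_dist VALUES (1, 2, 185), (1, 3, 280), (2, 3, 140);
    CREATE INDEX idx_dist_c1 ON cities_dist (city1);
    CREATE INDEX idx_dist_c2 ON cities_dist (city2);
""")

your_id = 2  # e.g. Brno
rows = conn.execute("""
    SELECT c1.name, c2.name, cd.dist
    FROM cities_dist AS cd
      INNER JOIN cities AS c1 ON cd.city1 = c1.id
      INNER JOIN cities AS c2 ON cd.city2 = c2.id
    WHERE cd.city1 = ? OR cd.city2 = ?
    ORDER BY cd.dist ASC
""", (your_id, your_id)).fetchall()
print(rows)  # nearest pairs first
```

The point of the design: the expensive pairwise work happens once, at insert time, and the lookup touches only the rows for one city via the two indexes.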

Erdinç Çorbacı
1

Why are you making this join?

FROM city_data as a JOIN city_data as b on a.id != b.id 

You are joining the city_data table with itself without matching a unique relation between the rows. I suspect this is what makes your query so slow.
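
To put numbers on it: since `abs(a.distance - b.distance)` is symmetric, `a.id != b.id` produces every pair twice, once in each order; switching the join condition to `a.id < b.id` halves the output without losing any pair. A quick back-of-the-envelope check:

```python
# a.id != b.id keeps every ordered pair; a.id < b.id keeps each
# unordered pair exactly once, which is all the WHERE clause needs
# since abs(a.distance - b.distance) is symmetric.
n = 5000  # rows in city_data, per the question
ordered = n * (n - 1)          # pairs produced by a.id != b.id
unordered = n * (n - 1) // 2   # pairs produced by a.id < b.id
print(ordered, unordered)      # 24995000 12497500
```

Either way the join is nearly a Cartesian product, which is why indexing and pre-filtering matter so much here.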

Nikola
Hernan Velasquez
  • It's almost a cartesian product so he'll get a dataset of size `(TableRows)*(TableRows-1)` (Before the `WHERE` is applied) - I assume it's because he's limiting to cities within a certain radius (The `WHERE abs(a.distance - b.distance) < 100`) – Basic Sep 06 '12 at 23:19
  • Yep, exactly. It's working well, but the execution time is horrible – user984621 Sep 06 '12 at 23:21
0

Break the query up. Turn each subquery into a view and run them separately; that should let you improve performance. Running one long, really complex query is usually slower than running several simplified queries. Also limit the results of each subquery where possible. Then you can perhaps use unions to combine the results. That would be my first approach.
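
One way to read this advice, sketched with Python's sqlite3 (the view and its name are illustrative): do the two `cities` joins once in a view, so the expensive self-join runs over a small, pre-shaped row set instead of the raw four-table query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cities (id INTEGER PRIMARY KEY, category_id INTEGER);
    CREATE TABLE city_data (id INTEGER PRIMARY KEY, city_id INTEGER, distance REAL);
    INSERT INTO cities VALUES (1, 7), (2, 7), (3, 9);
    INSERT INTO city_data VALUES (1, 1, 10), (2, 2, 50), (3, 3, 60);

    -- One simple piece: attach each distance row to its category.
    CREATE VIEW cat_distances AS
    SELECT cd.id, cd.distance, c.category_id
    FROM city_data AS cd
    JOIN cities AS c ON cd.city_id = c.id;
""")

# The self-join now runs over the simplified view; a.id < b.id keeps
# each pair once instead of twice.
rows = conn.execute("""
    SELECT a.id, b.id
    FROM cat_distances AS a
    JOIN cat_distances AS b ON a.id < b.id
    WHERE a.category_id = ? AND b.category_id = ?
      AND abs(a.distance - b.distance) < 100
""", (7, 7)).fetchall()
print(rows)
```

City 3 belongs to another category, so only the (1, 2) pair survives; the self-join never even sees rows the category filter would discard if the planner pushes the predicate into the view.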

SonOfSeuss
0

It might make sense to pull all this data back into PHP and do the looping/distance calculations there; this might be faster in code than it is in the DB.
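
A sketch of that idea (in Python standing in for the PHP loop; the data is illustrative): select `(id, distance)` once, sort by distance, then use a sliding window so each row is only compared against neighbours within 100, rather than scanning all n*n pairs the way the SQL self-join does.

```python
import random

# Toy stand-in for the rows SELECTed once from the DB:
# (city_data.id, distance), already filtered to one category.
rows = [(1, 10.0), (2, 50.0), (3, 500.0), (4, 560.0)]

# Sort by distance: every partner within 100 of a row then sits in a
# contiguous window just before it, so we skip the full pair grid.
rows.sort(key=lambda r: r[1])
pairs = []
lo = 0
for i, (cid, dist) in enumerate(rows):
    while dist - rows[lo][1] >= 100:
        lo += 1            # slide the window past rows too far away
    for j in range(lo, i):
        pairs.append((rows[j][0], cid))

pick = random.choice(pairs)  # the single random pair the query wants
print(pairs, pick)
```

Sorting is O(n log n), and the window only ever moves forward, so for 10,000 rows this stays far cheaper than the ~10^8-row join the database is grinding through.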

DNadel