
I inherited an e-commerce project written in PHP. While inspecting the codebase, I found SQL statements scattered all over the code. There are many classes such as Product, Category, User and Customer, and every class runs a lot of database queries.

I wasn't sure how to assess the situation, so I decided to count the total number of queries for a single page visit. I wrapped the MySQL query function and incremented a counter on each call.

I was a little shocked by the result. Visiting the index page alone executed 1633 (!) MySQL SELECT queries. Listing the products of a category triggered almost 2000 queries.

I piped the queries into a text file to analyze them. Over 90% are single SELECT statements that fetch maybe one or two values. Now what should I do to clean up this mess? What is your advice? I have enabled query caching on the MySQL server; loading the page takes about 490 ms.

Additional detail

For example, there is a class called Product. Inside this class there are 8 small, single-value SQL SELECT statements.

When you open the category listing to display the products, the original programmer used one SELECT statement to get a list of the needed product IDs and then created a Product object for each of them.

Let's say this result gives us 20 products:

select id from products where price <= 10;

Then he iterates through the results and creates a Product object for every entry:

$qresult = query("select id from products where price <= 10");
$products = array();
foreach ($qresult as $prod) {
  // each Product constructor fires its own 8 SELECTs internally
  $products[] = new Product($prod['id']);
}

This alone generates 20 × 8 = 160 SQL queries just for the products, and the same pattern is used in the other classes too (User, Customer, Category and so on).

Update, some time later

Now that some weeks/months have passed, I want to share the solutions I have applied so far.

I was able to cut the queries down to fewer than about 50 per page visit and reduce the page loading time to under 400 ms.

It was surprisingly easy. I identified the hotspots and built a table cache class. On each visit, this static class loads the whole table content into memory, and every table request from then on is served out of memory by the static class. It's dirty and not pretty, but it works: it's faster, it reduces the total number of queries, and it spares the server hardware.
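
The idea looks roughly like the sketch below. This is a minimal sketch, not my actual class: the name TableCache, the use of PDO and the assumption that every cached table has an `id` primary key are all illustrative.

<?php
// Minimal sketch of the table-cache idea; the real class differs.
// Assumes PDO and that every cached table has an `id` primary key.
class TableCache
{
    private static $tables = array();

    // Load the whole table into memory on first access.
    // $table must be a trusted, hard-coded name (never user input!).
    public static function getTable(PDO $pdo, $table)
    {
        if (!isset(self::$tables[$table])) {
            self::$tables[$table] = array();
            $stmt = $pdo->query("SELECT * FROM `$table`");
            foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
                self::$tables[$table][$row['id']] = $row;
            }
        }
        return self::$tables[$table];
    }

    // Serve single-row lookups from memory instead of hitting MySQL.
    public static function getRow(PDO $pdo, $table, $id)
    {
        $rows = self::getTable($pdo, $table);
        return isset($rows[$id]) ? $rows[$id] : null;
    }
}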

I guess we will also keep throwing hardware at this problem as long as the user count keeps growing at the rate it has so far.

If we ever reach the point of replacing the application with another one, we will definitely go for a solution that is kinder to the database.

Thanks, everyone, for your advice.

Ello
  • see http://stackoverflow.com/questions/5351225/mysql-query-to-reduce-server-load – Ankit Chauhan Feb 27 '13 at 13:34
  • @Ankit that link doesn't seem to provide a satisfactory answer for this situation though? 2000 queries per request are a *lot* – Pekka Feb 27 '13 at 13:35
  • I would like to see an example of this verbosity... – L0j1k Feb 27 '13 at 13:39
  • @ello I'm not sure how much you know about php and mysql. But you can combine your queries using join, use some static result, try some caching mechanism. – Ankit Chauhan Feb 27 '13 at 13:39
  • Looks like you've got the [n+1 selects problem](http://stackoverflow.com/questions/97197/what-is-the-n1-selects-problem). In fact, we've got a [special tag](http://stackoverflow.com/questions/tagged/select-n-plus-1) here for this kind of situation. – raina77ow Feb 27 '13 at 13:41
  • Holy god. 1633 queries on a single page? And I was concerned when my pages had more than 5... – crush Feb 27 '13 at 13:41
  • There's no general recipe for this - you'll likely need to delve into the code, eliminate identical queries and such. Finding the most heavily used queries and caching them using something like memcache might help, too. – Pekka Feb 27 '13 at 13:43
  • Thank you all for your responses, I'll try several techniques like caching, static table content in the most-used classes and so on. Wow, this will take some time. – Ello Feb 27 '13 at 14:18
  • On the update: well, that's what you get when your models are too close to the DB. I'd suggest injecting an intermediate layer into your application, separating models and DB tables with mappers: the latter know how to convert rowsets into models, but they don't have to fetch those rowsets from the DB each time. BTW, does this app use any framework? – raina77ow Feb 27 '13 at 15:38
  • Stating the obvious, but since it hasn't been mentioned yet: use a version control system, so if you run into trouble with a particular refactoring, you can rollback easily. – halfer Feb 27 '13 at 16:03
  • You've got to go back to the database design and the data normalisation stage. What are you holding in which table? Which columns overlap and may be a) redundant or b) ideal for a join? Look at which rows are needed: could 1633 SQL calls become one call and 1633 fetches? Are statements prepared, or built before each singleton SELECT? – Dougie Dec 26 '18 at 06:33

4 Answers


Well, for starters, the first thing you need to identify lies in the PHP realm: how many of those calls are executed more than once per request. Eliminating those duplicates alone will cut down your numbers.

Other than that, you can do two things:

  1. If you have several queries that use the same parameters (say a productId) and you fetch the product table, the product category table and so forth, you can always join those queries and fetch a single result with everything you need (or create views of the most common joined queries and work with those) – see the sketch after this list.
  2. If you have a settings table, the best thing to do is load the whole table into memory once and read all values from there, so you don't have to go to the DB every time you need a setting.
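
For point 1, the idea looks like this. The table and column names (product_categories, category_name, a $pdo connection) are invented for illustration, not taken from your schema:

<?php
// Sketch for point 1: one joined query instead of several per-product
// lookups. Table and column names are invented for illustration.
$sql = "SELECT p.id, p.name, p.price, c.name AS category_name
        FROM products AS p
        JOIN product_categories AS pc ON pc.product_id = p.id
        JOIN categories AS c ON c.id = pc.category_id
        WHERE p.price <= 10";
$rows = $pdo->query($sql)->fetchAll(PDO::FETCH_ASSOC);
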
Carlos Grappa

I think you're right to be concerned: 2,000 queries is a lot.

You’ve already identified two good reasons to refactor: the number of queries and the page response time – both measurable and suitable for tracking your progress.

Although it is true that MySQL has a query cache, and it will usually not hit the underlying data if it can fulfil the query from that cache, there is still the network cost of talking to MySQL for every single query.

Have you thought about saving values to memory or using a session variable?

<?php
session_start();
// store session data once - the result of a common MySQL query
$_SESSION['sharedvalue'] = "result of common MySQL query";
?>
<html>
<body>

<?php
// retrieve session data - now each time you do this it isn't asking MySQL again
echo "Common Value=" . $_SESSION['sharedvalue'];
?>

</body>
</html>

Another solution would be to keep your own cache and read common values from it, changing or expiring entries to refresh stale data. Whether this is worthwhile really depends on the size of your application and its user base.
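
For instance, with the APCu extension (if it's available on your server) a cached lookup with expiry might look like this - the key name, the TTL and the settings query are made up for the example:

<?php
// Sketch: cache a common query result with an expiry, assuming the
// APCu extension. Key, TTL and the settings query are placeholders.
function getCommonValue(PDO $pdo)
{
    $value = apcu_fetch('common_value', $hit);
    if (!$hit) {
        $value = $pdo->query("SELECT value FROM settings WHERE name = 'common'")
                     ->fetchColumn();
        apcu_store('common_value', $value, 300); // expire after 5 minutes
    }
    return $value;
}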

Steve
  • Not sure about using the session - I think that tries to serialise the classes and writes them to disk, which isn't suitable for classes that are likely to contain resource references (e.g. database handles). A class-based static object cache might be better, and would do what you're suggesting. – halfer Feb 27 '13 at 16:01
  • @halfer you're right, the session isn't the best solution. For classes it will serialise them and store them in memory / on disk. The serialisation and deserialisation isn't too much of an overhead compared with the network traffic of querying the data again from the MySQL server (or its query cache). The advantage of the session is that you can share it over multiple servers, which you cannot do with a static class - although if you had multiple servers, using a real cache server such as memcache would be the better option anyway. – Steve Feb 28 '13 at 09:08

I have the same problem with a legacy app I work on, together with the same level of query inefficiency! In the short term, we just throw hardware at the problem, but at least any new work going forward is better written. It sounds like you have a custom ORM that lacks the necessary features, or one that isn't being used in an optimal way.

Try refactoring this:

$qresult = query("select id from products where price <= 10");
$products = array();
foreach ($qresult as $prod) {
  $products[] = new Product($prod['id']);
}

Into this:

// Assuming $qresult is now an array of arrays, with each inner array
// containing all the columns that make up a product
$qresult = query("select * from products where price <= 10");
$products = array();
foreach ($qresult as $prod) {
  $products[] = Product::convertToObject($prod);
}

That'll save you N queries per loop right there. If you ensure that each ORM class (Product etc.) inherits from something like BaseORM, the common code to do this can be written just once.
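
A minimal sketch of what that shared method could look like - BaseORM and convertToObject are just the names suggested above, and I'm assuming column names match property names:

<?php
// Sketch of a shared hydration method; assumes each row's column
// names match the class's property names.
abstract class BaseORM
{
    // Build an instance of the calling class from an already-fetched row,
    // instead of issuing more queries.
    public static function convertToObject(array $row)
    {
        $obj = new static(); // late static binding: builds Product, User, ...
        foreach ($row as $column => $value) {
            $obj->$column = $value;
        }
        return $obj;
    }
}

class Product extends BaseORM
{
    public $id;
    public $name;
    public $price;
}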

If you find that you don't have much in the way of joins, then branch the current code and have a play with replacing your home-grown ORM with something like Propel or Doctrine. Be aware that there's a learning curve to ORM systems you don't know, but the payoff is generally better than writing the whole thing yourself.
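
For example, with Doctrine 2 the listing above becomes a single DQL query, with hydration handled by the library - a sketch assuming a mapped Product entity and a configured $entityManager:

<?php
// Sketch, assuming Doctrine 2 with a mapped Product entity and a
// configured $entityManager: one query, hydration done by the ORM.
$products = $entityManager
    ->createQuery('SELECT p FROM Product p WHERE p.price <= :max')
    ->setParameter('max', 10)
    ->getResult();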

A Friend
  • Thank you for your response; I had something similar in mind, but I guess the first step is to isolate the expensive spots and refactor (or build) ORM classes. – Ello Feb 27 '13 at 15:09
  • Well, the `convertToObject()` above will do that as a very quick win. If you don't want to worry about inheritance to start with, add this method into each class that needs it. – halfer Feb 27 '13 at 15:59

There is probably one loop (or several - foreach, while, etc.) that calls mysql_query on every iteration.

Once you find that loop, you can decide how to optimise it - for example, you could load all the records into an array at once and then have the loop work on the array rather than calling the database each time.
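
A minimal sketch of that approach, assuming a PDO connection in $pdo (the name column is invented; the question only shows id and price):

<?php
// One round trip, then the loop works on the in-memory array.
$rows = $pdo->query("SELECT id, name, price FROM products WHERE price <= 10")
            ->fetchAll(PDO::FETCH_ASSOC);

foreach ($rows as $row) {
    // use $row['id'], $row['name'], $row['price'] - no further queries
}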

acutesoftware
  • It sounds worse: "single select statements of maybe one or two values". I imagine something like `public function getFoo() { query("SELECT foo FROM bar WHERE id=?", $this->id); }` - happy refactoring... probably you're better off throwing everything away and starting anew – Fabian Schmengler Feb 27 '13 at 13:45
  • Exactly, that is what happened :| – Ello Feb 27 '13 at 13:56