
I have done a lot of research about caching data in files (serialize/unserialize vs. json_encode/decode, var_export, igbinary) and about MySQL queries (optimizations, stored procedures, query cache), but I still wonder what the best way is to optimize a concrete case like the following.

Sorry in advance: this is a long question for what is probably a short answer, but it is necessary to understand the project. And excuse my poor English, which is not my first language.

Let's imagine that we have the following database relationships (ER diagram omitted).

Description of the database (estimated number of records in parentheses):

  • MODULE (10): the type of an ITEM; could be an article, a forum topic, an ad, a news item...
  • ITEM (millions): anything with a title and some text
  • CATEGORY (50): item categories (animals, politics, cars, computers...)
  • TAG (hundreds of thousands): a category's tags (e.g. for politics: International, France, Barack Obama...)
  • ITEM_TAG (ouch): associations between items and tags

So we have several relationships, and each one is recorded at ITEM creation/update.

I have already cached ITEM data in folders and files with the following code:

public function cacheItem()
{
    $req = mysql_query("SELECT id, title, content, id_mod, id_cat
            FROM ITEM
            WHERE ITEM.id = '".$this->id."'") or die(mysql_error());
    if (mysql_num_rows($req) == 1) {
        $this->itemData = mysql_fetch_array($req);
        $this->folder   = floor($this->id / 1000); // 1000 items max per folder
        $this->itemUrl  = $this->folder."/".$this->id.".txt";
        if (!is_dir($this->folder)) {
            mkdir($this->folder, 0777, true); // create the folder if it is missing
        }
        // file_put_contents() creates the file itself, so no touch() is needed
        file_put_contents($this->itemUrl, serialize($this->itemData), LOCK_EX);
    }
}

And I read them back with unserialize(file_get_contents($url)); this part works like a charm!

Now I wish to optimize the lists of ITEMs so they can be displayed filtered by several options, each list paginated at 100 items per page:

  • ALL ITEMs
  • ITEMs of a MODULE
  • ITEMs of a CATEGORY
  • ITEMs of a CATEGORY and a MODULE
  • ITEMs of a TAG
  • ITEMs of a TAG and a CATEGORY
  • ITEMs of a TAG and a CATEGORY and a MODULE

I already know how to do this in SQL and to put the results in a cache tree.
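For reference, one of those list queries (the TAG + CATEGORY + MODULE case, paginated) might be built like this. This is only a sketch: the table and column names (`ITEM_TAG.id_item`, `ITEM.id_cat`, etc.) are assumptions based on the schema described above.

```php
<?php
// Sketch: build one of the paginated list queries (TAG + CATEGORY + MODULE).
// Table and column names are assumptions based on the schema described above.
function buildListQuery($idTag, $idCat, $idMod, $page, $perPage = 100)
{
    $offset = ($page - 1) * $perPage; // page is 1-based
    return sprintf(
        "SELECT i.id, i.title
         FROM ITEM i
         JOIN ITEM_TAG it ON it.id_item = i.id
         WHERE it.id_tag = %d AND i.id_cat = %d AND i.id_mod = %d
         ORDER BY i.id DESC
         LIMIT %d OFFSET %d",
        $idTag, $idCat, $idMod, $perPage, $offset
    );
}

// Page 3 of the list for tag 42, category 7, module 1:
echo buildListQuery(42, 7, 1, 3);
```

With a composite index on ITEM_TAG (id_tag, id_item) and one on ITEM covering (id_cat, id_mod), queries of this shape stay fast even on millions of rows; the other six list variants are the same query with fewer WHERE conditions.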

My problem with those cache files is that when an ITEM is created or updated, the affected lists have to be refreshed very strictly.

First question:

What will happen if ITEMs (and therefore those lists too) are created/updated at the same time?

Will the LOCK_EX of file_put_contents() do its job while files are being read with file_get_contents()?

Second question:

I understand that the more work PHP does, the less MySQL does (and vice versa), but what is the better (faster to display) way to build those paginated lists, which will be read every second or more often, and only modified when an ITEM is added or updated?

  • My cache system (I don't think so...)
  • Stored procedures in MySQL
  • Several database servers and/or several files servers
  • Other

Any ideas, examples, links greatly appreciated.

P.S.: just for fun, I may ask "how does Facebook do it?" and "how does Stack Overflow do it?"

Valky
  • Nothing to help you, but many thanks for the tip about the folder file limit based on row id `floor($this->id/1000);`. Good idea! –  Feb 14 '13 at 04:42
  • RDBMS have put a lot of effort into optimizing their structures, so your own caching will have a hard time beating that. Opening one file for every item will cost you. I'd say go with MySQL. If (and only if) performance isn't sufficient, consider multiple load-balancing servers and/or other (probably commercial) database servers. Don't see that happening for a while yet, though. Don't see any call for stored procedures either. With [pagination](http://stackoverflow.com/questions/tagged/pagination) you can try to be clever, but there are separate questions here on SO for that. – MvG Feb 14 '13 at 08:18
  • OK, so let's look at MySQL to begin with... in PHP, will mysqli queries be better (I don't use it at the moment)? – Valky Feb 14 '13 at 13:31
    Use mysqli, mysql_query is deprecated. Second, use memcached or Redis for caching. – Denis Ermolin Feb 14 '13 at 13:34
  • OK, thanks. mysqli or PDO? I've never used either. – Valky Feb 14 '13 at 13:56
  • I'm pretty sure PDO will be the only db access layer included in future versions so you should go with that. – Brent Baisley Feb 14 '13 at 16:16
  • `mysqli` will be available for the foreseeable future; only mysql is going away. `mysqli` can also function almost exactly the same as `mysql` by just adding the `i` to most functions. – G-Nugget Feb 14 '13 at 16:47
  • @G-Nugget -- I have read a review that indicated that `mysqli` is actually slower than `mysql`. I would still use it for future compatibility though. – neelsg Feb 14 '13 at 23:00
  • @neelsg - I've seen some things on the performance, too, but from what I remember, the difference is marginal and you wouldn't notice it unless you're benchmarking or running hundreds of queries per page. – G-Nugget Feb 15 '13 at 14:39

1 Answer


First question:

Your operations should be fine with LOCK_EX. The files may get locked if accessed simultaneously, which will definitely slow things down, but all operations should complete correctly. However, this is a good example of why you should not implement your own cache system.
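One caveat worth noting: file_get_contents() does not take a lock by itself, so to be safe the reader can take a shared lock explicitly with flock(). A minimal sketch (the helper name and cache path convention are hypothetical, matching the question's serialize-based cache):

```php
<?php
// Sketch: read a cached item under a shared lock (LOCK_SH), so the read
// cannot overlap a writer holding LOCK_EX on the same file.
function readCachedItem($path)
{
    $fp = fopen($path, 'rb');
    if ($fp === false) {
        return null;                 // cache miss
    }
    flock($fp, LOCK_SH);             // blocks while a writer holds LOCK_EX
    $data = stream_get_contents($fp);
    flock($fp, LOCK_UN);
    fclose($fp);
    return unserialize($data);
}
```

A writer using file_put_contents($path, serialize($item), LOCK_EX) and a reader using this function will never see each other's partial data, at the cost of readers briefly blocking during writes.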

Second question:

MySQL will definitely be faster than your cache system (unless you do some seriously wicked coding, and not in PHP). Databases like MySQL have put a lot of work into optimizing their performance.

I don't believe that stored procedures in MySQL will offer you any real benefit in the examples provided above over plain old SELECT queries.

Using a NoSQL approach like MongoDB can help you if you use sharding on a server cluster. This is more difficult to write, and more servers cost more money. Also, it is not clear from your question whether moving to a different database system is an option.

If you stick with MySQL, it is probably easier to implement load balancing application servers than a database server cluster. With this in mind, more work done by PHP is preferred to more work in MySQL. I would not follow this approach though, because you are giving up much for only a small benefit.

In short, I recommend that you stick to plain SELECT queries to get what you need. Run your application and database on separate servers, and use the more powerful server for your DB server.

P.S. Facebook wrote a compiler (HipHop) for PHP to make their code run faster. In my opinion, PHP is not a very fast language, and you can get better results from Python or Node.js.

Stack Overflow uses ASP.NET MVC with MS SQL Server. They have a single big, powerful server for the database and apparently prefer DB queries where they can. They also use load-balanced application servers that are separate from their DB server.

neelsg
    I'd like to add that MySQL's query cache will probably be a huge time saver. It sounds like the size of the cache may need to be larger than normal for this case since there may be a lot of different queries. Any interpreted language will be slower than a compiled language and I'm not familiar with python or node.js, but I think those are also interpreted. If performance is a real concern, you should test your PHP scripts with a profiling extension. – G-Nugget Feb 15 '13 at 14:46
  • Thanks for your advice, I'm going to look at this solution of load balancing and separating the DB from the application. – Valky Feb 15 '13 at 14:48
  • And what do you think about PHP Accelerator, Turck MMCache, Zend Optimizer... and other modules to improve PHP interpretation? – Valky Feb 16 '13 at 15:28
  • @G-Nugget -- I agree. Python and Node are also interpreted, but performance-wise they seem to be faster than PHP. They can obviously never be as fast as something written in C (although I wouldn't want to write an entire website in C). Also, performance depends on how they are used, etc. That is why I said "In my opinion". – neelsg Feb 16 '13 at 19:39
  • @Valky -- Unfortunately, I haven't used PHP Accelerator, Turck MMCache or Zend Optimizer, so I can't comment. In my answer, I only commented on PHP as a P.S. because it is much less likely to be your problem than the actual database queries / file I/O when using a cache. I would only start optimizing there once you have sorted the DB out. – neelsg Feb 16 '13 at 19:44
  • OK, thanks. BTW it seems that those accelerators are old and no longer supported. I've already deleted my cache system and only use SQL queries with the `query cache` activated, and performance seems to be enhanced. – Valky Feb 18 '13 at 18:01