
I know my question seems a bit vague, but managing countries/addresses is actually something pretty common, so I would like to get some advice on my current setup.

I have a database with a "country" column; previously it was a MEDIUMINT acting as a foreign key to another table holding the actual country information (id, name, and ISO 3166-1 alpha-2 code, mainly).

After some testing and benchmarking I ended up keeping all the country information in a PHP array file instead, including/requiring it when needed, and it was one to two orders of magnitude faster than querying the database (there are 278 countries).
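For illustration, the file is shaped roughly like this (a simplified sketch; the exact entries and key names are placeholders, not my real code):

```php
<?php
// countries.php -- the whole lookup table as a plain PHP array,
// keyed by the numeric id, mirroring the columns of the old table.
return [
    1 => ['name' => 'Afghanistan', 'iso2' => 'AF'],
    2 => ['name' => 'Albania',     'iso2' => 'AL'],
    // ... 276 more entries ...
];
```

Looking a country up is then a single array access instead of a SQL round trip:

```php
$countries = require __DIR__ . '/countries.php';
$name = $countries[1]['name']; // "Afghanistan"
```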

So it seems to be the better approach, but I feel like something is wrong, because people normally read this kind of data from a table instead of from a file, and I can't figure out why. Is it easier to maintain, or something like that?

Also, I was thinking of using the two-letter ISO code as the key instead of a numeric id; it would be more human-readable, and the codes are unique anyway. I see no noticeable performance loss in the 400,000-row table; will this turn out to be a mistake if my database grows?
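Schematically, the change I have in mind looks like this (a minimal sketch with made-up table and column names; assumes InnoDB and placeholder credentials):

```php
<?php
// Sketch: key the country table by its ISO 3166-1 alpha-2 code
// and reference that code from the main table.
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');

$pdo->exec("
    CREATE TABLE country (
        iso2 CHAR(2)     NOT NULL PRIMARY KEY,  -- 'ES', 'FR', ...
        name VARCHAR(64) NOT NULL
    ) ENGINE=InnoDB
");

$pdo->exec("
    CREATE TABLE customer (
        id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        country CHAR(2)      NOT NULL,
        FOREIGN KEY (country) REFERENCES country (iso2)
    ) ENGINE=InnoDB
");
```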

aleation
  • How did you query the database with the country table: with one joining SELECT, or with two separate SELECTs? Did you have an index on the foreign key column? – mas.morozov Oct 21 '13 at 10:16
  • And did you set up a primary key for the country table? Did you require the PHP country file in each of your 1000 loops, or require it once and then do 1000 lookups during the same PHP process invocation? – mas.morozov Oct 21 '13 at 10:23
  • Yes, the columns are indexed, and actually it was neither a join nor two selects. I used it just as a condition in the WHERE clause, like this: `SELECT * FROM table WHERE country = 253`, and even so requiring the file is faster. And I required the file once and then looped. – aleation Oct 21 '13 at 10:26
  • "I required the file once and then looped" - and this is the reason. Reading data only once, than using it 1000 times is called *caching* and it increases performance drastically. PHP file is not the only way - you can make one SQL query, save results in PHP array, then use it 1000 times with near the same results. – mas.morozov Oct 21 '13 at 10:30
  • Having no MySQL table for countries leaves you with no way to join on it, which cripples your whole database. – mas.morozov Oct 21 '13 at 10:34
  • And you should note that your PHP application is *restarted* with every HTTP request, so it *rereads* the countries file. It is probably not going to use it 1000 times in *each invocation*, so your performance test is flawed. You would be better off doing full-stack performance testing of your application. – mas.morozov Oct 21 '13 at 10:41
  • You are right, I will take your advice and do the full-stack testing. Anyway, I have been testing it just now with single executions, and the performance is still better, even taking into account that similar MySQL queries are cached (0.0003s vs 0.014s). So the real problem with this approach is more about losing joins, as you mentioned, and about the bugs and duplicates @Neville K pointed out. – aleation Oct 21 '13 at 10:53
  • If your querying logic is right, you will never make thousands of 'small' queries (or array lookups); you will use joins instead. If you parse (precache) the whole array of 278 countries and then use only 1 or 2 of them, you are just wasting time on precaching (and not accounting for it!). RDBMS indexes help you *not read* what you do not need at all (at their own cost, of course). The problem is that the precaching/access time ratio depends entirely on the usage context (the average number of accesses per precache), and the point of full-stack testing is measuring time statistics in the real usage context. – mas.morozov Oct 21 '13 at 11:05
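A minimal sketch of the caching pattern mas.morozov describes: run one query, keep the rows in a PHP array, and do all further lookups against that array (connection details, table and column names are placeholders):

```php
<?php
// Query the country table once, cache the rows in an array keyed
// by id, then do every subsequent lookup against the array.
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');

$countries = [];
foreach ($pdo->query('SELECT id, name, iso2 FROM country') as $row) {
    $countries[$row['id']] = $row;
}

// 1000 lookups now cost array accesses, not MySQL round trips.
// (253 is the example id from the question.)
for ($i = 0; $i < 1000; $i++) {
    $name = $countries[253]['name'];
}
```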

1 Answer


In general, you want to keep things together that change together. So, if your main data lives in a SQL database, keeping the country data in the database helps avoid craziness like someone changing the PHP lookup array without realizing there's a bunch of data in a SQL database.

It also helps avoid duplication - if you build a second application for your system (e.g. an administration system), you don't end up with 2 copies of the country lookup PHP files. Again, duplication creates the opportunity for bugs, with a developer changing the database and one of the lookup files, but not the other.

All of this is rather defensive - but applications tend to evolve in unexpected ways, and avoiding bugs is usually a good idea.

In my experience, using a join between two tables has almost no measurable performance impact on a well-tuned system with the numbers you mention - did you optimize the SQL before moving the lookup to PHP?
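For instance, a single joined SELECT along these lines (a sketch; table and column names are placeholders) lets MySQL resolve the country lookup in one round trip:

```php
<?php
// One joined query instead of a separate country lookup per row.
// With an index on the foreign key, the join cost is negligible.
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');

$stmt = $pdo->query("
    SELECT t.*, c.name AS country_name
    FROM   main_table AS t
    JOIN   country    AS c ON c.id = t.country
    WHERE  c.iso2 = 'ES'
");

foreach ($stmt as $row) {
    // each row already carries its country name
}
```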

Neville Kuyt
  • Yes, the table was optimized before testing. After @mas.morozov pointed out my huge mistake I did some more testing; the performance is still better reading from a file, but I get your point: if the project expands it is easier to maintain the countries in the database, and the performance gain would hardly be noticeable anyway. So I think I will end up keeping the countries in the database, but with the ISO code as the foreign key instead of the numeric id. – aleation Oct 21 '13 at 10:58