
I found a similar question here: Performance: condition testing vs assignment

This question is not about optimization. It's about coding preferences.

Here is an example:

I have data that I have no control over. It's from a 3rd party, in the form of rows from a db table, produced by an MSSQL stored procedure. Being bloated, I'd like to reduce its size before transmitting the data over the wire as JSON. I can make it about 80% smaller, as most of the data is repetitive.

So I do something like so:

    $processed = array();
    foreach ($result as $row)
    {
        $id = $row['id'];
        $processed[$id]['title'] = $row['title'];
        $processed[$id]['data'] = $row['data'];
        $processed[$id]['stuff'] = $row['stuff'];
        /* many more assignments with different keys */

        $unique = array();
        $unique['cost'] = $row['cost'];
        /* a few more assignments with different keys */

        $processed[$id]['prices'][$row['date']] = $unique;
    }
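
To see why the regrouping pays off on the wire, here is a minimal, self-contained sketch with made-up sample data (the field names follow the question; the values and row count are hypothetical):

```php
<?php
// Hypothetical sample rows standing in for the SP result.
$result = [
    ['id' => 1, 'title' => 'T', 'data' => 'D', 'stuff' => 'S', 'cost' => 10, 'date' => '2011-01-01'],
    ['id' => 1, 'title' => 'T', 'data' => 'D', 'stuff' => 'S', 'cost' => 12, 'date' => '2011-01-02'],
];

$processed = array();
foreach ($result as $row) {
    $id = $row['id'];
    $processed[$id]['title'] = $row['title'];
    $processed[$id]['data']  = $row['data'];
    $processed[$id]['stuff'] = $row['stuff'];
    $processed[$id]['prices'][$row['date']] = array('cost' => $row['cost']);
}

// The repeated fields are stored once per id, so the encoded payload
// shrinks; with many rows per id the saving grows toward the ~80% cited.
echo strlen(json_encode($result)), "\n";
echo strlen(json_encode($processed)), "\n";
```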

I thought this might be quicker, but it looks slower (I timed it):

    $processed = array();
    $id = null;
    foreach ($result as $row)
    {
        if ($id != $row['id'])
        {
             $id = $row['id'];
             $processed[$id]['title'] = $row['title'];
             $processed[$id]['data'] = $row['data'];
             $processed[$id]['stuff'] = $row['stuff'];
             /* many more similar lines */
        }

        $unique = array();
        $unique['cost'] = $row['cost'];
        /* a few more similar lines */

        $processed[$id]['prices'][$row['date']] = $unique;
    }
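
A third variant worth timing (my suggestion, not part of the original question) keys the duplicate check off the output array itself with `isset()`, so no separate tracking variable is needed, and it stays correct even if rows are not grouped by id:

```php
<?php
// $result stands in for the SP rows (hypothetical sample data).
$result = [
    ['id' => 1, 'title' => 'T', 'data' => 'D', 'stuff' => 'S', 'cost' => 10, 'date' => '2011-01-01'],
    ['id' => 1, 'title' => 'T', 'data' => 'D', 'stuff' => 'S', 'cost' => 12, 'date' => '2011-01-02'],
];

$processed = array();
foreach ($result as $row)
{
    $id = $row['id'];
    // isset() on the output array replaces the $id != $row['id'] test.
    if (!isset($processed[$id]))
    {
        $processed[$id]['title'] = $row['title'];
        $processed[$id]['data']  = $row['data'];
        $processed[$id]['stuff'] = $row['stuff'];
        /* many more similar lines */
    }

    $processed[$id]['prices'][$row['date']] = array('cost' => $row['cost']);
}
```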

Can anyone confirm that with PHP, "if"s or conditionals are indeed more compute-intensive than assignments? Thanks.

[My answer as an edit]

I did some stand-alone tests (without any real data or other code overhead) on FastCGI PHP running with IIS:

    function testif()
    {
        $i = 0;
        while ($i < 100000000)
        {
            if (1 != 0)  /* do nothing */;
            $i++;
        }

        return "done";
    }

1st run: 20.7496500015256748 sec.

2nd run: 20.8813898563381191 sec.

    function testassign()
    {
        $i = 0;
        while ($i < 100000000)
        {
            $x = "a 26 character long string";
            $i++;
        }

        return "done";
    }

1st run: 21.0238358974455215 sec.

2nd run: 20.7978239059451699 sec.
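
For reference, the post does not show how the runs were timed; a `microtime(true)` wrapper along the lines of the following sketch is the usual approach (the helper name is mine, and the iteration count is reduced here for illustration):

```php
<?php
// Hypothetical timing harness; the original post omits its timing code.
function benchmark(callable $fn): float
{
    $start = microtime(true);
    $fn();                          // run the code under test
    return microtime(true) - $start;
}

function testif()
{
    $i = 0;
    while ($i < 100000) {           // 100000000 in the original test
        if (1 != 0)  /* do nothing */;
        $i++;
    }
    return "done";
}

printf("%.16f sec.\n", benchmark('testif'));
```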

d-_-b
  • This looks to me like premature optimization. Your `if` vs. assignment performance is going to be a drop in the bucket compared to the database request and data flowing back and forth. Optimize the database and network, not an `if` statement. – mellamokb Feb 25 '11 at 04:58
  • Not sure what the answer to the question is (I assume it does not slow down), but couldn't you just easily time how long your code takes to execute, then experiment with changes? – Neddy Feb 25 '11 at 05:23
  • @mellamokb: A drop in a bucket is fairly significant; a drop in the ocean however... – Russell Dias Feb 25 '11 at 05:47
  • @mellamokb, It's only premature if there are other optimizations that can be done. In fact, this is optimizing for network transfer. I'm not actually concerned with this implementation. I'm wondering if anyone actually knows the cpu time that would confirm my findings. – d-_-b Feb 25 '11 at 06:01
  • @Neddy, I did time it. Sorry for not being clearer. I added that to the question. – d-_-b Feb 25 '11 at 06:02
  • @sims: I guess I'm highly skeptical that you're getting the largest performance gain from deciding whether to use an assignment or an `if` statement. The gain must be in the difference of the logic, i.e., what's hidden in `/* many more similar lines */`, and has nothing to do with the difference between an `if` statement and an assignment. – mellamokb Feb 25 '11 at 06:07
  • Premature Optimization = EVIL – Alec Smart Feb 25 '11 at 06:45
  • @mellamokb, What's hidden in /* many more similar lines */ are exactly the same except the indices. There are really no other optimizations to be done that I can see. Can you point something out? I'd be interested in hearing anything you notice that could make it faster. Though, that's not really the point of this question. I was only wondering about a vs. b. It's a theoretical/best practice question. It's not specific to this code. I'm not trying to solve a problem. I'm wondering about which is theoretically faster. – d-_-b Feb 25 '11 at 06:45
  • @Smart Alec, how you go about parsing such data? What would your best practice be? – d-_-b Feb 25 '11 at 06:46
  • Not related to "performance: if vs assignment", but one way to make textual data much smaller is compressing it (gzip/deflate). You say that most data is repetitive - that means that it would have great compression ratio. Compressing can be enabled globally in server configuration, i.e., you don't have to change your script for that. – binaryLV Feb 25 '11 at 07:22

3 Answers


Well, compared to the time required to transfer this JSON data to the client, such a difference would indeed be a drop in the ocean.
Heck, JSON encoding alone will perform thousands of such ifs and assignments while encoding your data! Running tests to compare these things is what you are doing wrong.

It is an extraordinarily limited point of view that leads to such questions.
While there are zillions of other "CPU cycles" involved, a difference of a thousand will make no difference:

  • there is a web server that handles your request
  • there is a PHP interpreter (which, by default, has to parse your whole code, picking it up character by character)
  • there is a database lookup, which has to handle gigabytes of data
  • there is network latency.

So, to make an adequate comparison, one has to involve all of these in the test, and start worrying only if a real-life test shows any difference. Otherwise it is a complete and utter waste of time.

This kind of question is one of the most evil things in our poor PHP community.
There is nothing wrong with caring about performance. But there is nothing worse than a "what is faster" question asked just off the top of one's head.

And "it's just a theoretical question!" is no excuse. These questions are never theoretical; be honest with yourself. Someone who is REALLY interested in all the nitty-gritty goes another way: digging into the sources, debuggers, and profilers, not running silly "zillion iterations of nothing" tests.

Someone who really cares about speed does measurements first. Such measurement is called "profiling", and its goal is to find the bottleneck: the thing that REALLY makes your application slower.

However, sometimes no sophisticated measurement is required, just a little thinking.
For example, if you have too much repetitive data, why not ask your database to return a smaller dataset in the first place?

Your Common Sense
  • I don't think you should make accusations of lying. Why should you use echo instead of print when coding with PHP? Do you get my point? What if I'm not encoding to JSON? I have had to write other such loops before. I want to know the optimal way of doing that. – d-_-b Feb 26 '11 at 04:34
  • @sims yes, there is not a single reason to prefer echo over print. that's the point and it seems you're unable to get it :( The only hope is testing the way I told you earlier in the comments. You won't get any difference. Try it. – Your Common Sense Feb 26 '11 at 08:01
  • Echo is faster than print by a minuscule amount to be sure. For that reason alone, why not use echo unless you really need print? This is the nature of my question, which it seems you have failed to grasp. I understand your point before you wrote it. This is not university, and not everyone here is 12. So perhaps just answer the question, and try not to lecture. This is not about optimization. It's about coding preferences. If one is faster, even marginally, than another, I will write it that way. Sounds like a good idea, no? – d-_-b Feb 26 '11 at 08:38
  • @sims: No, because you shouldn't be thinking about things like that when writing your code. The point Col Shrapnel is trying to get across is that premature optimization is a clear anti-pattern, as far as "best practices" go. You shouldn't "always use echo unless you really need print" because then you have to think about if you really need print as you go along. Just use print until you've **proven** that it's **too slow** in your code, through careful profiling. Then, and only then, should you go about optimizing by replacing statements with alternative constructs. – Cody Gray - on strike Feb 26 '11 at 08:50
  • @sims of course not :) echo is shorter to type, and that's the only reason to prefer echo over print. Look: you've lost several hours trying to work out "which is faster", and you will never gain that time back from this "optimization". That's the problem. Another problem is that your question is an endless one. Every language construct can be substituted with another; there are thousands of such potential comparisons. You could spend your whole life finding such millisecond differences, with no gain ever. That's why profiling should be the only trigger for such questions; the reason should be real, not imaginary. – Your Common Sense Feb 26 '11 at 08:53
  • `If one is faster, even marginally, than another, I will write it that way.` - well the question is not that theoretical (as you tried to put it earlier), as you're making practical conclusions from it :) – Your Common Sense Feb 26 '11 at 08:57
  • When deciding on coding standards etc, I think there is a good reason to consider one function or construct over another based on such minor issues. If both are even, then there needs to be a reason to use one over the other. – d-_-b Feb 26 '11 at 09:08
  • @Cody, no you shouldn't be thinking about it when writing your code. But it's a good idea to *decide* on a coding standard *before* writing your code. – d-_-b Feb 26 '11 at 09:09
  • @sims did you happen to notice the last paragraph of my answer? Are you absolutely sure that there is no way to ask the database to return unique data already? – Your Common Sense Feb 26 '11 at 09:21
  • The data is from a 3rd party app. I'm not allowed to modify it according to the licensing. It's a crappy application. BTW, yes, of course I read your entire answer several times, and I agree 100% with what you are saying in principle. But you didn't attempt to answer the question. You just decided to lecture me, acting like I have no idea what I'm doing. There is actually no problem I'm trying to solve. I'm just interested in the most efficient way to loop this. Perhaps I should have asked on "Code Golf" where people play such games. – d-_-b Feb 26 '11 at 09:31
  • So let me get this correct... premature optimization is evil. But I should write all of my code first, and then profile down to the level where I can tell if my 'print' statements would be better replaced with 'echo' statements? How would profiling tell me this without replacing all of the print statements with echoes and rerunning the profiler? The difference between echo and print may be moot, but not all of these types of questions are moot. It is not 'premature optimization' to prefer a good pattern over a bad/slow/faulty one. – James Alday Feb 07 '12 at 16:37
  • @JamesAlday you are taking the profiler wrong. Profiling is not meant to tell you that print is slower than echo. It is meant to tell you whether print is an actual bottleneck of the whole application. Hint: it is not. – Your Common Sense Feb 07 '12 at 17:25
  • This argument is, of course, moot when talking about echo vs print, I realize that. But I still stand by the idea that it is useful to talk about difference in pattern and function use in regards to which is faster or more memory efficient even if it is not a big enough difference to show up when profiling. If an operation takes 100ms to complete and I can shave it down to 80ms, over the course of millions of hits it's a worthwhile optimization to me. – James Alday Mar 01 '12 at 13:46
  • @JamesAlday it is not. Achilles will be a light year ahead by the time tortoise will achieve these 20ms. – Your Common Sense Mar 01 '12 at 14:35
  • @Col.Shrapnel So _all_ optimization/best practices are moot? There is no best practice for basic coding (ie, at a syntax/function-choice level) that will make any difference in overall execution time? That would mean the warning in the PHP docs for preg_match that says to use strpos if you don't need regex are also moot and I can use regex matches without any appreciable performance hit. I understand print vs echo is pedantic, but isn't what you're advocating the other extreme? All I'm trying to say is that there are some things you can do to make your code faster and they're worth discussing. – James Alday Mar 01 '12 at 17:35
  • @JamesAlday there is a lot of rubbish in the manual. And there are many things our unsuspecting performance tester simply has no idea of. Say, opcode caching makes some of his achievements negligible, and HTML caching makes others obsolete. And so on. Performance tuning is indeed an essential thing, but definitely not at the level of nanoseconds (20ms is too large a value to be real). Try to grow up. Learn profiling, not comparative testing out of nowhere. Learn concepts, not silly tests. Good luck. – Your Common Sense Mar 01 '12 at 17:47
  • @Col.Shrapnel If I grow up much more I'll be ready to retire, but I appreciate the insult. – James Alday Mar 01 '12 at 20:10

As I already wrote in a comment on the question:

Not related to "performance: if vs assignment", but one way to make textual data much smaller is compressing it (gzip/deflate). You say that most data is repetitive - that means that it would have great compression ratio. Compressing can be enabled globally in server configuration, i.e., you don't have to change your script for that.

Compressed "processed data" would probably still be somewhat smaller than compressed "full data", though I doubt it could be 80% smaller.
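
To put rough numbers on that, here is a small sketch comparing the raw and gzip-compressed sizes of deliberately repetitive JSON (the sample data is made up; compression level 6 is assumed as a typical server default):

```php
<?php
// Build deliberately repetitive rows, as described in the question.
$rows = array();
for ($n = 0; $n < 1000; ++$n) {
    $rows[] = array('id' => $n, 'title' => 'same title',
                    'data' => 'same data', 'stuff' => 'same stuff');
}

$json = json_encode($rows);
$gz   = gzencode($json, 6);   // requires the zlib extension

printf("raw: %d bytes, gzipped: %d bytes (%.1f%% of original)\n",
    strlen($json), strlen($gz), 100 * strlen($gz) / strlen($json));
```

Server-level alternatives (`zlib.output_compression` in php.ini, mod_deflate, or IIS dynamic compression) achieve the same without touching the script, as noted above.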


Now about the performance.

Code:

    $time = microtime(true);
    $data = array();
    for ( $n = 0; $n < 25000; ++$n ) {
        $data[] = array('id' => $n, 'text' => 'foo bar', 'key1' => 'value1', 'key2' => 'value2', 'key3' => 'value3');
        $data[] = array('id' => $n, 'text' => 'foo bar', 'key1' => 'value1', 'key2' => 'value2', 'key3' => 'value3');
        $data[] = array('id' => $n, 'text' => 'foo bar', 'key1' => 'value1', 'key2' => 'value2', 'key3' => 'value3');
        $data[] = array('id' => $n, 'text' => 'foo bar', 'key1' => 'value1', 'key2' => 'value2', 'key3' => 'value3');
        $data[] = array('id' => $n, 'text' => 'foo bar', 'key1' => 'value1', 'key2' => 'value2', 'key3' => 'value3');
    }
    printf("%.05f\n\n", microtime(true) - $time);

    for ( $n = 0; $n < 10; ++$n ) {
        $time = microtime(true);
        $tmp = array();
        foreach ( $data as $row ) {
            $id = $row['id'];
            $tmp[$id]['text'] = $row['text'];
            $tmp[$id]['key1'] = $row['key1'];
            $tmp[$id]['key2'] = $row['key2'];
            $tmp[$id]['key3'] = $row['key3'];
        }
        printf("%.05f\n", microtime(true) - $time);
    }
    echo "\n";

    for ( $n = 0; $n < 10; ++$n ) {
        $time = microtime(true);
        $tmp = array();
        $id = null;
        foreach ( $data as $row ) {
            if ( $row['id'] !== $id ) {
                $id = $row['id'];
                $tmp[$id]['text'] = $row['text'];
                $tmp[$id]['key1'] = $row['key1'];
                $tmp[$id]['key2'] = $row['key2'];
                $tmp[$id]['key3'] = $row['key3'];
            }
        }
        printf("%.05f\n", microtime(true) - $time);
    }
    echo "\n";

    for ( $n = 0; $n < 10; ++$n ) {
        $time = microtime(true);
        $tmp = array();
        foreach ( $data as $row ) {
            if ( !isset($tmp[$row['id']]) ) {
                $id = $row['id'];
                $tmp[$id]['text'] = $row['text'];
                $tmp[$id]['key1'] = $row['key1'];
                $tmp[$id]['key2'] = $row['key2'];
                $tmp[$id]['key3'] = $row['key3'];
            }
        }
        printf("%.05f\n", microtime(true) - $time);
    }
    echo "\n";

Results:

    0.26685; 0.32710; 0.30996; 0.31132; 0.31148; 0.31072; 0.31036; 0.31082; 0.30957; 0.30952;
    0.21155; 0.21114; 0.21132; 0.21119; 0.21042; 0.21128; 0.21176; 0.21075; 0.21139; 0.21703;
    0.21596; 0.21576; 0.21728; 0.21720; 0.21610; 0.21586; 0.21635; 0.22057; 0.21635; 0.21888;

I'm not sure why, but the first timing of the first test is consistently smaller than the other timings for the same test (0.26-0.27 vs 0.31-0.32). Other than that, it seems to me that it is worth checking whether the row already exists.

binaryLV
  • I didn't do that. Some weirdo, obviously, because s/he left no comments. It really is very much smaller. If you have rows where only 20% of the fields' data changes, it can get much smaller. And no, I can't optimize that data. It's from a 3rd party. – d-_-b Feb 26 '11 at 04:31

I believe that conditionals are slower in any language. This is related to how the compiler and CPU interact with the code: the CPU looks at the opcode generated by the compiler and tries to pre-fetch future instructions into its cache, and if you are branching, it might not be able to cache the next instruction. I think there's a rule of thumb that the most likely code path should go in the `if` branch, and the less common case in the `else` block.

I did a quick Google search and found another related question/answer on Stack Overflow from a while back: Effects of branch prediction on performance?

Alexandru Petrescu
  • Thank you for answering the question rather than trying to explain why my question is evil. – d-_-b Feb 26 '11 at 04:28
  • Who downvoted this answer? Someone really has issues. Can whoever downvoted this at least say why this answer is incorrect? – d-_-b Feb 26 '11 at 08:45