
I currently have a two-dimensional array that is filled like so:

$spreadRaw = array_fill(1, 56, array_fill(1, 36, 99));

so it is 56 rows with 36 'columns'.

However the code could be rewritten to be:

$spreadRaw = array_fill(1, 36, array_fill(1, 56, 99));

so 36 rows with 56 columns.

I imagine that 36 rows would be at least slightly faster, but how much faster (if at all), and why?

And is it also lighter on memory, or not?

OK, I appreciate that these figures are small, but what if we were dealing with a much larger situation? I'm interested in the workings under the hood!

EnglishAdam
  • Why do you think one would be faster than the other? It's still the same number of zvals being created either way – Mark Baker Oct 12 '13 at 10:09
  • 56 x 36 isn't very much by any standards. So unless the third dimension, time, is relevant - you're operating on millions of operations per second - I'd say the difference would be negligible. – LSerni Oct 12 '13 at 10:09
  • @Mark Because PHP multi-dimensional arrays aren't really multidimensional; I imagine here they are 36 or 56 variables, depending on row count. Could be wrong though. – EnglishAdam Oct 12 '13 at 10:10
  • 1
    make your own benchmark… with microtime(1). – bwoebi Oct 12 '13 at 10:12
  • Your perception of the different number of allocations is fine, but then, if you're willing to muddy the logic (switching row and column), you could also use a single-dimensional array. It is a micro-optimization. – Joop Eggen Oct 12 '13 at 10:13
  • "I imagine" is a strange statement when it comes to computation. Either you know it, or you ignore it, or you have to find out. So make a testcase with a bigger amount of entries, say 10000x100000 and have a try. Measure the time and memory usage in both cases. – arkascha Oct 12 '13 at 10:14
  • @LSerni I changed the question to include a theoretical aspect for a much larger situation – EnglishAdam Oct 12 '13 at 10:14
  • @bwoebi That won't tell me the why… – EnglishAdam Oct 12 '13 at 10:17
  • For it to make any difference, it would have to be true that accessing an entry in an array is slower the more entries there are in the array. First of all, this may be true, but the difference will not be significant in any way. It's not that PHP arrays start to behave slowly after x entries. Secondly, the difference will be even less pronounced if the choice is between two such very small numbers. In other words: it doesn't matter! – deceze Oct 12 '13 at 10:18
  • I'd bet it's just a smaller number of allocations that have to be done on a 36x56 array… but what are the real results? – bwoebi Oct 12 '13 at 10:19
  • @bwoebi You're still allocating exactly the same number of values overall...! – deceze Oct 12 '13 at 10:20
  • @deceze If PHP simply allocates space for each value pair then there will be no difference. If you are sure, expand a little and post that as the answer and I'll accept it! – EnglishAdam Oct 12 '13 at 10:23
  • @deceze No, but there are fewer allocations: don't forget that this is copy-on-write… there are 20 fewer allocations in total^^ (yeah, nitpicking :-P) – bwoebi Oct 12 '13 at 10:27
  • `36 * 56` and `56 * 36` calculated to the same result when I was at school, and I don't think much has changed since then... then you add the overhead of the number of arrays, and a difference of 20 subarrays is almost nothing... a minor memory allocation – Mark Baker Oct 12 '13 at 10:28
  • As regards speed, direct access to any given element in a 2d array is independent of the number of entries in the arrays because they're hashmaps and PHP has to calculate 2 hashes to retrieve any given entry... looped access is 36*56 or 56*36 to access the whole grid regardless of order of the dimensions – Mark Baker Oct 12 '13 at 10:31
  • @MarkBaker That's not where the difference is, but in how many zvals need to be duplicated upon insertion. – bwoebi Oct 12 '13 at 10:32
  • @MarkBaker Just add a counter to the _emalloc & _erealloc functions (in TSRM) to count how often they were called and recompile PHP… – bwoebi Oct 12 '13 at 10:34
  • 1
    If you're working with 2d arrays of sizes that are already known in advance, you'd probably get much better performance using SPLFixedArray anyway – Mark Baker Oct 12 '13 at 10:38
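
For reference, here is a minimal sketch (not from the thread itself) of the SplFixedArray approach mentioned in the last comment. Sizes and variable names are illustrative; note that SplFixedArray is zero-indexed, unlike the array_fill(1, …) calls in the question:

<?php
        // Illustrative example: a fixed-size 56x36 grid pre-filled with 99.
        // SplFixedArray indices run from 0, not 1.
        $rows = 56;
        $cols = 36;
        $grid = new SplFixedArray($rows);
        for ($r = 0; $r < $rows; $r++) {
                $row = new SplFixedArray($cols);
                for ($c = 0; $c < $cols; $c++) {
                        $row[$c] = 99;
                }
                $grid[$r] = $row;
        }
        echo $grid[10][20]; // prints 99
?>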

1 Answer

array_fill(1, 56, array_fill(1, 36, 99));

will lead to 57 calls, while the other way round it will lead to 37. The argument of these calls is a bit more massive in the latter case, so things tend to even out:

56 x 36:   1350.6 Kcalls/s

36 x 56:   1353.0 Kcalls/s

<?php
        // Benchmark: rebuild the 56x36 array a thousand times per outer
        // iteration for ten seconds; $k then counts thousands of calls.
        $a = microtime(true);
        $k = 0;
        for (;;) {
                for ($i = 0; $i < 1000; $i++) {
                        $spreadRaw = array_fill(1, 56, array_fill(1, 36, 99));
                }
                $k++;
                if ((($b = microtime(true)) - $a) > 10) {
                        break;
                }
        }
        $t = $b - $a;
        print "$k Kcalls in $t seconds";
?>

with a difference of about 0.178%, or roughly one hour per month of continuous operation.

The memory footprint is a bit more difficult to measure, but even if the overhead of every row were on the order of a kilobyte (which it surely isn't), we'd be talking of some twenty kilobytes at the outside.
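
One quick way to put a rough number on it (a sketch, not part of the original measurements) is to sample memory_get_usage() before and after each allocation:

<?php
        // Rough comparison of the two orientations' memory footprints.
        $before = memory_get_usage();
        $rowsFirst = array_fill(1, 56, array_fill(1, 36, 99));
        $memRowsFirst = memory_get_usage() - $before;
        unset($rowsFirst);

        $before = memory_get_usage();
        $colsFirst = array_fill(1, 36, array_fill(1, 56, 99));
        $memColsFirst = memory_get_usage() - $before;

        print "56x36: $memRowsFirst bytes, 36x56: $memColsFirst bytes";
?>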

You might want to experiment with some code coverage or profiling tools to discover where your bottlenecks really are (e.g. http://www.xdebug.org/docs/profiler).

This answer might also be useful to you.

Update

A larger test case, switching between 1000 and 10000 rows or columns, yields

1000 rows, 10000 columns = 5773 calls/s
10000 rows, 1000 columns = 5652 calls/s

with a much larger difference of 2.14%. I haven't been able to get conclusive data about memory usage; at this point I'm convinced that there is no difference worth speaking of.
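
The code for this larger run is not shown; presumably it is the harness above with the bigger dimensions and without the inner thousand-iteration loop, along these lines:

<?php
        // Same harness, adapted to the larger dimensions;
        // counts whole calls per second rather than Kcalls.
        $a = microtime(true);
        $k = 0;
        for (;;) {
                $spreadRaw = array_fill(1, 1000, array_fill(1, 10000, 99));
                $k++;
                if ((($b = microtime(true)) - $a) > 10) {
                        break;
                }
        }
        printf("%.0f calls/s", $k / ($b - $a));
?>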

This is only creation time, though: the most important thing is to measure access time during operation, and that depends on the algorithm. If access were done by rows, for example, allocating values in rows would surely be more efficient; if by columns, the reverse. After all, access operations can be expected to outnumber creations by several orders of magnitude.
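
To make that concrete (a sketch with illustrative sizes, not part of the original answer), one could time row-order against column-order traversal of the same grid:

<?php
        // Time whole-grid traversal in row order vs. column order.
        $grid = array_fill(1, 1000, array_fill(1, 1000, 99));

        $t0 = microtime(true);
        $sum = 0;
        for ($r = 1; $r <= 1000; $r++) {        // outer loop on rows
                for ($c = 1; $c <= 1000; $c++) {
                        $sum += $grid[$r][$c];
                }
        }
        $rowOrder = microtime(true) - $t0;

        $t0 = microtime(true);
        $sum = 0;
        for ($c = 1; $c <= 1000; $c++) {        // outer loop on columns
                for ($r = 1; $r <= 1000; $r++) {
                        $sum += $grid[$r][$c];
                }
        }
        $colOrder = microtime(true) - $t0;

        printf("row order: %.4fs, column order: %.4fs", $rowOrder, $colOrder);
?>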

LSerni
  • Access time is O(n) with n the depth (not the sizes…) if there are just integer keys. – bwoebi Oct 12 '13 at 10:35
  • It shouldn't matter whether it's an enumerated or an associative array, as the key => value lookup is a hashmap anyway; that's where SplFixedArray has a real advantage – Mark Baker Oct 12 '13 at 10:39
  • Thanks for your workings and explanations of what to watch for and test for. Thanks! Accepted. – EnglishAdam Oct 12 '13 at 10:51