11

I was studying in depth about pointers as I don't think I have good knowledge about pointers and came across the following line on Wikipedia:

When dealing with arrays, the critical lookup operation typically involves a stage called address calculation which involves constructing a pointer to the desired data element in the array. If the data elements in the array have lengths that are divisible by powers of two, this arithmetic is usually much more efficient.

Why is this so?

The above line is written under the heading "Uses"

Mysticial
  • 464,885
  • 45
  • 335
  • 332
Pratik Singhal
  • 6,283
  • 10
  • 55
  • 97
  • That should probably have a citation needed flag on wikipedia – norlesh Jan 01 '14 at 10:43
  • 2
    Multiply isn't really that much slower than shifting nowadays. Unless that statement is referring to the `lea` instruction. But that only does very small powers of two. (at most 4 or 8 I think) – Mysticial Jan 01 '14 at 10:44
  • 1
    I agree that the difference negligible in state of the art processors. But, apart from those processors, there is a vast variety of computing device such as low end micro controllers that run programs. In most of these devices the difference between a shift and multiplication is significant. – Buddhima Gamlath Jan 01 '14 at 11:15
  • Possibly related but with the opposite conclusion - http://stackoverflow.com/questions/11413855/why-is-transposing-a-matrix-of-512x512-much-slower-than-transposing-a-matrix-of/11413856#11413856 – Leeor Jan 01 '14 at 12:00

4 Answers4

12

Multiply by 2n is done by shifting left. Modern processors can do shift in a single cycle (and in x86, for small shifts up to 8 or 16, built into the address calculation itself). A regular multiply operation takes 4-10 clock cycles on AMD64 machines, and most likely similar on modern Intel processors. There are also restrictions to how "close together" two consecutive multiply operations can be done.

Of course, if the size of the array is quite large, it may be more efficient to use multiply instruction and pack the data in more tightly (not using padding to expand the data to a power of 2 size), because of cache efficiency

Of course, modern compilers are clever, so if you need to multiply by X by 12, the compiler will generate (X << 3) + (X << 2), for example, which is faster than a single multiply operation.

Mats Petersson
  • 126,704
  • 14
  • 140
  • 227
  • what do you mean with "pack the data in more tightly" ? – user2485710 Jan 01 '14 at 11:02
  • 3
    Ever since Core 2, integer multiply on Intel processors have been 3 cycle latency with throughput 1/cycle. Shift is 1 cycle latency and 2/cycle. So yes, shift is still faster. But not by much. – Mysticial Jan 01 '14 at 11:02
  • @user2485710: I have edited to explain more clearly "pack data in more tightly". Mysticial: Yes, I think the 1/cycle is more of an issue than the latency - particularly in two-dimensional arrays where both the innermost dimension and the element size are non-2^n. – Mats Petersson Jan 01 '14 at 11:07
  • @Mysticial not according to [this](https://gmplib.org/~tege/x86-timing.pdf), shl has 4 times more throughput and 5 times less latency than imul for 64-bit on core 2 – Esailija Jan 01 '14 at 16:13
  • @Esailija Then either [Agner Fog](http://www.agner.org/optimize/instruction_tables.pdf) is wrong or GMP is wrong. I got my numbers from Agner Fog's tables. – Mysticial Jan 01 '14 at 20:52
  • @Mysticial It's not wrong, it lists exactly the same differences. Maybe you are not looking at 64-bit operations? Look at IMUL for r64, r64 – Esailija Jan 02 '14 at 00:25
  • @Esailija You're right. I was looking at the 32-bit multiply. But it looks like even the 64-bit multiply went down in Nehalem. – Mysticial Jan 02 '14 at 01:50
3

The address calculation of i'th element involves base + size_of_element * i.

When the size of element is a power of 2, say size_of_element = 2^m, then this can be achieved with base + (i << m).

The shifting is much more efficient compared to the multiplication involved in the earlier calculation.

user207421
  • 305,947
  • 44
  • 307
  • 483
Buddhima Gamlath
  • 2,318
  • 1
  • 16
  • 26
  • 3
    @ps06756 no, that quote is about element size. – harold Jan 01 '14 at 10:49
  • @KillanDS 'lea' does the addition as well as the offset calculation. It isn't comparable to either the multiply or the shift. I tested multiply versus shift in the early 1980s and it was ten times as slow. Again, it is not a controversial statement. If you can produce a system on which it isn't less costly, please do so. – user207421 Jan 01 '14 at 11:03
  • @KillianDS I didn't say you shouldn't *consider* it, I said you shouldn't *compare* it. Apples and oranges. You're in no position to be accusing others of spreading half-truths. I further note that you kept asking for citations while producing precisely none of your own, even when asked. – user207421 Jan 01 '14 at 11:08
2

Because the offset can be computed with a left shift instead of multiplication, but I'd also say the remark is probably a decade or two out of date, given the amount of pipelining in CPUs.

user207421
  • 305,947
  • 44
  • 307
  • 483
2

When multiplying such as would have to happen to find the Nth member in the array, when your dealing with powers of two you can use shifting operations which are lest costly than a full blown multiplication operation on some systems.

norlesh
  • 1,749
  • 11
  • 23
  • 1
    I'm just explaining what is written on the wikipedia page as asked - its up to the original author to provide the references – norlesh Jan 01 '14 at 10:52
  • @KillianDS I disagree. It's hardly a controversial statement. There's a difference between 'less costly' and 'not notice the difference'. – user207421 Jan 01 '14 at 10:56
  • I have reworded the answer to say 'in some systems' ... how would you answer this question? – norlesh Jan 01 '14 at 10:56
  • 1
    Weasel words :) Just cite something, like this: http://instlatx64.atw.hu/ Shift is obviously consistently faster than `imul`. `lea` is at worst as bad as an `imul`, and you'd have to do more than that `imul`, so `lea` still wins. As for whether that actually makes a difference, that's something else. – harold Jan 01 '14 at 10:58
  • Can I just point out at this point that including any of this information in the answer would not have helped the person asking the question :) – norlesh Jan 01 '14 at 11:04