-5

Let's say we need to go through a 128-element array of uint8 and compare neighbour elements and put the result to another array. The code below is the most simple and readable way to solve this problem.

for (i = 1; i < 128; i++)
  if (arr[i] < arr[i-1] + 64) //don't care about overflow
    arr2[i] = 1;

It looks like this code 1) will not use branch table. And as far as I know, a cpu doesn't read just 1 byte, it actually reads 8 bytes (assuming a 64bit machine), and that 2) makes cpu do some extra work.

So here comes another approach. Read 2 (or 4 or 8) bytes at a time and create an extremely huge switch (2^16, 2^32 or 2^64 cases respectively), which has every possible combination of bytes in our array. Does this make any sense?

For this discussion let's assume the following: 1) Our main priority is speed 2) Next is RAM consumption. We don't care about the size of the executable (unless they somehow affect speed or RAM)

Alexander Dmitriev
  • 2,483
  • 1
  • 15
  • 20
  • 3
    Write natural code and use compiler optimization switches – M.M Mar 02 '17 at 08:27
  • 1
    It probably doesn't make much sens. – Jabberwocky Mar 02 '17 at 08:27
  • 2
    How do you know that loop is the bottleneck? – 2501 Mar 02 '17 at 08:27
  • 2
    BTW: code size _can_ affect performance because you may lose some benefits of the CPU cache. – Jabberwocky Mar 02 '17 at 08:31
  • @2501 it's not a part of a programm, it's theoretical question. – Alexander Dmitriev Mar 02 '17 at 08:40
  • 2
    `arr[i-1] + 64 //don't care about overflow` What overflow? There is no overflow here. An `int` can easily fit 64 + any uint8 value. – Lundin Mar 02 '17 at 08:41
  • Theoretically we cannot tell, because practically it depends on what you are going to do with the info, and there might be shortcuts. For example, in a chess playing program calculating a list of all possible moves up front is the worst strategy, because once you find that the opponent can capture your queen you really don't care about what else he can do and some of the work is wasted. In your example, after finding (or not finding) a certain number of differences you might give up and never have to do all 128 comparisons. – Bo Persson Mar 02 '17 at 11:26
  • Assuming the rest of `arr2` is 0, I'd write the loop body as just `arr2[i] = arr[i] < arr[i - 1] + 64;` and drop the `if`. Then of course remove any other code that 0-initializes `arr2`. – unwind Mar 02 '17 at 12:17
  • When the CPU reads 8 bytes to get 1 it doesn't throw the rest away. They are in cache so it saves 7 reads. – stark Mar 02 '17 at 12:36

1 Answers1

1

You should know that switches are actually very slow as branch would be likely mispredicted. What makes switch fast is jump table:

switch (i) {
   case 0: ...
   case 1: ...
}

gets translated into this:

labels = {&case0, &case1}
goto labels[i]

However, you do not need this either as your only writing memory cell and you can write a "jump table", or more specifically pre-computed matrix of answers yourself:

for (i = 1; i < 128; i++)
   arr2[i] = answers[arr[i]][arr[i-1]];

uint8 have only 256 possible values which gives us 64k of RAM required for that matrix.

myaut
  • 11,174
  • 2
  • 30
  • 62
  • [This](http://stackoverflow.com/questions/97987/advantage-of-switch-over-if-else-statement), [this](http://stackoverflow.com/questions/427760/when-to-use-if-else-if-else-over-switch-statments-and-vice-versa), [this](http://stackoverflow.com/questions/449273/why-the-switch-statement-and-not-if-else), [this](http://stackoverflow.com/questions/6805026/is-switch-faster-than-if). Doesn't look like switches are very slow. But your solution is quite interesting. – Alexander Dmitriev Mar 02 '17 at 09:04
  • @AlexanderDmitriev: Switches based on branches are slow. Jump tables are not. Your answers are all reference a jump table solution. – myaut Mar 02 '17 at 10:29