How to compute the height profile of a Tetris stack most efficiently?

Question

Problem Statement

We're given an array of Integers stack of length height. The width tells us that at most the width-lowest bits in each entry of xs are set.

Compute an array profile of length width such that profile[i] == max_i with: max_i is maximal with stack[max_i] having the i-th bit set.

How can I achieve this in a more efficient way than below?

Current solution

Currently I go over the columns and check each bit separately.

The following shows my current implementation in Scala. But feel free to give answers in other languages (Java, C, C++), as I am mainly interested in the algorithmic part (optimized for current CPUs).

Scala code:

def tetrisProfile(stack: Array[Int]): Array[Int] = {
  var col = 0
  val profile = new Array[Int](width)
  while(col < width){
    var row = 0
    var max = 0
    while(row < height){
      if(((stack(row) >> col) & 1) == 1)
        max = row + 1
      row += 1
    }
    profile(col) = max
    col += 1
  }
  return profile
}

Typical values

array size height is 22
width width is 10

Gist with benchmark code

Find the code here.

Current results:

original:    2.070s,        2.044s,        1.973s,        1.973s,        1.973s
maxihatop:   0.492s,        0.483s,        0.490s,        0.490s,        0.489s

This would be easier if the board was transposed. But not easier enough that transposing it just for this would be worth it. — harold, Mar 27 '15 at 21:12
@Mikeologist I'm running on the JVM. The language doesn't matter, so Java code is ok, too. — ziggystar, Mar 27 '15 at 22:23
Please, do not suggest to move this question to codereview. I tried to and they don't want it. — ziggystar, Mar 30 '15 at 09:15
For the record, the question was rejected from Code Review only because of the author's insistence on simultaneously tagging it as [tag:scala], [tag:java], and [tag:c]. That makes it such that Scala is merely being treated as pseudocode, which would be off-topic for Code Review. It would be perfectly acceptable as a [tag:scala] question. — 200_success, Mar 30 '15 at 09:19
And as I consider the scala code as pseudo-code, and I welcome answers in similar languages (C,C++,Java,Scala), I have migrated it back to SO. — ziggystar, Mar 30 '15 at 09:24

olegarch · Accepted Answer · 2015-03-30T13:59:17.390

1

I wrote my solution on C. I hope, you will able port algorithm to Java or Scala.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define WIDTH  10
#define HEIGHT 22

// Convert (1 << n) to n for n == 0-10
static char bit2ndx[11] = {-1, 0, 1, 8, 2, 4, 9, 7, 3, 6, 5};
int *tetrisProfile(int *input) {
  int row;
  // allocate memory and set everything to -1 - default rc value,
  // returned if no any result for this column
  int *rc = (int *)malloc(WIDTH * sizeof(int));
  memset(rc, ~0, WIDTH * sizeof(int));
  // create bitset for columns for check
  int testbits = (1 << WIDTH) - 1;
  // Iterate rows from up to bottom, and while exist columns for check
  for(row = HEIGHT - 1; row >= 0 && testbits != 0; row--) {
    int rowtestbits = testbits & input[row];
    while(rowtestbits != 0) {
      // extract lowest bit_1 from bitset rowtestbits
      int curbit = rowtestbits & -rowtestbits;
      rc[bit2ndx[curbit % 11]] = row;
      rowtestbits ^= curbit;
      testbits    ^= curbit;
    }
  }
  return rc;
}

int stack[HEIGHT] = {0x01, 0x2, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80, 0x100, 0x200,
                       0,   0,   0,   0,    0,    0,    0,    0,     0,     0,
                       0,   0};


main(int argc, char **argv) {
  int i;
  int *ret = tetrisProfile(stack);
  for(i = 0; i < WIDTH; i++)
      printf("ret[%02d]=%d\n", i, ret[i]);
  free(ret);
}

edited Mar 30 '15 at 13:59

answered Mar 30 '15 at 11:28

olegarch

3,670
1
20
19

So you have exchanged the loops, and you stop once you have hit something in each column. In addition it might be possible to extract multiple heights per inner loop? – ziggystar Mar 30 '15 at 12:33
I exchanged the loops. and inner loop iterates only columns: a) presents in this row; b) value is not found in row above. Thus, body of inner loop will be execute less or equal to WIDTH times, no more. – olegarch Mar 30 '15 at 13:56
Your solution is 4 times faster than the old one, which is good. Can you explain (or give a link to an explanation of) the `curbit % 11` magic? – ziggystar Mar 30 '15 at 20:11
There was comment above. I will try explain by another words: variable "curbit" contains single bit_1 in the corresponding column. Need turn bit into it's index position. Most effifient solution - to use assembly BSF command. But, I assume, your language disallow to use assembly. So, this is just turn around - table contains just precomputed indexes of corresponding bits. I.E, if explain by math, then: N == bit2ndx[2**N]; – olegarch Mar 30 '15 at 20:41
OK. Also, if you next question "Why 11, not 5 or 10" - then answer: 2 is primitive root modulo 11. So, you needed select divider, which a)more than WIDTH b) two must be primitive root of divider. Detail about primitive roots: http://en.wikipedia.org/wiki/Primitive_root_modulo_n – olegarch Mar 30 '15 at 21:05

How to compute the height profile of a Tetris stack most efficiently?

Problem Statement

Current solution

Typical values

Gist with benchmark code

1 Answers1

Linked