0

I use a variation of a 5-cross median filter on image data on a small embedded system, i.e.

    x
  x x x
    x

The algorithm is really simple, read 5 values, get the highest 2 and do some calculations on those and write back the result.

The 5 input values are all in the range of 0-20. The calculated value are also in the 0-20 range!

I'm trying to figure out if I can use a lookup-table to speed things up but I have failed to generate a key that I can use.

As an example, considering if the input was in the range of 0-5, one way of generating a unique key would be to take the binary representation and just concatenate the numbers, i.e.

101 101 101 101 101

key = x[0] | x[1] << 3 | x[2] << 6 | x[3] << 9 | x[4] << 12

But that LUT is huge, ~23k items.

Since [5,0,0,0,5] is the same as [5,0,5,0,0] one simplification could be to use 2 LUTs,

LUT1 = [0, 1, 6, 31, 156, 781]

Where each items is 1 more than the maximum sum of 5 of the previous elements

Then the key could be calculated as (using python syntax)

key = sum([LUT1[x[0]], LUT1[x[1]], LUT1[x[2]],
           LUT1[x[3]], LUT1[x[4]a]])

But again, this approach doesn't scale up to the range 0-20 for each element.

Using a sorting network as described in Fastest sort of fixed length 6 int array doesn't improve the performance; I'm only interested in the 2 highest values.

So, is it possible to create a unique key from five positive integers in the range of 0-20 that can be used as index into a LUT?

Community
  • 1
  • 1
Fredrik Pihl
  • 44,604
  • 7
  • 83
  • 130
  • Since you only use the two largest values (and throw away the other three), perhaps an optimized implementation of a `max2of5(...)` function and a 21*21=441 entry lookup table would be sufficient? – twalberg Feb 19 '15 at 21:20
  • you mean finding the max of the five values? [5,0,0,0,0] and [5,0,0,0,5] both give the max as 5 but the end result is different. The calculation part is not complex but I'm trying to get rid of the "finding the 2 highest values"-part. Or did it misinterpreted you? – Fredrik Pihl Feb 19 '15 at 21:26
  • Perhaps I should have been a bit clearer on the function - I was meaning a function that would give e.g. `max2of5([5,0,0,0,0]) -> [5,0]` and `max2of5([5,0,0,0,5])-> [5,5]`. In other words it would return the two largest entries, not just one... – twalberg Feb 19 '15 at 21:29
  • ah, of course. I have one of those that's fairly ok. The filter runs on HD-resolution video in real-time so I really need this one to be as fast as possible, hence trying to move over to a LUT solution since the input and output are so well defined. – Fredrik Pihl Feb 19 '15 at 21:34

1 Answers1

1

Disclaimer: this is not a general solution.

I had a similar problem and solvet this way:

1) set a bit in a bitmask for each of the 5 values (at least 21 bits, so a 32 bit variable is needed).

2) If the bit is already set (=duplicated value), an index variable (initialized to -1) is set to the value, if it's less than the value.

3) Now the part that makes "not general" the solution, because its performance depends on the avalibility of a very fast bit-scanning instruction (e.g. x86 has one): Find the higest bit set. Its index is the first value of the pair.

4) If it's equal to the "duplicated value" variable, then the second value too is this one.

5) Else, teh second value of the pair isthe second higest bit's index.

6) Now you have a pair of values between 0 and 20, a 21x21 lookup table is small enough in my opinion.

In rude code:

int dup = -1;
uint_32 m = 0;
uint_32 vm;
int i;

for (i = 0; i < 5; ++i) {
   vm = 1 << val[i];
   if ((m & vm) && (dup < val[i])) {
       dup = val[i];
   }
   else {
       m |= vm;
   }
}

// Now the part that needs a very fast "get higest bit" function

top[0] = get_higest_bit(m);
if (top[0] == dup) {
    top[1] = top[0];
}
else {
    m &= ~(1 << top[0]);
    top[1] = get_higest_bit(m);
}

Edit: Also, bit-oriented operators (<< etc...) must be fast enough. X86/x64 matches this requirements. Not guaranteed on different cpus. But at least the expression "1 << x" may be optimized in many ways.

Giuseppe Guerrini
  • 4,274
  • 17
  • 32