How to detect symmetries in 4 integer variables efficiently?

Question

I want to find symmetries in 4 integer variables i,j,k and l . The symmetries are:

all four numbers are equal: XXXX,
three numbers are equal: XXXY,XXYX,XYXX,YXXX
two pairs of equal numbers: XXYY,XYXY,XYYX,...
one pair of equal numbers and two different numbers: XXYZ,XYXZ,XYZX,...
all numbers are different.

All variables run within a certain non continuous range. I use nested if else statements. The first if checks for inequality of all variables. If not, then I have case 1. The next if checks if there are any equal pairs. If not, then case 5. The next if checks for three equal numbers. If true, then case 2. Otherwise, the last if checks for two pairs of equal numbers. If true, then case 3, otherwise case 4.

  if(!(i==j && j==k && k==l)){
    if(i==j || i==k || i==l || j==k || j==l || k==l){
     if((i==j && j==k) || (i==j && j==l) || (i==k && k==l) || (j==k && k==l)){            ...//do something
     }else{
    if((i==j && k==l) || (i==k && j==l) || (i==l && j==k)){ 
...//do something
    }else{
     ...//do something
    }           
  }
     }else{
     ...//do something  
     } 
 }else{
  ...//do something
 }

Is there better way do do this? I mean better in the sense of better performance, because I have to do this test millions of times.

I'd start _sorting_ the 4 values first. Then it's almost trivial. — Jabberwocky, Mar 13 '17 at 09:58
It will depend on the range and distribution of numbers. For example, to take an extreme case, suppose the 4 numbers are random 32-bit integers. In that case, they will almost always be all different, so you would optimize to test for that case first and fall through to the less common cases. At the opposite end of the spectrum, all 4 numbers being equal might be the most common case. In that case your current approach would be fastest. — samgak, Mar 13 '17 at 10:00
Copy the values into an array and sort them. Rather than focusing on i, j, k, l, focus on the values in this array, where index `[0]` is the lowest. Anyway, this question is too broad to be answered even as algorithm/pseudo code, because it is not 1 question but 5 different. Also, optimization depends heavily on if it is always 4 items or if the number of items should be variable. — Lundin, Mar 13 '17 at 10:18
If you do these millions of tests inside a (small) loop body, try to optimize the code size, for example using samgak's or Ari's solution below, so it fits good into the I-caches. Otherwise, if it is an external function anyway, you might use a cascade of if-else-branches `if (i==j) { if (j==k) { if (k==l) { ... } else { ... } } else { if (k == l) {... } else { ... } ....` to minimize the number of comparisons. — Ctx, Mar 13 '17 at 10:52
If you do go with sorting, use a sorting network. But I think you'll quickly see Ari's answer is better. — R.. GitHub STOP HELPING ICE, Mar 13 '17 at 11:52
I figured out that I can sort the numbers with almost no cost. I can make sure that they are created in a sorted manner. The most unlikely case is that they are all equal. — Kulibo, Mar 13 '17 at 12:03

score 9 · Accepted Answer · answered Mar 13 '17 at 10:33

9

Similar idea than samgak, but without the need of external table. Just count the sum of all matches

int count = (i==j) + (i==k) + (i==l) + (j==k) + (j==l) + (k==l);

and do switch with following choices

switch (count){
case 0: //All differenct
case 1: //One same
case 2: //Two different pairs
case 3: //Three same
case 6: //All are same
}

Again, as already mentioned, your current code might be faster in some cases. Especially if the most common case is the one where all the elements are equal.

answered Mar 13 '17 at 10:33

Ari Hietanen

1,749
13
15

Not sure if this makes any practical difference, but in theory you could replace the least likely `case` with `default` and reduce number of comparisons by one in the `switch` statement. – user694733 Mar 13 '17 at 10:51
Nice solution. This works great for a small number of variables. – 2501 Mar 13 '17 at 10:53
This is clearly quadratic, Sorting would gives us O(n log n). A hash table would be O(n), with space O(n). I wonder if there is a O(n) solution that uses constant space. – 2501 Mar 13 '17 at 11:11
I tried this and it behaves strange. Values for i,j,k,l: 0000 0000 case 6 count:6 Values for i,j,k,l: 0001 0001 case 3 count:3 0001 case 6 count:3 Values for i,j,k,l: 0011 0011 case 2 count:2 0011 case 3 count:2 0011 case 6 count:2 Values for i,j,k,l: 0111 0111 case 3 count:3 0111 case 6 count:3 Values for i,j,k,l: 1111 1111 case 6 count:6. Sorry for this mess. I can't do line breaks. It seems that different cases are called for the same value of count. – Kulibo Mar 13 '17 at 11:47
@2501 This is clearly O(1), since the number of variables is constant. – Ctx Mar 13 '17 at 11:54
1

@Kulibo You need to write `break` at the end of each case otherwise it execute them all after first true value. – Ari Hietanen Mar 13 '17 at 11:55
OMG!. I don't use switch that much and totally forgot about it. thx. – Kulibo Mar 13 '17 at 12:06
@2501 How can I do the test with O(n log n)? I can create the values for i,j,k,l in a sorted manner. What is the most efficient test method then? – Kulibo Mar 13 '17 at 12:42
1

@Kulibo I don't think you can get much faster than this, because in this example, to calculate `count` multiple instructions will be executed in a single clock cycle and internally `switch` is highly optimized, using different algorithms depending on the number of cases. Also, to sort the values, a O(n*n) bubblesort would probably be the fastest for 4 items - so I wouldn't care much about big-O in this case. – Danny_ds Mar 13 '17 at 12:52
.. Although for really smal sets there are [better ways](http://stackoverflow.com/questions/2748749/fast-algorithm-implementation-to-sort-very-small-list) to sort, but regular O(n log n) might be slower compared to bubblesort O(n*n). – Danny_ds Mar 13 '17 at 13:05
@Kulibo I was talking about a generalized algorithm. This answer is the fastest solution for four variables. – 2501 Mar 13 '17 at 13:17

samgak · Answer 2 · 2017-03-13T10:31:19.610

If you can afford a small (64 byte) lookup table, you can test each pair of values and set a bit for each comparison in a number that you use as an index into your table, e.g:

int classifySymmetries(int i, int j, int k, int l)
{
     return table[(i == j) |
                  ((i == k) << 1) |
                  ((i == l) << 2) |
                  ((j == k) << 3) |
                  ((j == l) << 4) |
                  ((k == l) << 5)];
}

Then do a switch on the return value. You can use your existing code to generate the table, by substituting a bit test for each comparison, or generating dummy i j k l values that satisfy each bit pattern from 0 to 63.

This approach requires a constant 6 comparisons. Bear in mind that sorting 4 values requires between 4 and 5 comparisons (there are 4! = 24 possible orderings, and each comparison yields 1 bit of information). But then you have to do tests based on the sorted values on top of that.

Whether using a lookup table beats your current approach will depend on the distribution of values and other factors like memory access times, you should profile to confirm.

Do you mean `(i == j) >> 0, (i == k) >> 1, (i == l) >> 2`...? I cannot make sense of `(i == l) & 4`, which is always 0, non? — Ctx, Mar 13 '17 at 10:21

Jason Lang · Answer 3 · 2017-03-13T10:29:18.627

A better way is to use a map:

#include <iostream>
#include <map>
using namespace std;


int main()
{
    int i, j, k, l;
    cin >> i >> j >> k >> l;

    std::map<int, int> count;

    int outcomes[5] = { 0, 0, 0, 0, 0 };

    // Store the values in the map
    count[i]++;
    count[j]++;
    count[k]++;
    count[l]++;

    // tally types of outcome according to the map
    for(typename std::map<int, int>::iterator iter = count.begin(); iter != count.end(); ++iter)
    {
        outcomes[iter->second] ++;
    }

    // print out "1 of a kind" count, up to "4 of a kind"
    // this is just for visualization
    for (int i = 1; i <= 4; ++i)
    {
        cout << i << " of a kind = " << outcomes[i] << endl;
    }

    // your bit here, it checks on just the "outcomes" array
    if(outcomes[4] > 0) // 4 of a kind
    {
    }
    else if(outcomes[3] > 0) // 3 of a kind
    {
    }
    else if(outcomes[2] > 1) // two pair
    {
    }
    else if(outcomes[2] > 0) // one pair
    {
    }
    else // singles only
    {
    }

    cin.ignore();
    cin.get();

    return 0;
}

This approach would be far more extensible too, if you wanted to extend it beyond 4 choices.

This solution has probably too much overhead and might therefore not be very efficient for as few as 4 values. — Jabberwocky, Mar 13 '17 at 10:34

How to detect symmetries in 4 integer variables efficiently?

3 Answers3