9

There are multiple related questions, but I'm looking for a solution specific to my case. There is an array of (usually) 14 integers. How can I quickly tell if each int appears exactly twice (i.e. there are 7 pairs)? The value range is from 1 to 35. The main aspect here is performance.

For reference, this is my current solution. It was written to resemble the spec as closely as possible and without performance in mind, so I'm certain is can be improved vastly:

var pairs = Array
    .GroupBy (x => x)
    .Where (x => x.Count () == 2)
    .Select (x => x.ToList ())
    .ToList ();
IsSevenPairs = pairs.Count == 7;

Using Linq is optional. I don't care how, as long as it's fast :)

Edit: There is the special case that an int appears 2n times with n > 1. In this case the check should fail, i.e. there should be 7 distinct pairs.

Edit: Result I tested Ani's and Jon's solutions with tiny modifications and found during multiple benchmark runs in the target app that Ani's has about twice Jon's throughput on my machine (some Core 2 Duo on Win7-64). Generating the array of ints already takes about as long as the respective checks, so I'm happy with the result. Thanks, all!

mafu
  • 31,798
  • 42
  • 154
  • 247
  • Is the number array well sorted? You should tell us if there is something special with the array, which may help to improve the solution. – Cheng Chen Nov 15 '10 at 15:54
  • I'm currently profiling the answers to decide who'll get +15. – mafu Nov 15 '10 at 15:57
  • @Danny The array is not sorted. I can't think of anything helpful apart from what I stated so far. – mafu Nov 15 '10 at 16:05

6 Answers6

10

Well, given your exact requirements, we can be a bit smarter. Something like this:

public bool CheckForPairs(int[] array)
{
    // Early out for odd arrays.
    // Using "& 1" is microscopically faster than "% 2" :)
    if ((array.Length & 1) == 1)
    {
        return false;
    }

    int[] counts = new int[32];
    int singleCounts = 0;
    foreach (int item in array)
    {
        int incrementedCount = ++counts[item];
        // TODO: Benchmark to see if a switch is actually the best approach here
        switch (incrementedCount)
        {
            case 1:
                singleCounts++;
                break;
            case 2:
                singleCounts--;
                break;
            case 3:
                return false;
            default:
                throw new InvalidOperationException("Shouldn't happen");
        }
    }
    return singleCounts == 0;
}

Basically this keeps track of how many unpaired values you still have, and has an "early out" if it ever finds three of a kind.

(I don't know offhand whether this will be faster or slower than Ani's approach of incrementing and then checking for unmatched pairs afterwards.)

Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
  • I'll have to +1 this also. Not at all as elegant to look at as the LINQ variant that is a easy to read one liner, but this should be somewhat faster just by the fact that you jump out as soon as you find three equal elements. – Øyvind Bråthen Nov 15 '10 at 15:29
  • +1: Brilliant stuff, didn't think of this; it avoids the subsequent pass over the `counts` array. – Ani Nov 15 '10 at 15:42
  • or, you could maintain `pairsCount` and at the end `return pairsCount*2 == array.Length;` Of course keep the early return on finding 3-of-a-kind. – Ben Voigt Nov 15 '10 at 15:55
  • 1
    The array '(usually)' has 14 elements. You could have an early get out if there's an odd number of elements `if(array.Length % 2 != 0) return false;` – Binary Worrier Nov 15 '10 at 15:59
  • @Binary Worrier: Good call, will add. – Jon Skeet Nov 15 '10 at 16:07
6

Clearly, LINQ won't provide the optimal solution here, although I would improve your current LINQ solution to:

// checks if sequence consists of items repeated exactly once
bool isSingleDupSeq = mySeq.GroupBy(num => num)
                           .All(group => group.Count() == 2);

// checks if every item comes with atleast 1 duplicate
bool isDupSeq = mySeq.GroupBy(num => num)
                     .All(group => group.Count() != 1);

For the specific case you mention (0 - 31), here's a faster, array-based solution. It doesn't scale very well when the range of possible numbers is large (use a hashing solution in this case).

// elements inited to zero because default(int) == 0
var timesSeenByNum = new int[32];

foreach (int num in myArray)
{
    if (++timesSeenByNum[num] == 3)
    {
        //quick-reject: number is seen thrice
        return false;
    }
}

foreach (int timesSeen in timesSeenByNum)
{
    if (timesSeen == 1)
    {
        // only rejection case not caught so far is
        // if a number is seen exactly once
        return false;
    }
}

// all good, a number is seen exactly twice or never
return true;   

EDIT: Fixed bugs as pointed out by Jon Skeet. I should also point out that his algo is smarter and probably faster.

Ani
  • 111,048
  • 26
  • 262
  • 307
  • +1: I was just writing this exact solution, but obviously a bit to late ;) – Øyvind Bråthen Nov 15 '10 at 15:22
  • Not -1 yet, but that will only return true if there are 64 values... your `seen != 2` should be `seen != 0 && seen != 2`. Or alternatively, just check for `seen == 1`. – Jon Skeet Nov 15 '10 at 15:24
  • @Jon Skeet: Thanks for that, Jon. I made the mistake tricked by my own LINQ solution that requires `==2`. – Ani Nov 15 '10 at 15:27
0

I would create an array of 32 integer elements, initialized to zero. Let's call it "billy".

For each element of the input array, I'd increment billy[element] of 1.

At the end, check if billy contains only 0 or 2.

Simone
  • 11,655
  • 1
  • 30
  • 43
0

Almost certainly overkill when you've only got 14-ish pairs and only 32-ish possible values, but in the general case you could do something like this:

bool onlyPairs = yourArray.ContainsOnlyPairs();

// ...

public static class EnumerableExtensions
{
    public static bool ContainsOnlyPairs<T>(this IEnumerable<T> source)
    {
        var dict = new Dictionary<T, int>();

        foreach (T item in source)
        {
            int count;
            dict.TryGetValue(item, out count);

            if (count > 1)
                return false;

            dict[item] = count + 1;
        }

        return dict.All(kvp => kvp.Value == 2);
    }
}
LukeH
  • 263,068
  • 57
  • 365
  • 409
0

If the range of items is 0-31, you can store 32 one-bit flags in a uint32. I would suggest taking each item and compute mask=(1 SHL item), and see what happens if you try 'or'ing, 'xor'ing, or adding the mask values. Look at the results for valid and invalid cases. To avoid overflow, you may want to use a uint64 for the addition (since a uint32 could overflow if there are two 31's, or four 30's, or eight 29's).

supercat
  • 77,689
  • 9
  • 166
  • 211
  • Actually, I made a mistake in the original description. The value range is from 1 to 34, so an uint64 is required in any case. – mafu Nov 15 '10 at 16:19
  • @Barna implemented this, see my comment there. – mafu Nov 16 '10 at 09:23
  • @mafutrct: There's an easier way of finding twice versus more than twice. Look at the arithmetic sum and the 'or' values. Notice any relation in all the valid cases, which does not apply in any of the invalid ones? Note that there's no consistent pattern of what's 'wrong' in the invalid cases, but there's something about the valid cases which is different from any invalid case. – supercat Nov 16 '10 at 15:54
  • I see that there is a correlation but I'm not sure if it is mathematically correct to use it? Could you add an explanation and maybe a code snippet so I can understand your idea clearly? – mafu Nov 17 '10 at 14:16
  • @mafutrct: Hmm... if the OR value has at least seven bits set, the only way the sum can be twice that is if every bit appears exactly twice (and there are precisely seven bits set). Checking for whether there are at least seven bits set may be a little annoying, though. Without the check, you could get a false positive on, e.g. 4,1,1,1,1,1,1 (which OR's to 5 and ADDs to 10) – supercat Nov 17 '10 at 16:50
0

I guess (never measured the speed) this codesnipet can give you a new point of view:

int[] array = { 0, 1, 2, 3, 1, 1, 3, 5, 1, 2, 7, 31 }; // this is your sample array

uint[] powOf2 = {
    1, 2, 4, 8,
    16, 32, 64, 128,
    256, 512, 1024, 2048,
    4096, 8192, 16384, 32768,
    65536, 131072, 262144, 524288,
    1048576, 2097152, 4194304, 8388608,
    16777216, 33554432, 67108864, 134217728,
    268435456, 536870912, 1073741824, 2147483648
               };

uint now;
uint once = 0;
uint twice = 0;
uint more = 0;

for (int i = 0; i < array.Length; i++)
{
    now = powOf2[array[i]];

    more |= twice & now;
    twice ^= (once & now) & ~more;
    twice ^= more;
    once |= now;
}

You can have the doubled values in the variable "twice"; Of course it only works for values less than 32;

mafu
  • 31,798
  • 42
  • 154
  • 247
Barna
  • 113
  • 1
  • 1
  • 9
  • Ah, I read after posting: you corrected the value range up to 34. Anyhow you can still use uint64. I guess this solution is much more faster than others presented at this moment. Sorry for the bad lookout of post, have no experiences at posting. – Barna Nov 15 '10 at 16:30
  • I haven't tested this but since it contains a lot more operations than the other ideas this should be slower, no? – mafu Nov 16 '10 at 09:21
  • I've tested. 1) Depends on the input array. Since the question is wether all are pairs, both the answers of Ani or Jon Skeet could be faster, while at the first occurence of a thrice gives the result. 2) for an input array containing the numbers 1..31, looped each selection a million times the following results gained by sampling the time before and after the million loop: Barna - difference: 0.4220000 s Ani - difference: 0.4460000 s Jon Skeet - difference: 0.4850000 s 3) Anyhow, mine still contains the singlets, duplets, and multiplets totally as a spectrum. Ok, it wasn't in the spec. – Barna Nov 16 '10 at 11:13