5

There are multiple related questions, but I'm looking for a solution specific to my case. There is an array of (usually) 14 integers, each in the range of 1 to 34. How can I quickly tell if each int in a specific, static list appears at least once in this array?

For reference, I'm currently using this code, which was written to resemble the spec as closely as possible, so it can certainly be improved vastly:

if (array.Count < 13) {
    return;
}

var required = new int[] {
    0*9 + 1,
    0*9 + 9,
    1*9 + 1,
    1*9 + 9,
    2*9 + 1,
    2*9 + 9,                
    3*9 + 1,
    3*9 + 2,
    3*9 + 3,
    3*9 + 4,
    3*9 + 5,
    3*9 + 6,
    3*9 + 7,
};

IsThirteenOrphans = !required.Except (array).Any ();

The required list is not dynamic, i.e. it will always be the same during runtime. Using Linq is optional, the main aspect is performance.

Edit:

  • The input array is not sorted.
  • Input values may appear multiple times.
  • The input array will contain at least 14 items, i.e. 1 more than the required array.
  • There is only 1 required array and it is static.
  • The values in required are distinct.
  • You may assume that a histogram is cheap to create.

Update: I am also interested in a solution for a sorted input array.

mafu
  • 31,798
  • 42
  • 154
  • 247

7 Answers7

1

This looks like a good candidate for bitwise operations, since the values in required are distinct, static, and between 1 and 34. Instead of saving required as an array, save it as a const ulong. In your arrays to check, create a new ulong populated by left-shift each value and bitwise or.

const ulong comparator = (1UL << 1) | (1UL << 9) | (1UL << 10) | (1UL << 18) | (1UL << 19) | (1UL << 27) | (1UL << 28) | (1UL << 29) | (1UL << 30) | (1UL << 31) | (1UL << 32) | (1UL << 33) | (1UL << 34);

public static bool ContainsDistinct13Values(int[] array)
{
    ulong word = 0;
    foreach (int i in array)
    {
        word |= (1UL << i);
    }
    return word == comparator;
}
Mr Anderson
  • 2,200
  • 13
  • 23
  • Interesting idea, as it is highly parallelizable and has a tiny footprint. I'm sorry to say I can't test it currently, but you have my +1. – mafu Jun 19 '16 at 17:44
  • @mafu I don't currently have my dev environment available either or I'd try finding an even faster implementation. One thing to consider: if your array to check will never have size greater than some reasonable length (16 for example) you could make it fixed length, populate unused indices with zeros and unroll for performance gain. https://en.m.wikipedia.org/wiki/Loop_unrolling – Mr Anderson Jun 19 '16 at 19:03
1

Idea 1
If you need to compare with several required lists then you might sort the input list and then simply compare it by iterating. But sorting is of course not too fast, but not too slow either. But if you compare with several required lists the overhead of sorting might be amortized quickly.

Once the array is sorted comparing is trivial:

for(int i = 0; i < 14; i++)
  if(arr[i] != required[i]) return false;

return true;

Idea 2
Or if the 14 integers are distinct/unique you can simply make required a HashSet and do

input.Count(i => required.Contains(i)) == 14

But I don't know without actually testing it if that's faster than sorting.

Idea 3
Calculate a quick hash which is invariant under permutations over the 14 ints and compare it to the known value of require. Only if the hash matches do the more expensive comparison.

//Prepare/precalculated
int[] hashes = new int[34];
Random random = new Random();
for(int i = 0; i < 34; i++)
  hashes[i] = random.NextInt();

//On each list
int HashInts(int[] ints)
{
  int result = 0;
  foreach(int i in ints)
    result += hashes[i - 1];

  return result;
}

A wise choice of values for hashes might improve it a bit, but random values should be fine.

Idea 4
Create a Histogram:

int[] CreateHistogram(int[] ints)
{
  int[] counts = new int[34];
  foreach(int i in ints)
  {
    counts[i - 1]++;
  }

  return counts;
}

You can avoid the array creating by reusing an existing array if performance is really important.

gevorg
  • 4,835
  • 4
  • 35
  • 52
CodesInChaos
  • 106,488
  • 23
  • 218
  • 262
1

If you want fast way you shouldn't use linq, If a given list items all are bellow 35 you can remove if (lst[i] < 35) The bellow answer traverse list at most one time, and acts like counting sort:

public bool FindExactMatch(int[] array, List<int> lst)
{
    bool[] a34 = new bool[35];

    foreach(var item in array)
    {
        a34[item] = true;
    }

    int exact = 0;

    for (int i = 0; i < lst.Count; i++)
    {
        if (a34[lst[i]])
        {
            exact++;
            if (exact == array.Length) return true;

            a34[lst[i]] = false;
        }
    }

    return false;
}

for sorted list if list size is big you can do lst.BinarySearch(array[i]) which takes 14 * log(n) * c1 at most, and I think it's efficient enough also if you implement it may be become faster, and I didn't test binary search with my own implementation, but Min, Max, Sort in linq is slower than (between 4 and 10 time) your own (good) implementation. And if sorted list size is small i prefer to use algorithm like above because constant c1 is small in above algorithm and may be larger in binary search algorithm.

gevorg
  • 4,835
  • 4
  • 35
  • 52
Saeed Amiri
  • 22,252
  • 5
  • 45
  • 83
  • 1
    By changing true to false in a34, you can skip the initialization loop. Since all items are guaranteed to be < 35, you can also skip the if condition. – mafu Nov 17 '10 at 09:31
  • @mafutrct, I think I can't do `By changing true to false in a34`, because I'm using it's default false value and just initialize the items of array a34 when there is an item in related position, Also I updated the answer by your comment on < 35 – Saeed Amiri Nov 17 '10 at 19:58
1

If you have a 34+ bit integral type available, and C style bit operations, then you could compute such an integer V from the variable list (if the list is v[0], v[1], ... then V is (1 << v[0]) | (1 << v[1]) ... where 1 is of the same type as V) and have a predefined such integer S for the static list, computed analogously. The test to see if the static list is contained in the variable list is then (S & ~V) == 0.

dmuir
  • 449
  • 2
  • 3
1

One possibility is to change the way that you store the data. Since the range of possible values is limited to 1-34, instead of storing a list of numbers, you could store the counts of each number present:

int[] counts = new int[34];

If your list has one 1 and two 3s, then counts[0] == 1 && counts[2] = 2 (you could alternately use 1-based indexing if that makes things faster (fewer subtractions.))

Now to evaluate that each int in a list appears at least once, you simply index into the array sequentially for each x and verify that all counts[x] > 0. There will be overhead associated with transforming the data from the counts form to the list form, but that's only a problem if you also frequently need to view the data in the list form. Another advantage to this storage format is that adding/removing to counts will never involve more than a single array element; in the list form, removing an element in the middle of the list entails a copy of multiple elements.

Dan Bryant
  • 27,329
  • 4
  • 56
  • 102
0

You could easily loop through the items and find out is there any item missing. By your example i understand you just wanted to know if there is any item from required is missing in array. so you could write

bool notContains = false;

foreach (var iddd in required)
{
    foreach (var ar in array)
    {
        if (iddd == ar) notContains=true;
    }

    if (notContains == false) break;
}

This is lot faster than your approach. Check out the below code with timer added. Your approach took 5 ms, but the new one take 0 ms

System.Diagnostics.Stopwatch aTimer = new System.Diagnostics.Stopwatch();
aTimer.Start();

var IsThirteenOrphans = !required.Except(array).Any();
aTimer.Stop();

System.Diagnostics.Stopwatch bTimer = new System.Diagnostics.Stopwatch();
bTimer.Start();
bool notContains = false;

foreach (var iddd in required)
{
    foreach (var ar in array)
    {
        if (iddd == ar) notContains=true;            
    }

    if (notContains == false) break;
}

bTimer.Stop();
gevorg
  • 4,835
  • 4
  • 35
  • 52
RameshVel
  • 64,778
  • 30
  • 169
  • 213
  • 1
    You don't stop aTimer, but start it twice... And his code will show 0ms for a single iteration too. You need to time a million iterations for a reasonable result. 3 seconds is no realistic result, you must have made some big mistake while measuring. – CodesInChaos Nov 16 '10 at 10:45
  • @CodeInChaos, i changed the code to stop the timer. Still it takes more time. – RameshVel Nov 16 '10 at 10:49
  • Yeah, the data i have used in array object is 5. so the ellapsed time was 5 ms. i tested with 200 items and it increaed to 10 ms, but the second approach stills in 0ms. Consider million items, surely the performance will degrade. – RameshVel Nov 16 '10 at 10:53
  • 1
    your benchmarking is not that good. You are running things only once, which means you have to pay for the initial setup, the JIT and possible more. and for things this small, the amount of work that needs to be done is soo tiny, that you probably need a few hundred thousand tries to get a valid number – Cine Nov 16 '10 at 10:53
0

Edit: So I understood your question and probably have some nice over-complicated solution. Another question is, how good performance it has.

static void Main(string[] args)
{
    var required = new int[]
                       {
                           0*9 + 1,
                           0*9 + 9,
                           1*9 + 1,
                           1*9 + 9,
                           2*9 + 1,
                           2*9 + 9,
                           3*9 + 1,
                           3*9 + 2,
                           3*9 + 3,
                           3*9 + 4,
                           3*9 + 5,
                           3*9 + 6,
                           3*9 + 7,
                       };

    precomputed = required.Select((x, i) => new { Value = x, Offset = (UInt16)(1 << i) }).ToDictionary(x => x.Value, x => x.Offset);

    for (int i = 0; i < required.Length; i++)
    {
        precomputedResult |= (UInt16)(1 << i);
    }

    int[] array = new int[34]; // your array goes here..
    Console.WriteLine(ContainsList(array));

    Console.ReadKey();
}

// precompute dictionary
private static Dictionary<int, UInt16> precomputed;
// precomputed result
private static UInt16 precomputedResult = 0;

public static bool ContainsList(int[] values)
{
    UInt16 result = 0;
    for (int i = 0; i < values.Length; i++)
    {
        UInt16 v;
        if (precomputed.TryGetValue(values[i], out v))
            result |= v;
    }

    return result == precomputedResult;
}
gevorg
  • 4,835
  • 4
  • 35
  • 52
Euphoric
  • 12,645
  • 1
  • 30
  • 44
  • He doesn't need bigger list. The list-size of 14 follows directly from the rules of mahjong. – CodesInChaos Nov 16 '10 at 10:50
  • Oh. Ok. Then question is. Does he really need to worry about performance? In this case I would go for better readability, unless he is going to repeat that function few milion times. – Euphoric Nov 16 '10 at 10:52
  • Actually I am going to repeat that function many million times :) – mafu Nov 16 '10 at 11:00