0

I need a way to compare arrays to each other and determine how similar they are. I have the following code:

                        float totalSame = 0F, percentSame = 0F, totalElements = 0F, sameness;
                        foreach (var previouslyStoredArray in ArrayOfArrays)
                        {
                            sameness = 0F;
                            arrayIndex = 0;
                            while (arrayIndex < ArrayToBeCompared.Count())
                            {
                                /*Compares an element from ArrayToBeCompared with the corresponding
                                position in all past arrays stored in ArrayOfArrays. When they are the same, the variable 'sameness'
                                is an increased by a value of 1. Sameness represents the number of same
                                elements within a single, previously stored array and the ArrayToBeCompared. 'totalSame' represents
                                the total number of elements that are the same between the ArrayToBeCompared and all arrays in the ArrayOfArrays.*/
                                if (ArrayToBeCompared[arrayIndex] == previouslyStoredArray[arrayIndex])
                                {
                                    sameness++;
                                }
                                arrayIndex++;
                            }
                            totalSame = sameness + totalSame;
                        }
                        totalElements = ArrayToBeCompared.Length * ArrayOfArrays.Length;                            
                        //By taking the total number of similar elements and dividing by the total
                        //number of elements we can get the percentage that are similar
                        percentSame = totalSame / totalElements * 100F;

This code worked fine when I was testing it with small arrays, however when I attempted to implement it in my program, it slowed to a standstill. The ArrayOfArrays contains 45 arrays with ~300,000 elements each. The ArrayToBeCompared is ~300,000 elements as well. Is there any way to increase the efficiency of my comparative function so that a comparison of that size can be done several times or at least once a second? Thanks!

Brian
  • 11
  • 4
  • possible duplicate of [Comparing Arrays in C#](http://stackoverflow.com/questions/713341/comparing-arrays-in-c-sharp) – cnd Feb 25 '14 at 05:12
  • 1
    You are comparing every item and producing a value on how closely related they are. That requires checking everything. Most comparisons short circuit the whole thing and don't produce a value at the end. – Simon Whitehead Feb 25 '14 at 05:15
  • Can you correct the indendation of the code displayed? – Sebastian Mach Feb 25 '14 at 06:30

2 Answers2

0

Since you are comparing each and every element to rest of the elements, it is bound to be time consuming.
Only optimization I can think of is not computing ArrayToBeCompared.Count() all the times. Like this:

int lengthOfArrayToBeCompared = ArrayToBeCompared.Count(); // This step
float totalSame = 0F, percentSame = 0F, totalElements = 0F, sameness;
foreach (var previouslyStoredArray in ArrayOfArrays)
{
    sameness = 0F;
    arrayIndex = 0;
    while (arrayIndex < lengthOfArrayToBeCompared)
    {
        ...

This optimization will help you immensely. Because you were doing ArrayToBeCompared.Count() (45 * 300,000 =) 13,500,000 times. This is reduced to 1 time.

Ganesh Jadhav
  • 2,830
  • 1
  • 20
  • 32
  • Why not just use `Array.Length` and save yourself counting the elements in the first place? – codemonkeh Feb 25 '14 at 06:26
  • @codemonkeh: `.Count()` is for enumerators. You are talking about arrays. – Ganesh Jadhav Feb 25 '14 at 06:32
  • 1
    @P5Coder No, `.Count()` is a Linq extension method for all IEnumerables, which Arrays support. However `.Count()` is smart enough that if the item implments `ICollection` it will use `ICollection.Count` internally which maps to `Array.Length` for an array. – Scott Chamberlain Feb 25 '14 at 06:34
0

You seance the individual elements don't depend on each other could use parallel processing to speed it up some.

float totalSame = 0F, percentSame = 0F, totalElements = 0F;
foreach (var previouslyStoredArray in ArrayOfArrays)
{
    var lockObject = new Object();

    Parallel.For(0, //min index
                 Math.Min(ArrayToBeCompared.Length, previouslyStoredArray.Length), //max index
                 () => 0F, //Initial thread local sameness value
                 (arrayIndex, loopState, localSameness) =>
                 {
                     if (ArrayToBeCompared[arrayIndex] == previouslyStoredArray[arrayIndex])
                         localSameness++;
                     return localSameness;
                 },
                 (localSameness) => 
                 {
                     //This function is not thread safe so we must lock while we aggregate the local counts.
                     lock(lockObject)
                     {
                         totalSame += localSameness;
                     }
                 });
}
totalElements = ArrayToBeCompared.Length * ArrayOfArrays.Length;                            
//By taking the total number of similar elements and dividing by the total
//number of elements we can get the percentage that are similar
percentSame = totalSame / totalElements * 100F;
Scott Chamberlain
  • 124,994
  • 33
  • 282
  • 431