1

I'm trying to create an algorithm which determines whether or not all of the rows in a 2D array of integers are unique (i.e. disjoint). Currently, I have a brute force algorithm which checks each value in each row to each value in the other rows, but I'd like to speed this up. Is there some kind of divide and conquer method to handle this? I have found some semi-solutions for this in single arrays and lists, but not in 2d arrays.

candied_orange
  • 7,036
  • 2
  • 28
  • 62
  • What language are you trying to do this in? – candied_orange Dec 09 '14 at 21:39
  • Isn't this problem equivalent to checking that all numbers in a 2d array are unique? – kraskevich Dec 09 '14 at 21:41
  • @CandiedOrange: Preferably Java. – amanda_dennis Dec 09 '14 at 21:53
  • Does a numbers position within a row have any significance? The only definition for disjoint I know of is regarding sets. If your rows are not sets is it intended that they be paired down into sets? – candied_orange Dec 09 '14 at 23:03
  • @CandiedOrange: I guess I don't know what you mean by "paired down into sets." The 2d array cannot have duplicates in rows, but the rows CAN contain only one kind of integer, for instance, row A could be all 2's, and row B could be all 3's and they would be considered disjoint. – amanda_dennis Dec 09 '14 at 23:34
  • Right but if [1,2,3] and [3,2,1] are considered disjoint then position in row is important. If position is not important then [1,2,2] contains no more information for us than the set [1,2] does. So the row can be simplified. – candied_orange Dec 09 '14 at 23:49
  • Ahh, I see what you mean. Position in the row makes no difference in my algorithm so I guess sets will work. Thank you! – amanda_dennis Dec 09 '14 at 23:55

4 Answers4

0

If you want to check whether there are two rows that contain the same number, you can put all the numbers along with numbers of rows that they belong to into one long list. Then sort this list which will make all the identical numbers stand next to each other. Than you can easily determine for each cluster of identical numbers whether they originally belonged to the same row or not.

If your table is n × m, the algorithm will run in O(nm × (log(n) + log(m))).

Danylo Mysak
  • 1,514
  • 2
  • 16
  • 22
0

To test whether any two rows are disjoint**, make two sets (like java sets that reject duplicates and do so in almost O(1) time), one for each row and count the elements of each.

The two rows are disjoint if:

count(rowASet) + count(rowBSet) == count( union(rowASet, rowBSet) )

This suggests an algorithm for a 2D array where a rows are successively added (as sets) to a running total set. The count of the total set is measured before and after the addition of each row's set. If that count increases by the size of the row set just added, then the row just added is disjoint from those added so far.

**See the conversation with @CandiedOrange. I'm taking "disjoint row" to mean a row that contains no element found in any other row, but might itself contain repeated elements internally.

danh
  • 62,181
  • 10
  • 95
  • 136
0

If you are allowed to use java collections consider using java's HashSet. It was designed for things like this:

areDisjoint() inspired by http://www.geeksforgeeks.org/check-two-given-sets-disjoint/

import java.util.*;

public static void main (String[] args)
{        

    int[][] twoD = {{3,3,3},{4,4,4},{5,5,5}};
    if ( disjointRows(twoD) )
    {
        System.out.println("Mutually disjoint rows");
    }
    else
    {
        System.out.println("Some element duplicated in other row");
    }
}

public static boolean disjointRows(int[][] twoD)
{

    // -- Copy 2D array into data structure -- //

    ArrayList<HashSet<Integer>> rows = new ArrayList<HashSet<Integer>>();

    for (int[] row : twoD)
    {            
        HashSet<Integer> hs = new HashSet<Integer>();
        for (int elem : row)
        {
            hs.add(elem);                
        }
        rows.add( hs );
    }

    // Above is O = r*c time just as it would be to copy the 2D array. 
    // or you can express it as O(n^2)


    // -- Mutual disjoint rows test -- //

    // Compare every combination of rows
    for (int y=1; y< rows.size(); y++)
    {
        for (int x=0; x<y; x++)
        {
          //TODO remove debugging code                
            System.out.print(areDisjoint( rows.get(x), rows.get(y) ) );
            System.out.print("=");
            System.out.print("(" + x + "," + y + ")  ");

            if (! areDisjoint( rows.get(x), rows.get(y) )  )
            {
                return false;
            }
        }
        System.out.println("");
    }
    return true;
    //Above is O = r(r-1)/2 * c 
    //or you can express it as O(n^3)
}

static boolean areDisjoint(Set<Integer> set1, Set<Integer> set2)
{
    //set1 must not contain anything in set2 
    for (int i : set2)
    {
        if ( set1.contains(i) )
            return false;
    }
    return true;
    // Above is c or O(n) because contains lookup is O(1)
}

Output:

// true=(0,1)
// true=(0,2) true=(1,2)
// Mutually disjoint rows

I can't tell if this is better than your "brute force" solution since you didn't post it.

I can tell from a big O notation perspective this is neck and neck with japreiss's solution. We're both at O(n^3). This solution uses more memory because it copies the 2D array rather than mutate it. When performance is equal I strongly urge you to pick your algorithm based on readability. I admit mine is longer. I took the time to comment it.

See also: Is a Java hashmap really O(1)?

//         If each row is to be disjoint from all other rows then
//         then testing them all against each other would produce
//         this truth table.  
//        
//                0    1    2   
//              |----|----|----|
//            0 | F  | T  | T  |
//              |----|----|----|
//            1 | T  | F  | T  |
//              |----|----|----|
//            2 | T  | T  | F  |
//              |----|----|----|
//        
//         Testing a row against it self can be ignored as can        
//         tests that simply transpose the input rows. Leaving us with
//        
//                0    1    2   
//              |
//            0 | 
//              |----|
//            1 | T  |
//              |----|----|
//            2 | T  | T  |
//              |----|----|
//        
//         So long as that's true, the rows are mutually disjoint
Community
  • 1
  • 1
candied_orange
  • 7,036
  • 2
  • 28
  • 62
  • Sets are great, but they don't allow for duplicates within them, so, for instance a 2d array of [3 3 3] [4 4 4] wouldn't work this way, right? – amanda_dennis Dec 09 '14 at 22:40
  • @amanda_dennis If, as you said, position in row isn't important then each row can be reduced to a set without loss of any important information since [3,3,3] only communicates that 3 is not allowed anywhere else. I've updated my answer with a solution that should work with any size 2d array. – candied_orange Dec 10 '14 at 00:54
  • @amanda_dennis - [3,3,3] as set becomes the set {3} and [4,4,4] becomes the set {4}. Those two sets are disjoint. But, for example [1,2,2] becomes {1,2} and [2,3,3] becomes {2,3} (no matter the order). Those two sets are not disjoint. I think thats what you're aiming for. I think my answer gets at the disjoint test a little more elegantly using counts, but traversing them as CandiedOrange suggests should work too. – danh Dec 10 '14 at 00:56
0

EDIT: my answer was slow and stupid. Or to put it more kindly, it was doing unnecessary work because it computed which two rows intersect instead of whether or not any two rows intersect.

After I posted I thought of 2 faster algorithms but they turned out to be Danylo Mysak's answer and the "running total set" danh hinted at.

I would still like to benchmark these different approaches.

japreiss
  • 11,111
  • 2
  • 40
  • 77
  • I'd bet money that we're closer than you think. If you copy the array thats O(n^2). Sorting the array is O(n*nlogn). Your last loop block is O(n^3) the same as mine. As for boxing int's that's a one time deal and numbers between -128 to 127 are cached anyway. You on the other hand have an if structure that could throw off branch prediction. But we're splitting hairs. I bet we're very close. Though, if I had to solve this in c I'd rather port over your code. :) – candied_orange Dec 10 '14 at 07:04