3

I have 2 sets of data. Let say one is a people, another is a group. A people can be in multiple groups while a group can have multiple people. My operations will basically be CRUD on group and people. As well as a method that makes sure a list of people are in different groups (which gets called alot).

Right now I'm thinking of making a table of binary 0's and 1's with horizontally representing all the people and vertically all the groups.

I can perform the method in O(n) time by adding each list of binaries and compare with the "and" operation of the list of binaries.

E.g

Group   A    B    C    D
ppl1    1    0    0    1
ppl2    0    1    1    0
ppl3    0    0    1    0
ppl4    0    1    0    0

check (ppl1, ppl2) = (1001 + 0110) == (1001 & 0110)
               = 1111 == 1111
               = true

check (ppl2, ppl3) = (0110 + 0010) == (0110+0010)
               = 1000 ==0110
               = false

I'm wondering if there is a data structure that does something similar already so I don't have to write my own and maintain O(n) runtime.

Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
user1181031
  • 71
  • 2
  • 7

3 Answers3

2

I don't know all of the details of your problem, but my gut instinct is that you may be over thinking things here. How many objects are you planning on storing in this data structure? If you have really large amounts of data to store here, I would recommend that you use an actual database instead of a data structure. The type of operations you are describing here are classical examples of things that relational databases are good at. MySQL and PostgreSQL are examples of large scale relational databases that could do this sort of thing in their sleep. If you'd like something lighter-weight SQLite would probably be of interest.

If you do not have large amounts of data that you need to store in this data structure, I'd recommend keeping it simple, and only optimizing it when you are sure that it won't be fast enough for what you need to do. As a first shot, I'd just recommend using java's built in List interface to store your people and a Map to store groups. You could do something like this:

// Use a list to keep track of People
List<Person> myPeople = new ArrayList<Person>();
Person steve = new Person("Steve");
myPeople.add(steve);
myPeople.add(new Person("Bob"));


// Use a Map to track Groups
Map<String, List<Person>> groups = new HashMap<String, List<Person>>();
groups.put("Everybody", myPeople);
groups.put("Developers", Arrays.asList(steve));

// Does a group contain everybody?
groups.get("Everybody").containsAll(myPeople); // returns true
groups.get("Developers").containsAll(myPeople); // returns false

This definitly isn't the fastest option available, but if you do not have a huge number of People to keep track of, you probably won't even notice any performance issues. If you do have some special conditions that would make the speed of using regular Lists and Maps unfeasible, please post them and we can make suggestions based on those.

EDIT:

After reading your comments, it appears that I misread your issue on the first run through. It looks like you're not so much interested in mapping groups to people, but instead mapping people to groups. What you probably want is something more like this:

Map<Person, List<String>> associations = new HashMap<Person, List<String>>();

Person steve = new Person("Steve");
Person ed = new Person("Ed");

associations.put(steve, Arrays.asList("Everybody", "Developers"));
associations.put(ed, Arrays.asList("Everybody"));

// This is the tricky part
boolean sharesGroups = checkForSharedGroups(associations, Arrays.asList(steve, ed));

So how do you implement the checkForSharedGroups method? In your case, since the numbers surrounding this are pretty low, I'd just try out the naive method and go from there.

public boolean checkForSharedGroups(
                    Map<Person, List<String>> associations, 
                    List<Person> peopleToCheck){
    List<String> groupsThatHaveMembers = new ArrayList<String>();
    for(Person p : peopleToCheck){
        List<String> groups = associations.get(p);
        for(String s : groups){
            if(groupsThatHaveMembers.contains(s)){
                // We've already seen this group, so we can return
                return false;
            } else {
                groupsThatHaveMembers.add(s);
            }
        }
    }
    // If we've made it to this point, nobody shares any groups.
    return true;
}

This method probably doesn't have great performance on large datasets, but it is very easy to understand. Because it's encapsulated in it's own method, it should also be easy to update if it turns out you need better performance. If you do need to increase performance, I would look at overriding the equals method of Person, which would make lookups in the associations map faster. From there you could also look at a custom type instead of String for groups, also with an overridden equals method. This would considerably speed up the contains method used above.

The reason why I'm not too concerned about performance is that the numbers you've mentioned aren't really that big as far as algorithms are concerned. Because this method returns as soon as it finds two matching groups, in the very worse case you will call ArrayList.contains a number of times equal to the number of groups that exist. In the very best case scenario, it only needs to be called twice. Performance will likely only be an issue if you call the checkForSharedGroups very, very often, in which case you might be better off finding a way to call it less often instead of optimizing the method itself.

Community
  • 1
  • 1
TwentyMiles
  • 4,063
  • 3
  • 30
  • 37
  • Yes, the OP should definitely take a more object oriented approach to solving this issue unless there is some other reason (professor) for doing it a particular way. Using an OO approach will make later problems easier like - What if the group needs some extra attributes such as chair-person, name, description? – aglassman Jun 24 '13 at 19:04
  • Thanks for the suggestion, I would estimate there will be ~100 groups and ~10000 people at most. There won't be much modifying of data. The only thing that would be called the most will be the check function which takes in a list of people and return true if none of them belong to the same group and false otherwise. I want to store the data in a way that uses very few memory and can perform this function very fast. – user1181031 Jun 24 '13 at 19:10
  • I should mention I'll store the entire info of group and people elsewhere (they are actually classes), I just need this relation table for fast calculation of this 1 function. – user1181031 Jun 24 '13 at 19:13
  • 1
    @user1181031 I've updated my answer based on your comments, I think it should now be closer to what you are looking for. – TwentyMiles Jun 24 '13 at 21:55
0

Have you considered a HashTable? If you know all of the keys you'll be using, it's possible to use a Perfect Hash Function which will allow you to achieve constant time.

0

How about having two separate entities for People and Group. Inside People have a set of Group and vice versa.

class People{

Set<Group> groups;
//API for addGroup, getGroup

}

class Group{

Set<People> people;
//API for addPeople,getPeople

}

check(People p1, People p2):

1) call getGroup on both p1,p2
2) check the size of both the set,
3) iterate over the smaller set, and check if that group is present in other set(of group)

Now, you can basically store People object in any data structure. Preferably a linked list if size is not fixed otherwise an array.

ajay.patel
  • 1,957
  • 12
  • 15
  • This might work, I'm just wondering if there are 10,000 people, 100 groups, would the check function be fast enough to take less than a second to run? – user1181031 Jun 24 '13 at 19:58
  • I am not sure about that, but if you exclude the preprocessing time( populating these People object). I think this should be quit fast. reason being, once the preprocessing is done, you end up comparing only the groups that belog to these people not like your case where in you have to iterate the complete array to first calculate the sum. – ajay.patel Jun 24 '13 at 20:22
  • More over what happens when you have 10000 groups, you would end up getting a 10000 digit number? and than do and on it? – ajay.patel Jun 24 '13 at 20:27
  • I know the group won't exceed 100. The way I want to do it is in binary, where each horizontal data is stored as a list of 32 bit integers. That way you can do "add" or "and" on 32 bit machine in 1 operation for every 32 boolean, and it will be 4 cpu operations for 128 bits. The entire map of data would be about ~16kb – user1181031 Jun 24 '13 at 20:42