Matching Algorithm (Java)

Question

I am writing an algorithm to match students with different groups. Each group has a limited number of spots. Each student provides their top 5 choices of groups. The students are then placed into groups in a predetermined order (older students and students with perfect attendance are given higher priority). There is no requirement for groups to be filled entirely, but they cannot be filled passed capacity.

I've looked into similar marriage problems such as the Gale-Shapely stable marriage algorithm, but the problem I am having is that there far fewer groups than students and each group can accept multiple students.

What is the best way to implement such an algorithm to find a solution that has been optimized entirely such that there is no better arrangement of students in groups? In terms of algorithm complexity, I'm placing roughly 600 students into 10-20 groups.

What is *most mathematically perfect*? If student A has a higher priority than student B, is an arrangement with A's 2nd choice and B's 3rd choice better than A's 1st choice and B's 5th choice? — Manu, Feb 19 '16 at 13:48
There is no "most mathematically perfect". You need some function that evaluates a given arrangement. That said, if you define this function linear, then this is a (integral) linear program: https://en.wikipedia.org/wiki/Linear_programming for which there should be Java libraries. If you need help with formulating the inequalities, I can provide help, but I want to see if you are interested in that method before I do. — Aziuth, Feb 19 '16 at 13:50
@nafas each group has a maximum capacity, but these capacities vary per group. No group must be entirely filled, but it can not exceed its capacity. — Dylan Wheeler, Feb 19 '16 at 15:36
@Aziuth perhaps my phrasing was not entirely accurate, but I was under the impression that (at least according to stable marriage problems) there always exists a stable marriage. I'm seeking a solution that has been optimized entirely such that there is no better arrangement of students. — Dylan Wheeler, Feb 19 '16 at 15:37
@Confiqure [https://en.wikipedia.org/wiki/Envy-freeness](https://en.wikipedia.org/wiki/Envy-freeness) — nafas, Feb 19 '16 at 15:41
@Manu for the sake of my problem, I think that A's 2nd choice and B's 3rd choice is better than A's 1st choice and B's 5th choice because the combined placement is better for both students. — Dylan Wheeler, Feb 19 '16 at 15:47
@Confiqure Sorry, but I think you still don't understand. A matching can be perfect, yes. But in your comment addressed to Manu, you write "I thin that A's 2nd choice...". You yourself are not sure how to evaluate a given situation, still you talk about "no better arrangement of students". To evaluate "better", you need something like an evaluation function, which is a designing parameter. Like "Every student who gets a first choice is worth 10 points, for second choice 8 points, if no top choice is met -100 points..." - something like that. That's a parameter. — Aziuth, Feb 23 '16 at 08:23
@Aziuth sounds great, I appreciate your feedback with my question! — Dylan Wheeler, Feb 23 '16 at 14:57
@Confiqure: Thanks. I am not sure how much you know about optimization, so therefore I'll go with some basics. All these algorithms that other people recommend you are special cases in which there is an algorithm to solve a very special problem. Can work. If you want more freedom, go with linear programming as linked above. There, you define a linear evaluation function and linear constraining functions, maybe constrain to integer values, then you use some solver (there should be libraries). If you do this and have troubles to convert your problem into a system of inequalities, ask. — Aziuth, Feb 24 '16 at 09:01

score 1 · Answer 1 · answered Feb 19 '16 at 13:52

1

I would modify the Knapsack problem (Knapsack problem, wikipedia) to work with K numbers of groups (knapsacks) instead of just one. You can assign "value" to the preferences they have and the number of spots would be the maximum "weight" of the Knapsack. With this, you can backtrack to check what is the optimal solution of the problem.

I am not sure how efficient you need the problem to be, but I think this will work.

answered Feb 19 '16 at 13:52

randombee

699
1
5
26

You present an interesting concept. I'm not entirely convinced yet that your solution is the most suitable for this problem, but I will do more research before confirming this. – Dylan Wheeler Feb 19 '16 at 15:41

nafas · Answer 2 · 2016-02-19T14:08:57.863

the most mathematically perfect

is very opinion based.

simplicity (almost) always wins. here

Here is a pseudocode:

students  <-- sorted by attendance

for i=0 to n in students:
   groups <-- sorted by ith student's preference
   for  j=0 to m in groups:
     if group j has space then add student i to group j; studentAssigned=true; break;

   if studentAssigned=false;
     add i to unallocated

for i=0 to k in unallocated 
  allocate i to a random group that is not filled

score 1 · Accepted Answer · answered Feb 19 '16 at 14:45

NB The close votes are terribly misplaced. Algorithm choice and design to solve an ambiguous problem is absolutely part of programming.

I think you'll get farther with Minimum Weight Bipartite Matching than Stable Marriage (also called the Hungarian method or algorithm or Maximum Weight Matching, which can give you a min weight matching just by negating the weights).

You are out to match positions with students. So the two node types in the bipartite graph are these.

The simplest statement of the algorithm requires a complete weighed bipartite graph with equal numbers of nodes in each set. You can think of this as a square matrix. The weights are the elements. The rows are students. The columns are positions.

The algorithm will pick a single element from each row/column such that the sum is minimized.

@nava's proposal is basically a greedy version of MWBM that's not optimal. The true Hungarian algorithm will give you an optimal answer.

To handle the fact that you have fewer positions than students is easy. To the "real" positions add as many "dummy" positions as needed. Connect all these to all the students with super high-weight edges. The algorithm will only pick them after all the real positions are matched.

The trick is to pick the edge weights. Let's call the ordinal where a student would be considered for a position O_i for the i'th student. Then let R_ip be the ranking that the same student places on the p'th position. Finally, W_ip is the weight of the edge connecting the i'th student to the p'th position. You'll want something like:

W_ip = A * R_ip + B * O_i

You get to pick A and B to specify the relative importance of the students' preferences and the order they're ranked. It sounds like order is quite important. So in that case you want B to be big enough to completely override students' rankings.

A = 1, B = N^2, where N is the number of students.

Once you get an implementation working, it's actually fun to tweak the parameters to see how many students get what preference, etc. You might want to tweak the parameters a bit to give up a little on the order.

At the time I was working on this (late 90's), the only open source MWBM I could find was an ancient FORTRAN lib. It was O(N^3). It handled 1,000 students (selecting core academic program courses) in a few seconds. I spent a lot of time coding a fancy O(N^2 log N) version that turned out to be about 3x slower for N=1000. It only started "winning" at about 5,000.

These days there are probably better options.

Looks great! Will be attempting to implement this. – Dylan Wheeler Feb 19 '16 at 20:34 — Dylan Wheeler, Feb 19 '16 at 20:34

score 0 · Answer 4 · answered Feb 19 '16 at 13:52

0

For each group:

create an ordered set and add all the students (you must design the heuristic which will order the students inside the set, which could be attendance level multiplied by 1 if the group is within their choice, 0 otherwise).
Fill the group with the first nth students

But there are some details that you didn't explain. For example, what happen if there are students that couldn't enter any of their 5 choices because they got full with other students with higher priority?

answered Feb 19 '16 at 13:52

Nadir

1,799
12
20

Assigning a heuristic value to each student is a formula that I will need to play around with, but for the sake of my problem we can assume that all students will be placed into groups. They might have to rank 7 groups or 10 groups to ensure this. – Dylan Wheeler Feb 19 '16 at 15:50

Matching Algorithm (Java)

4 Answers4