Computing assignment with additional constraints

Question

Suppose I have a bunch of students and a couple of professors. Each student needs to take an oral exam with one of the professors. For this, every student supplies an (ordered) list consisting of three professors he would prefer taking the exam with. Of course, each professor can only give a limited number of exams. I can use the Kuhn-Munkres algorithm to compute an assignment where as many students as possible are assigned to their first wish professor.

Now suppose instead that every student has to take two exams, and accordingly supplies two wish lists. Again I want to assign as many students as possible to their first wish professors, but subject to the constraint that a student must not take both his exams with the same professor.

Is there some algorithm for computing an optimal assignment efficiently? Maybe a generalization of the Kuhn-Munkres algorithm? Or is this new problem already NP-hard? What hard problem could one try to reduce to this one?

I should emphasize that I am looking for an exact solution. It is pretty easy to think of some heuristics, like considering one type of exam after the other.

If you want to maximize the number of students assigned to their first wish professor, why do you need lists with three professors? — Lior Kogan, Jul 05 '13 at 12:59
I'm not sure about the hardness, but regardless, there's a good chance that integer programming is the path of least resistance. — David Eisenstat, Jul 05 '13 at 13:32
@LiorKogan I think the goal is to have all students assigned to one of their 3 chosen professors, then, **if there are multiple solutions**, pick the one that maximizes the number of students that have their first-wish professor. — Bernhard Barker, Jul 05 '13 at 13:42
Instead of using Kuhn-Munkres to match students to professors, you can use Kuhn-Munkres to match students to pairs of professors. You'll just need to take a student's wish lists, and combine it into one in some manner. The ranking of a pair of professors could be the average of the individual ranking, or the ranking of the less desirable professor, for example. — Teepeemm, Jul 05 '13 at 13:47
@Teepeemm It's difficult to control each professor's workload that way. — David Eisenstat, Jul 05 '13 at 13:55
@Teepeemm I also had this idea, however I don't know how to meet the capacity constraints in this way. — pgraf, Jul 05 '13 at 15:55

score 2 · Accepted Answer · edited May 23 '17 at 10:32

You can model this exactly using Integer programming as an Assignment Problem.

Variables

Let there be s students and p professors.

The decision variable

  X_sp is 1 if student s takes an exam with professor p
          0 otherwise

 Let Limit_p be the number of exams that professor p can handle.

Handling Student Preferences

We can handle each student's preference through the costs (objective)

Let C_sp be the 'cost' of assigning student s to take an exam with professor p.

We keep progressively making the costs higher as the preference decreases.

C_sp = 1 if p is the *first* preference of s
C_sp = 2 if p is the *second* preference of s
C_sp = 3 if p is the *third* preference of s
...
C_sp = n if p is the *n th* preference of s

Formulation

Case 1: One Exam

 Min (sum over s)(sum over p) C_sp X_sp

 subject to:
 (sum over p) X_sp = 1 for each student s   # Every student must take an exam
 (sum over s) X_sp <= Limit_p for each professor p   # Every prof can only handle so many exams

 X_sp = {0,1} # binary

Case 2: student takes two Exams

 Min (sum over s)(sum over p) C_sp X_sp

 subject to:
 (sum over p) X_sp = 2 for each student s   # Every student must take an exam
 (sum over s) X_sp <= Limit_p for each professor p   # Every prof can only handle so many exams

 X_sp = {0,1} # binary

Taking care of no student can take both exams with the same professor

Normally, we'd have to introduce indicator variables Y_sp to denote whether or not a student has taken an exam with professor p. However, in this case we don't even have to do that. The fact that X_sp is binary will ensure that no student ends up taking 2 exams with the same professor.

If the student has a preference of which exam they'd like to take with a professor, then adding a subscript e to the decision variable is required. X_spe

You can solve this either through the Hungarian method, or easier still, any available implementation of LP/IP solver.

The Hungarian algorithm's complexity is O(n^3), and since we haven't introduced any side constraints to spoil the integrality property, the linear solution will always be integral.

Update (A little bit of theory)

Why are the solutions integral? To understand why the solutions are guaranteed to be integral, a little bit of theory is necessary.

Typically, when confronted with an IP, the first thought is to try solving the linear relaxation (Forget the integral values requirements and try.) This will give a lower bound to minimization problems. There are usually a few so-called complicating constraints that make integer solutions very difficult to find. One solution technique is to relax those by throwing them to the objective function, and solving the simpler problem (the Lagrangian Relaxation problem.)

Now, if the solution to the Lagrangian doesn't change whether or not you have integrality (as is the case with Assignment Problems) then you always get integer solutions.

For those who are really interested in learning about Integrality and Lagrangian duals, this reference is quite readable. The example in Section 3 specifically covers the Assignment Problem.

Free MILP Solvers: This SO question is quite exhaustive for that.

Hope that helps.

You have to take care of the fact that the wish lists for the two exams usually differ. So one would have variables X_spe if student s takes exam e with professor p. Of course, it's still easy to formulate this as an ILP problem. However, I don't get your point why an LP solution should automatically be integral. Can you explain this in more detail? Anyway, what would be a good (I)LP solver which is available for free? — pgraf, Jul 06 '13 at 17:35
Now I plugged it into lp_solve, and it works great. The integrality of solutions seems to stem from the fact that the matrix is totally unimodular. Unfortunately, I cannot upvote your answer because my reputation is too low ... — pgraf, Jul 10 '13 at 13:27

Peter de Rivaz · Answer 2 · 2013-07-07T18:56:49.583

I think you can solve this optimally by solving a min cost flow problem.

The Python code below illustrates how this can be done for 3 students using the networkx library.

import networkx as nx

G=nx.DiGraph()

multiple_prefs=[{'Tom':['Prof1','Prof2','Prof3'],
       'Dick':['Prof2','Prof1','Prof3'],
       'Harry':['Prof1','Prof3','Prof1']},

       {'Tom':['Prof2','Prof1','Prof3'],
       'Dick':['Prof2','Prof1','Prof3'],
       'Harry':['Prof1','Prof3','Prof1']}
       ]

workload={'Prof1':2,'Prof2':10,'Prof3':4}

num_tests=len(multiple_prefs)
num_students=len(multiple_prefs[0])
G.add_node('dest',demand=num_tests*num_students)
A=[]
for i,prefs in enumerate(multiple_prefs):
    testname='_test%d'%i
    for student,proflist in prefs.items():
        node=student+testname
        A.append(node)
        G.add_node(node,demand=-1)
        for i,prof in enumerate(proflist):
            if i==0:
                cost=0 # happy to assign first choices
            elif i==1:
                cost=1 # Slightly unhappy to assign second choices
            else:
                cost=4 # Very unhappy to assign third choices
            n2=prof+'_with_'+student
            n3=prof
            G.add_edge(node,n2,capacity=1,weight=cost) # Edge taken if student takes this test with the professor
            G.add_edge(n2,n3,capacity=1,weight=0) # Capacity stops student taking both tests with the same professor
            G.add_edge(n3,'dest',capacity=workload[prof],weight=0) # Capacity stops professor exceeding his workload


flowdict = nx.min_cost_flow(G)
for student in A:
    for prof,flow in flowdict[student].items():
        if flow:
            print student,'with',prof

The key is that we have a separate node (called n2 in the code) for each combination of student and professor. By giving n2 a capacity of 1 to the destination, we ensure that each student can only take a single test with that professor.

The dictionary multiple_prefs stores two dictionaries. The first dictionary contains the lists of preferences for each student for the first test. The second has the preferences for the second test.

EDIT

The code now also includes nodes n3 for each professor. The capacity on the edges between these nodes and the dest will limit the maximum workload for each professor.

The dictionary workload stores the maximum number of allowed tests for each professor.

Two Questions: 1. Is it clear that nx.min_cost_flow() produces an integer solution? Clearly a solution with fractional values is no good. 2. How do you ensure that no professor exceeds his workload? In your code, I cannot see where the professors' capacities enter. — pgraf, Jul 05 '13 at 15:58
1) I believe the flow will always be integer because all the capacities are integer. 2) Sorry, I had missed this part of the problem. However, it is easy to fix by simply adding additional nodes n3 for each professor. Each n2 node will connect to the corresponding n3 with capacity 1, and then the n3 nodes will connect to the dest with capacity corresponding to that professors workload. I'll edit the answer to add the corresponding code when I am next at a computer with Python installed... — Peter de Rivaz, Jul 05 '13 at 16:42
Adding the n3s is a good point. I'm looking forward to your code. However, I'm still worried about integrality. When I look at the source code of networkx, it seems it's using some kind of simplex algorithm. One would have to test some real-world examples. — pgraf, Jul 06 '13 at 17:44
It works, but at least with interpreted Python it's painfully slow. Therefore I accepted Ram's answer instead. Mathematically, it boils down to the same thing. I would upvote your answer, but my reputation is not sufficient. — pgraf, Jul 10 '13 at 13:30

Computing assignment with additional constraints

2 Answers2