4

This is a complicated one, but I suspect there's some principle I can apply to make it simple - I just don't know what it is.

I need to parcel out presentation slots to a class full of students for the semester. There are multiple possible dates, and multiple presentation types. I conducted a survey where students could rank their interest in the different topics. What I'd like to do is get the best (or at least a good) distribution of presentation slots to students.

So, what I have:

  • List of 12 dates
  • List of 18 students
  • CSV file where each student (row) has a rating 1-5 for each date

What I'd like to get:

  • Each student should have one of presentation type A (intro), one of presentation type B (figures) and 3 of presentation type C (aims)
  • Each date should have at least 1 of each type of presentation
  • Each date should have no more than 2 of type A or type B
  • Try to give students presentations that they rated highly (4 or 5)

I should note that I realize this looks like a homework problem, but it's real life :-). I was thinking that I might make a Student class for each student that contains the dates for each presentation type, but I wasn't sure what the best way to populate it would be. Actually, I'm not even sure where to start.

Junuxx
  • 14,011
  • 5
  • 41
  • 71
kevbonham
  • 999
  • 7
  • 24
  • Show us what you've tried, and we can help you out from there – inspectorG4dget Jan 25 '16 at 20:49
  • Is there a limited number of type C presentation slots in a day? And do students have to give their A, B, C presentations in order? – Junuxx Jan 25 '16 at 20:51
  • Does "no more than 2 of type A or type B" mean `(A+B) <= 2`, or `A =< 2 && B<= 2`? – Junuxx Jan 25 '16 at 20:55
  • @inspectorG4dget I've gotten as far as `class Student(object):` and importing the csv with survey responses via pandas, but I really have no idea where to start. Sorry - this may be too amorphous a question for this forum :-/ – kevbonham Jan 25 '16 at 22:19
  • @Junuxx No limit to C in a day (though ideally they'd be somewhat uniformly spread out) and no, order for each student is not important. – kevbonham Jan 25 '16 at 22:20
  • @Junuxx re: your second question, `A <= 2 && B <= 2`... ideally `(A+B) <= 3` – kevbonham Jan 25 '16 at 22:22

1 Answers1

2

TL;DR: I think you're giving your students too much choice :D

But I had a shot at this problem anyway. Pretty fun exercise actually, although some of the constraints were a little vague. Most of all, I had to guess what the actual students' preference distribution would look like. I went with uniformly distributed, independent variables, although that's probably not very realistic. Still I think it should work just as well on real data as it does on my randomly generated data.

I considered brute forcing it, but a rough analysis gave me an estimate of over 10^65 possible configurations. That's kind of a lot. And since we don't have a trillion trillion years to consider all of them, we'll need a heuristic approach.

Because of the size of the problem, I tried to avoid doing any backtracking. But this meant that you could get stuck; there might not be a solution where everyone only gets dates they gave 4's and 5's.

I ended up implementing a double-edged Iterative Deepening-like search, where both the best case we're still holding out hope for (i.e., assign students to a date they gave a 5) and the worst case scenario we're willing to accept (some student might have to live with a 3) are gradually lowered until a solution is found. If we get stuck, reset, lower expectations, and try again. Tasks A and B are assigned first, and C is done only after A and B are complete, because the constraints on C are far less stringent.

I also used a weighting factor to model the trade off between maximizing students happiness with satisfying the types-of-presentations-per-day limits.

Currently it seems to find a solution for pretty much every random generated set of preferences. I included an evaluation metric; the ratio between the sum of the preference values of all assigned student/date combos, and the sum of all student ideal/top 3 preference values. For example, if student X had two fives, one four and the rest threes on his list, and is assigned to one of his fives and two threes, he gets 5+3+3=11 but could ideally have gotten 5+5+4=14; he is 11/14 = 78.6% satisfied.

After some testing, it seems that my implementation tends to produce an average student satisfaction of around 95%, at lot better than I expected :) But again, that is with fake data. Real preferences are probably more clumped, and harder to satisfy.

Below is the core of the algorihtm. The full script is ~250 lines and a bit too long for here I think. Check it out at Github.

...

# Assign a date for a given task to each student, 
# preferring a date that they like and is still free.
def fill(task, lowest_acceptable, spread_weight=0.1, tasks_to_spread="ABC"):
        random_order = range(nStudents) # randomize student order, so everyone
        random.shuffle(random_order)    # has an equal chance to get their first pick
        for i in random_order:
            student = students[i]
            if student.dates[task]: # student is already assigned for this task?
                continue

            # get available dates ordered by preference and how fully booked they are
            preferred = get_favorite_day(student, lowest_acceptable,  
                                         spread_weight, tasks_to_spread)
            for date_nr in preferred:
                date = dates[date_nr]
                if date.is_available(task, student.count, lowest_acceptable == 1):
                    date.set_student(task, student.count)
                    student.dates[task] = date
                    break

# attempt to "fill()" the schedule while gradually lowering expectations
start_at = 5
while start_at > 1:
    lowest_acceptable = start_at
    while lowest_acceptable > 0:
        fill("A", lowest_acceptable, spread_weight, "AAB")
        fill("B", lowest_acceptable, spread_weight, "ABB")
        if lowest_acceptable == 1:
            fill("C", lowest_acceptable, spread_weight_C, "C")
        lowest_acceptable -= 1

And here is an example result as printed by the script:

                                      Date                                      
================================================================================
Student |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  |  10 |  11 |  12 |
================================================================================
   1    |     |  A  |  B  |     |  C  |     |     |     |     |     |     |     |
   2    |     |     |     |     |  A  |     |     |     |     |  B  |  C  |     |
   3    |     |     |     |     |  B  |     |     |  C  |     |  A  |     |     |
   4    |     |     |     |  A  |     |  C  |     |     |     |     |     |  B  |
   5    |     |     |  C  |     |     |     |  A  |  B  |     |     |     |     |
   6    |     |  C  |     |     |     |     |     |     |  A  |  B  |     |     |
   7    |     |     |  C  |     |     |     |     |  B  |     |     |     |  A  |
   8    |     |     |  A  |     |  C  |     |  B  |     |     |     |     |     |
   9    |  C  |     |     |     |     |     |     |     |  A  |     |     |  B  |
   10   |  A  |  B  |     |     |     |     |     |     |  C  |     |     |     |
   11   |  B  |     |     |  A  |     |  C  |     |     |     |     |     |     |
   12   |     |     |     |     |     |  A  |  C  |     |     |     |  B  |     |
   13   |  A  |     |     |  B  |     |     |     |     |     |     |     |  C  |
   14   |     |     |     |     |  B  |     |     |     |  C  |     |  A  |     |
   15   |     |     |  A  |  C  |     |  B  |     |     |     |     |     |     |
   16   |     |     |     |     |     |  A  |     |     |     |  C  |  B  |     |
   17   |     |  A  |     |  C  |     |     |  B  |     |     |     |     |     |
   18   |     |     |     |     |     |     |  C  |  A  |  B  |     |     |     |
================================================================================
Total student satisfaction: 250/261 = 95.00%
Junuxx
  • 14,011
  • 5
  • 41
  • 71
  • Wow, I was not expecting this level of input - amazing! I'm going to have to study it for a while to understand it I think. You get the check for sure regardless, but if you're willing I have a couple of follow-ups. First, to use my actual data, am I reading it right that I just need an ordered list of their preferences to load with `student.prefs = ordered_list`? Also, it looks like `Student.count` is the identifier for each instance of the `Student` class? (sorry if that's a naive question, I haven't worked with classes much. (continued below) – kevbonham Jan 27 '16 at 14:07
  • Finally (and sorry if this wasn't clear), each student actually needs to give a `C` presentation 3 times. Since the constraints are way lower, it will be trivial for me to fill it in by hand, so please don't put in a lot of effort to fix. I don't completely understand the central function, but at line 202 in the gist, could you call `fill("C", lowest_acceptable, spread_weight_C, "CCC")`, or call line 202 3 times? Actually, I think it's the `get_favorite_day` function that I'm having trouble groking. Regardless, thanks for your effort - I'm glad you had fun with it! – kevbonham Jan 27 '16 at 14:15
  • You're welcome. In my experience, StackOverflow is kind of hit or miss when you have a very specific question. But it might still be your best bet. Anyway, you're entirely right about `student.prefs`, their rated preference for each of the 12 days in order. And yes, `Student.count`, the class variable, increases by one every time the constructor is called, so that `self.count`, the instance variable, can be set to the correct number. – Junuxx Jan 27 '16 at 19:13
  • As for dealing with 3 C's, the easiest way to add that is to just have C1, C2, C3 tasks and dealing with them individually. I updated the gist accordingly. The `tasks_to_spread` variable in `fill` and `get_favorite_day` determines how much it avoids adding tasks to days that already have tasks of those types. "AAB" means it discount preexisting A's twice as much as B's. So "CCC" wouldn't really make a difference": Could have been clearer/less clunky I suppose :) Now I'm wondering though, what exactly do A, B and C stand for? – Junuxx Jan 27 '16 at 19:14
  • I had no idea it would be this involved to tell you the truth. The course is a graduate-level science literature review course linked to a seminar series. `A` and `B` are presentations on the background and main content of papers respectively. `C` is an assignment where students have to develop and present new experiments to extend the work of the paper we're reading. – kevbonham Jan 27 '16 at 21:26
  • fyi - I updated my version of [the gist](https://gist.github.com/kescobo/ad7f45eb5fb152d26dc4) to include some code to import a csv of survey values. I called the script from the command line `python2 class_projects.py /path/to/survey_results.csv`. – kevbonham Jan 28 '16 at 19:28
  • Just read this post (https://discourse.julialang.org/t/beginners-question-how-to-formulate-this-assignment-scheduling-problem/39681) on julia forum and remembered this answer... Still can't believe how much work you put into it! Wish I could give more points :-D – kevbonham May 19 '20 at 01:47