
I had one day to create a program that, when given a .csv file with the columns: [Name] | [Elo Rating], would output another .csv file with balanced teams. The user can choose how many players they want per team.

This is probably the programming project I enjoyed the most, but it is highly inefficient given the rushed time frame and my limited knowledge of how best to implement such a task. It is similar to this question, but I would prefer not to simplify the process into just placing the players into teams based on the rating order (faster, but less accurate than what I already have). Can you give suggestions on how I could make this run faster without sacrificing accuracy?

The logic I went with is this:

  1. Find all the possible team combinations.
  2. Find the average rating of all the team combinations and find the team rating closest to that average.
  3. Select the first team whose rating matches the score from step 2.
  4. Remove all the teams that have any of these players in them.
  5. Repeat #2, #3, and #4 until all the teams are made.

Step 1 was accomplished by this answer and works excellently.

Step 2 might be more of a math question. I'm not sure whether the average of the players' ratings equals the average rating of all the teams. I'm using a slightly modified version of this answer because I am using a dictionary to hold the names & ratings. I wouldn't need this dictionary if just averaging the players' ratings were just as accurate.

float average = teamScoreDict.Average(s => s.Value);

var closestScoreToAvg = teamScoreDict.Values.Aggregate((x, y) =>
    Math.Abs(x - average) < Math.Abs(y - average) ? x : y);
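A quick experiment with made-up ratings (the combination helper here is mine, not the code above) suggests the two averages are directly related: every player appears in the same number of k-combinations, so the average team total is exactly the player average times the team size.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class AverageOfAverages
{
    // Minimal k-combination generator, just for this demonstration.
    static IEnumerable<List<int>> Combinations(List<int> items, int k)
    {
        if (k == 0) { yield return new List<int>(); yield break; }
        for (int i = 0; i <= items.Count - k; i++)
            foreach (var tail in Combinations(items.Skip(i + 1).ToList(), k - 1))
            {
                tail.Insert(0, items[i]);
                yield return tail;
            }
    }

    static void Main()
    {
        var ratings = new List<int> { 1300, 1450, 1700, 1820, 2000, 2200 };
        int teamSize = 3;

        double playerAvg = ratings.Average();
        double teamAvg = Combinations(ratings, teamSize)
                             .Average(t => (double)t.Sum());

        // Average team total == player average * team size.
        Console.WriteLine(teamAvg == playerAvg * teamSize); // True
    }
}
```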

Step 3

static Team FindTeam(float targetScore)
{
    var selectedTeam = new Team();

    // Grab the first team with the target score (not perfect, since several
    // teams can share the same score).
    for (int i = 0; i < teams.Count; i++)
    {
        if (teams[i].TeamRating == targetScore)
        {
            selectedTeam = teams[i];

            // Mark these players as already selected.
            foreach (var player in teams[i].Player)
                if (!usedPlayers.Contains(player.Key))
                    usedPlayers.Add(player.Key);

            // Remove the score and the team, then stop.
            teamScoreDict.Remove(teams[i].Id);
            teams.RemoveAt(i); // RemoveAt avoids the linear search Remove does
            break;
        }
    }

    return selectedTeam;
}

Step 4 is what I believe to be the slowest part of this task. I thought removing the teams each iteration would make the subsequent team searches faster, which it does, but the removal process is slow.

static void RemoveTeamsWithUsedPlayers()
{
    // Iterate backwards so RemoveAt doesn't shift indices we haven't visited.
    for (int i = teams.Count - 1; i >= 0; i--)
    {
        if (teams[i].Player.Any(kvp => usedPlayers.Contains(kvp.Key)))
        {
            teamScoreDict.Remove(teams[i].Id);
            teams.RemoveAt(i); // RemoveAt skips the linear search Remove performs
        }
    }
}

The results are excellent (my last run with 40 players in 5-man teams, with Elo ratings from 1300-2200, gave me 8 teams with only a 19-point difference between the highest and lowest team's total score [8498 vs 8517]).

My problem is that it is extremely slow for larger team sizes. 12 players in 3-man teams is instant. 32 players in 4-man teams takes a few seconds. 40 players in 5-man teams takes many minutes, because the possible k-combinations grow drastically and I'm looping through them so many times.

I hope this question is not too broad, and thank you for any suggestions.

EDIT: I went ahead and used the Stopwatch class to get the times for each step, as suggested, for 12 players with 3 per team (220 k-combinations):

  1. 00:00:00.0014493
  2. 00:00:00.0083637
  3. 00:00:00.0015608
  4. 00:00:00.0015930

There is also another step that I forgot: between step 1 and step 2 I take the IEnumerable of all possible team combinations, put it into a class that I made, and calculate the total team score. This step takes 00:00:00.0042700.

foreach (var team in teamList)
{
    Team teamClass = new Team();
    float teamScore = 0.0F;

    // Look up each player's rating and accumulate the team total.
    foreach (var player in team)
    {
        if (participants.TryGetValue(player, out int tempInt))
        {
            teamScore += tempInt;
            teamClass.Player.Add(player, tempInt);
        }
    }

    teamClass.TeamRating = teamScore;
    teamClass.Id = count;
    teamScoreDict.Add(count, teamScore);
    teams.Add(teamClass);
    count++;
}

EDIT 2 (Improved Logic): I'm certain I could still improve this a lot, but based on the answer I marked and the comments, I was able to drastically speed things up while still maintaining accuracy. The 3-person teams with 12 players take about the same time, but 4-person teams with 32 players went from 4.5229176s to 0.4067160s. Here are the adjustments I made:

  1. I use the average of just the players; multiplied by the team size, it equals the average rating of all the team combinations.
  2. I remove the highest-rated player and then find the combinations with one fewer player than I normally would.
  3. While I am placing all these teams into the class I use, I simultaneously check for the (team rating + highest player's rating) closest to the (overall average * players per team).
  4. Then I add the highest-rated player back onto that team and remove all of its players from the list.
  5. Repeat steps 2 through 4 until everyone is used up.
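The adjusted loop might be sketched like this (the names here, including BuildTeams and the combination helper, are my own placeholders, not the actual code):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class ImprovedBalancer
{
    // Minimal k-combination generator for the sketch.
    static IEnumerable<List<(string Name, int Rating)>> Combinations(
        List<(string Name, int Rating)> items, int k)
    {
        if (k == 0) { yield return new List<(string Name, int Rating)>(); yield break; }
        for (int i = 0; i <= items.Count - k; i++)
            foreach (var tail in Combinations(items.Skip(i + 1).ToList(), k - 1))
            {
                tail.Insert(0, items[i]);
                yield return tail;
            }
    }

    static List<List<(string Name, int Rating)>> BuildTeams(
        List<(string Name, int Rating)> players, int teamSize)
    {
        double target = players.Average(p => p.Rating) * teamSize;
        var remaining = new List<(string Name, int Rating)>(players);
        var result = new List<List<(string Name, int Rating)>>();

        while (remaining.Count >= teamSize)
        {
            // Step 2: set aside the highest-rated remaining player.
            var anchor = remaining.OrderByDescending(p => p.Rating).First();
            remaining.Remove(anchor);

            // Step 3: find the (teamSize - 1)-combination whose total plus
            // the anchor's rating is closest to the target.
            List<(string Name, int Rating)> best = null;
            double bestDiff = double.MaxValue;
            foreach (var combo in Combinations(remaining, teamSize - 1))
            {
                double diff = Math.Abs(combo.Sum(p => p.Rating) + anchor.Rating - target);
                if (diff < bestDiff) { bestDiff = diff; best = combo; }
            }

            // Step 4: put the anchor back on the team and retire its players.
            best.Add(anchor);
            foreach (var p in best) remaining.Remove(p);
            result.Add(best);
        }
        return result;
    }

    static void Main()
    {
        var players = new List<(string Name, int Rating)>
        {
            ("A", 2200), ("B", 2000), ("C", 1820),
            ("D", 1700), ("E", 1450), ("F", 1300)
        };

        foreach (var team in BuildTeams(players, 3))
            Console.WriteLine(string.Join(", ", team.Select(p => p.Name))
                              + " = " + team.Sum(p => p.Rating));
    }
}
```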
cricket49

1 Answer


For your average-of-averages side question, the answer is that you don't need your dictionary: every player appears in the same number of team combinations, so the average team rating is just the average player rating multiplied by the team size.

As for the algorithm, the easiest speedup is to observe that the person with the highest rating has to be on SOME team. So build only teams containing that person, and take the best. Then build teams again around some other person, and repeat. The result will be different from what you are doing currently, because the person with the highest rating is not always on the team with the rating closest to average, but it is no less accurate. This turns your current O(n^m) pass for building the list of all possible teams into an O(n^(m-1)) pass.
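To put that reduction in concrete numbers, here is a small sketch (my own, with a standard binomial-coefficient helper) comparing all 5-man teams among 40 players with the teams that contain one fixed person:

```csharp
using System;

class CombinationCount
{
    // Binomial coefficient n choose k, computed with exact integer steps.
    static long Choose(long n, long k)
    {
        long result = 1;
        for (long i = 1; i <= k; i++)
            result = result * (n - k + i) / i; // stays integral at each step
        return result;
    }

    static void Main()
    {
        Console.WriteLine(Choose(40, 5)); // all 5-man teams: 658008
        Console.WriteLine(Choose(39, 4)); // teams containing a fixed player: 82251
    }
}
```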

But for a real speedup, what you need to do is follow the strategy in https://en.wikipedia.org/wiki/Subset_sum_problem#Pseudo-polynomial_time_dynamic_programming_solution and use dynamic programming to avoid looking at teams whose scores will come out the same. That is, you build a data structure keyed by the number of players on the partial team so far and by the total rating, storing the last player added. Now the second time you find, for instance, a combined score of 3500 for 3 players, you know that the teams it leads to will duplicate scores you have already seen, so you throw it out.

Throwing away all of those duplicates in the team generation process will reduce finding the next team from O(n^m) to O(n*m*k) where k is the range between the lowest and highest rating. And doing that n/m times will therefore take time O(n^2*k). Which should be acceptable with 5 person teams up to several hundred people.

So you build up this data structure, then sort the scores for a full team, and take the best one. Now walk the data structure backwards and you can find the selected team. Then throw it away and do this again.
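Assuming integer ratings and that one representative team per (size, total) pair is enough, the pruning might be sketched like this (my own illustration, not the exact structure described above):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class SubsetSumTeams
{
    // best[s][total] keeps ONE representative partial team (as player indices)
    // of s players whose ratings sum to total; later partial teams repeating a
    // (size, total) pair are simply dropped.
    static int[] ClosestTeam(int[] ratings, int teamSize, double target)
    {
        var best = new Dictionary<int, int[]>[teamSize + 1];
        for (int s = 0; s <= teamSize; s++) best[s] = new Dictionary<int, int[]>();
        best[0][0] = Array.Empty<int>();

        for (int i = 0; i < ratings.Length; i++)
        {
            // Walk sizes downward so player i joins any partial team at most once.
            for (int s = teamSize - 1; s >= 0; s--)
            {
                foreach (var kv in best[s].ToArray())
                {
                    int total = kv.Key + ratings[i];
                    if (!best[s + 1].ContainsKey(total))   // duplicate total: drop it
                        best[s + 1][total] = kv.Value.Append(i).ToArray();
                }
            }
        }

        // Among full teams, take the total closest to the target and return
        // the stored representative team.
        int bestTotal = best[teamSize].Keys
            .Aggregate((x, y) => Math.Abs(x - target) <= Math.Abs(y - target) ? x : y);
        return best[teamSize][bestTotal];
    }

    static void Main()
    {
        int[] ratings = { 2200, 2000, 1820, 1700, 1450, 1300 };
        double target = ratings.Average() * 3; // 5235

        int[] team = ClosestTeam(ratings, 3, target);
        // Two totals tie at distance 35 from the target (5200 and 5270);
        // print the distance, which is the same either way.
        Console.WriteLine(Math.Abs(team.Sum(i => ratings[i]) - target));
    }
}
```

The table sizes are bounded by the team size times the range of achievable totals rather than by the number of k-combinations, which is where the O(n*m*k) bound comes from.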

btilly
  • This is very interesting. I still need to research it some more, but I wouldn't want to throw out teams with duplicate scores, right? If every team came out with the exact same score then that would be great. – cricket49 Dec 05 '17 at 17:56
  • @cricket49 While selecting one team, you don't want to enumerate all of the other teams that you could have equivalently selected. That said, I would suggest sorting your people from farthest away from average to closest. Then you'll be picking a team that is as close to average as possible...with as many extreme scores as you can find. This should improve your results. In fact insisting on the best team with the most extreme person is probably better still. – btilly Dec 05 '17 at 18:10