3

I have a number of players that are split into two groups, good players and bad players. Each player is defined by a an arithmetic value and a possible group could be this one:

bad players
A 13
B 6
C 9
D 2

good players
X 25
Y 16
Z 17
K 10

Pairing a good player with a bad player will cause him to be worse by an amount equal to the bad players' value (so if I pair a good player with value 100 with a bad player of value 50, then the good players value drops to 50). I need to pair each good player with a bad player, but in such a way that the sum of the good players in the resulting list can be split into two groups of equal size that have the same sum.

In my example above a bad pairing would have been:

A - X 12
C - Z 8
B - Y 10
D - K 8

Now because of their pairing with bad players (A B C D) all the good players have lost some points and their value has changed. These new values cant be split into two groups of equal size so as the sum of the values in each group is equal. If I had a different combination though, lets say:

D - K 8
B - Z 11
C - X 16
A - Y 3 

I could split the good players now into the groups K, Z and X, Y that both have a value of 19 and so things remain neutral.

A solution in order to find an acceptable pairing of good and bad players would have been to use brute force and produce all the possible combinations, and check for each one if it is acceptable or not by trying to split it into two groups of equal value. Once the first acceptable solution is found we return it.

How can I improve this and directly find an acceptable pairing in this setting?

dearn44
  • 3,198
  • 4
  • 30
  • 63
  • Isn't this very close to the knapsack problem? I'm not sure how much better than what dynamic programming gives you you can get. What amount of pairs do you need to support? – kabanus Jun 30 '18 at 19:11
  • Fairly small, I would assume not more than a few tens of players. I would say its more like tha partition problem, but it has to be done for a number of different combinations, so I am not sure how to keep it optimal. – dearn44 Jun 30 '18 at 19:18
  • 2
    A few questions, does each good player play exactly one bad player, and each bad player exactly one good player (no player plays twice)? Does the resulting partitioning need to be of equal cardinality (can one partition have 3 pairs and another have, say, 1)? – jedwards Jun 30 '18 at 19:39
  • Imagine that they are paired as teammates, so the groups of good and bad players are equal in numbers. Also no, the resulting partitioning needs to be able to split in two sets of equal value and equal size, since anything else would mean that the teams are uneven. – dearn44 Jun 30 '18 at 19:44
  • You said "Each player is defined by two numbers", but you only give one number per player. And what determines good vs bad? In particular, why is A bad, and K good? – PM 2Ring Jun 30 '18 at 19:45
  • My bad, in reality yes, but for us it is sufficient to take into account only one. Im fixing the mistake in the original question. We just know that a player is considered good or not. So they come to us in those two groups and we dont really care how they got there. – dearn44 Jun 30 '18 at 20:00
  • 1
    Can you put your code here so we can see desired output. I don't understand your problem/ question. – Siddharth Chabra Jun 30 '18 at 20:13
  • @SiddharthChabra edited the question. Essentially what we need is a combination of pairings that will alter the value of the good players in such a way that will allow us to be able to split them into the two groups with the properties I describe. Yes I could prepare a minimal example of this and post it tomorrow (it's quite late here now) – dearn44 Jun 30 '18 at 20:17
  • Seems like the best you're gonna get is brute force, or at least some optimized form of it using perhaps numpy. You can do some checks before attempting to brute force it though. For example, if the sum of the good players minus the sum of the bad players isn't even, then a equal split won't be possible. – RobertR Jun 30 '18 at 20:34
  • 1
    Sorry I haven't been able to do more work on this, but I just added a little more info to the end of my answer. – PM 2Ring Jul 01 '18 at 21:27

2 Answers2

4

Here's a reasonably efficient algorithm. It's ok on small data sets, but it will take time & RAM for larger data sets. I won't claim it's the best algorithm, but it's certainly much better than brute-force checking each good - bad pairing.

The key insight is that we don't need to look at the individual good - bad pairings, we can look at the total good - bad score of whole teams. Let the scores for two valid teams be (X, A) and (Y, B). That is, X is the total score of all the good players in team 1 and A is the total score of all the bad players in team 1. Similarly, Y is the score of the good players in team 2 and B is the score of the bad players in that team. Then

X - A = Y - B
X + B = Y + A
2(X + B) = X + B + Y + A
X + B = (X + B + Y + A) / 2
Let M = (X + Y + A + B) / 2
Thus B = M - X

So we just need to calculate M, the total of all players' scores, and then for each good player partition X we need to see if there's a matching bad player partition B. (Incidentally, the arithmetic above shows that X + Y + A + B must be even for a solution to exist, since B & X must be whole numbers).

To do that with reasonable efficiency, we create a dict of the bad player partitions. The dict key is the total score for that partition, the associated value is a list of all bad player partitions with that score.

Then we loop over the good player partition pairs. We find the total score for the 1st partition in a pair, subtract it from M to get the required bad score, and see if that bad score's in the dictionary. If it is, we've found a solution.

The function equal_parts is the fastest way I know in pure Python to split a list into all of its equal length partitions, see this answer I wrote a couple of years ago for details.

The code below finds a single solution for the data given in the question; that solution yields 2 options for each team. I've included some extra code (currently commented out) to generate some random fake data in order to test the code on a larger data set.

Update

There was a minor bug in the logic of the previous version. When I found a list of bad partitions with the correct total score b1 I used bad_total - b1 to get the complementary list of bad partitions for the other team. But that doesn't always work correctly. :oops: The new version uses a bad_parts dictionary that stores each bad partition both as a key and a value, which makes it easy to get the correct complementary partition.

Also, the old make_pairs function was a bit sloppy, so it could produce Team 1 lists and Team 2 lists that shared players. :other oops:. We now have a new function make_teams that calls a repaired make_pairs. The make_teams function prints the various Team 1 and Team 2 permutations in sections. Within each section no Team 1 list will share players with a Team 2 list.

I've also made a few other minor changes, the main one being that the names in a partition tuple are always sorted alphabetically. That makes their use as dictionary keys more robust. I've also tweaked the output format a little.

It may sometimes happen (eg in Solution 23 of the data you supplied in a comment) that an alleged Solution doesn't work: there's no permutation of bad players that can be combined with good players without generating a negative score. Hopefully, that's not a major problem. To make the code deal with that automatically we'd need to change it so that instead of printing results as it goes it would need to store the results in lists (or perhaps use generators), and then test that the resulting lists aren't empty before printing that solution. This would consume a lot more RAM, and would make the logic even more cluttered than it currently is.

from itertools import combinations, permutations
from collections import defaultdict
from random import seed, randrange

seed(37)

def equal_parts(lst):
    ''' yield all equal-sized pair partitions of lst '''
    first = lst[0]
    allset_diff = set(lst).difference
    for left in combinations(lst, len(lst) // 2):
        if left[0] != first:
            break
        yield left, tuple(sorted(allset_diff(left)))

def find_partitions(bad, good):
    print('bad', bad, len(bad))
    print('good', good, len(good))

    bad_total = sum(bad.values())
    good_total = sum(good.values())
    total = bad_total + good_total
    if total % 2 != 0:
        print('No solutions!')
        return

    magic = total // 2
    print('magic =', magic)

    # Create a dict of the good partition pairs
    good_parts = dict(equal_parts(sorted(good)))
    # Create a dict of the bad partition pairs, with each partition of 
    # the pair as a key and the complementary partiton as the value
    bad_parts = {}
    for bpart1, bpart2 in equal_parts(sorted(bad)):
        bad_parts[bpart1] = bpart2
        bad_parts[bpart2] = bpart1
    #print(bad_parts)

    # Get the sums of all the bad partitions, and save them in 
    # a dict of lists with the partition sum as the key and 
    # the partition in the value list
    bad_sums = defaultdict(list)
    for bpart in bad_parts:
        s = sum(bad[k] for k in bpart)
        bad_sums[s].append(bpart)
    bad_sums = dict(bad_sums)
    #print(bad_sums)

    # Sum the 1st of each pair of good partitions & see if there's a matching bad partition 
    count = 0
    for gpart1, gpart2 in good_parts.items():
        g1 = sum(good[k] for k in gpart1)
        b1 = magic - g1
        if b1 in bad_sums:
            count += 1
            print('SOLUTION', count)
            g2 = good_total - g1
            b2 = bad_total - b1
            blist1 = bad_sums[b1]
            # Create the complementary list of bad partitions
            blist2 = [bad_parts[k] for k in blist1]
            tot1 = g1 - b2
            tot2 = g2 - b1
            print(gpart1, g1, '-', blist2, b2, '=', tot1)
            print(gpart2, g2, '-', blist1, b1, '=', tot2, '\n')
            make_teams(gpart1, gpart2, blist1, blist2)

def make_pairs(gpart, bpart):
    for b in permutations(bpart):
        total = 0
        team = []
        for gkey, bkey in zip(gpart, b):
            score = good[gkey] - bad[bkey]
            if score <= 0:
                # Invalid pairing
                break
            total += score
            team.append('{}-{}={}'.format(gkey, bkey, score))
        else:
            print(', '.join(team), total)

def make_teams(gpart1, gpart2, blist1, blist2):
    section = 0
    for bpart2, bpart1 in zip(blist2, blist1):
        section += 1
        print('Section', section)
        print('Team 1:', ' '.join(gpart1), '+', ' '.join(bpart2))
        make_pairs(gpart1, bpart2)
        print('\nTeam 2:', ' '.join(gpart2), '+', ' '.join(bpart1))
        make_pairs(gpart2, bpart1)
        print()

# Make some fake data
def make_data(letters, lo, hi):
    return {s: randrange(lo, hi) for s in letters}

#while True:
    #bad = make_data('ZYXWVU', 1, 15)
    #good = make_data('ABCDEF', 10, 30)
    #bad_total = sum(bad.values())
    #good_total = sum(good.values())
    #if bad_total % 2 == good_total % 2:
        #break

bad = {'A': 13, 'B': 6, 'C': 9, 'D': 2,}
good = {'X': 25, 'Y': 16, 'Z': 17, 'K': 10,}

#bad = {'bA': 7, 'bB': 10, 'bC': 2, 'bD': 12, 'bE': 15, 'bF': 14, 'bG': 17, 'bH': 15} 
#good = {'gA': 19, 'gB': 36, 'gC': 9, 'gD': 15, 'gE': 24, 'gF': 23, 'gG': 24, 'gH': 24}

find_partitions(bad, good)

output

bad {'A': 13, 'B': 6, 'C': 9, 'D': 2} 4
good {'X': 25, 'Y': 16, 'Z': 17, 'K': 10} 4
magic = 49
SOLUTION 1
('K', 'Z') 27 - [('B', 'D')] 8 = 19
('X', 'Y') 41 - [('A', 'C')] 22 = 19 

Section 1
Team 1: K Z + B D
K-B=4, Z-D=15 19
K-D=8, Z-B=11 19

Team 2: X Y + A C
X-A=12, Y-C=7 19
X-C=16, Y-A=3 19

Here's the output from the fake data.

bad {'Z': 2, 'Y': 9, 'X': 5, 'W': 6, 'V': 11, 'U': 12} 6
good {'A': 23, 'B': 28, 'C': 28, 'D': 21, 'E': 28, 'F': 11} 6
magic = 92
SOLUTION 1
('A', 'B', 'C') 79 - [('U', 'V', 'Y')] 32 = 47
('D', 'E', 'F') 60 - [('W', 'X', 'Z')] 13 = 47 

Section 1
Team 1: A B C + U V Y
A-U=11, B-V=17, C-Y=19 47
A-U=11, B-Y=19, C-V=17 47
A-V=12, B-U=16, C-Y=19 47
A-V=12, B-Y=19, C-U=16 47
A-Y=14, B-U=16, C-V=17 47
A-Y=14, B-V=17, C-U=16 47

Team 2: D E F + W X Z
D-W=15, E-X=23, F-Z=9 47
D-W=15, E-Z=26, F-X=6 47
D-X=16, E-W=22, F-Z=9 47
D-X=16, E-Z=26, F-W=5 47
D-Z=19, E-W=22, F-X=6 47
D-Z=19, E-X=23, F-W=5 47

SOLUTION 2
('A', 'B', 'D') 72 - [('U', 'V', 'Z'), ('V', 'X', 'Y')] 25 = 47
('C', 'E', 'F') 67 - [('W', 'X', 'Y'), ('U', 'W', 'Z')] 20 = 47 

Section 1
Team 1: A B D + U V Z
A-U=11, B-V=17, D-Z=19 47
A-U=11, B-Z=26, D-V=10 47
A-V=12, B-U=16, D-Z=19 47
A-V=12, B-Z=26, D-U=9 47
A-Z=21, B-U=16, D-V=10 47
A-Z=21, B-V=17, D-U=9 47

Team 2: C E F + W X Y
C-W=22, E-X=23, F-Y=2 47
C-W=22, E-Y=19, F-X=6 47
C-X=23, E-W=22, F-Y=2 47
C-X=23, E-Y=19, F-W=5 47
C-Y=19, E-W=22, F-X=6 47
C-Y=19, E-X=23, F-W=5 47

Section 2
Team 1: A B D + V X Y
A-V=12, B-X=23, D-Y=12 47
A-V=12, B-Y=19, D-X=16 47
A-X=18, B-V=17, D-Y=12 47
A-X=18, B-Y=19, D-V=10 47
A-Y=14, B-V=17, D-X=16 47
A-Y=14, B-X=23, D-V=10 47

Team 2: C E F + U W Z
C-U=16, E-W=22, F-Z=9 47
C-U=16, E-Z=26, F-W=5 47
C-W=22, E-U=16, F-Z=9 47
C-Z=26, E-U=16, F-W=5 47

SOLUTION 3
('A', 'B', 'E') 79 - [('U', 'V', 'Y')] 32 = 47
('C', 'D', 'F') 60 - [('W', 'X', 'Z')] 13 = 47 

Section 1
Team 1: A B E + U V Y
A-U=11, B-V=17, E-Y=19 47
A-U=11, B-Y=19, E-V=17 47
A-V=12, B-U=16, E-Y=19 47
A-V=12, B-Y=19, E-U=16 47
A-Y=14, B-U=16, E-V=17 47
A-Y=14, B-V=17, E-U=16 47

Team 2: C D F + W X Z
C-W=22, D-X=16, F-Z=9 47
C-W=22, D-Z=19, F-X=6 47
C-X=23, D-W=15, F-Z=9 47
C-X=23, D-Z=19, F-W=5 47
C-Z=26, D-W=15, F-X=6 47
C-Z=26, D-X=16, F-W=5 47

SOLUTION 4
('A', 'C', 'D') 72 - [('U', 'V', 'Z'), ('V', 'X', 'Y')] 25 = 47
('B', 'E', 'F') 67 - [('W', 'X', 'Y'), ('U', 'W', 'Z')] 20 = 47 

Section 1
Team 1: A C D + U V Z
A-U=11, C-V=17, D-Z=19 47
A-U=11, C-Z=26, D-V=10 47
A-V=12, C-U=16, D-Z=19 47
A-V=12, C-Z=26, D-U=9 47
A-Z=21, C-U=16, D-V=10 47
A-Z=21, C-V=17, D-U=9 47

Team 2: B E F + W X Y
B-W=22, E-X=23, F-Y=2 47
B-W=22, E-Y=19, F-X=6 47
B-X=23, E-W=22, F-Y=2 47
B-X=23, E-Y=19, F-W=5 47
B-Y=19, E-W=22, F-X=6 47
B-Y=19, E-X=23, F-W=5 47

Section 2
Team 1: A C D + V X Y
A-V=12, C-X=23, D-Y=12 47
A-V=12, C-Y=19, D-X=16 47
A-X=18, C-V=17, D-Y=12 47
A-X=18, C-Y=19, D-V=10 47
A-Y=14, C-V=17, D-X=16 47
A-Y=14, C-X=23, D-V=10 47

Team 2: B E F + U W Z
B-U=16, E-W=22, F-Z=9 47
B-U=16, E-Z=26, F-W=5 47
B-W=22, E-U=16, F-Z=9 47
B-Z=26, E-U=16, F-W=5 47

SOLUTION 5
('A', 'C', 'E') 79 - [('U', 'V', 'Y')] 32 = 47
('B', 'D', 'F') 60 - [('W', 'X', 'Z')] 13 = 47 

Section 1
Team 1: A C E + U V Y
A-U=11, C-V=17, E-Y=19 47
A-U=11, C-Y=19, E-V=17 47
A-V=12, C-U=16, E-Y=19 47
A-V=12, C-Y=19, E-U=16 47
A-Y=14, C-U=16, E-V=17 47
A-Y=14, C-V=17, E-U=16 47

Team 2: B D F + W X Z
B-W=22, D-X=16, F-Z=9 47
B-W=22, D-Z=19, F-X=6 47
B-X=23, D-W=15, F-Z=9 47
B-X=23, D-Z=19, F-W=5 47
B-Z=26, D-W=15, F-X=6 47
B-Z=26, D-X=16, F-W=5 47

SOLUTION 6
('A', 'D', 'E') 72 - [('U', 'V', 'Z'), ('V', 'X', 'Y')] 25 = 47
('B', 'C', 'F') 67 - [('W', 'X', 'Y'), ('U', 'W', 'Z')] 20 = 47 

Section 1
Team 1: A D E + U V Z
A-U=11, D-V=10, E-Z=26 47
A-U=11, D-Z=19, E-V=17 47
A-V=12, D-U=9, E-Z=26 47
A-V=12, D-Z=19, E-U=16 47
A-Z=21, D-U=9, E-V=17 47
A-Z=21, D-V=10, E-U=16 47

Team 2: B C F + W X Y
B-W=22, C-X=23, F-Y=2 47
B-W=22, C-Y=19, F-X=6 47
B-X=23, C-W=22, F-Y=2 47
B-X=23, C-Y=19, F-W=5 47
B-Y=19, C-W=22, F-X=6 47
B-Y=19, C-X=23, F-W=5 47

Section 2
Team 1: A D E + V X Y
A-V=12, D-X=16, E-Y=19 47
A-V=12, D-Y=12, E-X=23 47
A-X=18, D-V=10, E-Y=19 47
A-X=18, D-Y=12, E-V=17 47
A-Y=14, D-V=10, E-X=23 47
A-Y=14, D-X=16, E-V=17 47

Team 2: B C F + U W Z
B-U=16, C-W=22, F-Z=9 47
B-U=16, C-Z=26, F-W=5 47
B-W=22, C-U=16, F-Z=9 47
B-Z=26, C-U=16, F-W=5 47   

PS. I've thought of a way to handle much larger data sets quickly. It won't find all solutions, but it should find plenty of them. The idea is to divide the large data set into several smaller sets, solve those smaller sets independently, and then combine the solutions. The only tricky part is to do the division so that each smaller set is solvable. So in each set bad_total < good_total and bad_total%2 == good_total%2. It probably makes sense to sort the full data sets by score before attempting the division so that the bad & good groups of each smaller set are roughly matched.

PM 2Ring
  • 54,345
  • 6
  • 82
  • 182
  • I am not sure how splitting the data set would work though. Wouldnt that mean we are loosing a lot of possible combinations? Granted we might be interested for the first correct combination, or a reasonably enough large set of combinations, but could't it happen that the only possible solution to a particular instance is lost due to our splitting of the data? – dearn44 Jul 02 '18 at 14:23
  • 1
    @dearn44 Yes, we would lose many possible combinations. But it looks like we get lots of solutions if the set size is large enough because the number of partitions of 2n items is 2nCn, e.g. if 2n=8 we get 70 partitions (see https://oeis.org/A000984). So we should make the subgroups as large as is practical. The total number of solutions produced this way (in terms of the Solution count that my code prints) is the product of the numbers of solutions for each subgroup times `2**(m-1)` where m is the number of subgroups. And each solution gives multiple team options. – PM 2Ring Jul 02 '18 at 15:17
  • @dearn44 A simple possible optimization is to make the scores in a group unique, i.e. if 2 or more good players have the same score we only need to do the initial calculations for one of them, we can add the others back in during the make_pairs phase. Of course, we can only do this if there are dupes in both the good & the bad players, since the group sizes have to remain equal. – PM 2Ring Jul 02 '18 at 15:26
  • the optimization for large data sets seems indeed interesting and I should try it out. I am not sure how much we will earn from your second suggestion though. Ill try to find out. I noticed a small issue though, sometimes the combinations of team1 and team2 might pair the same player two times with two different teammates while leaving others without a team. So another check must be added making sure that the number of unique players in each possible pairing is similar to the number of total players I think. – dearn44 Jul 03 '18 at 08:50
  • As an example have a look at **Solution 23* for this test case `bad = {'bA': 7, 'bB': 10, 'bC': 2, 'bD': 12, 'bE': 15, 'bF': 14, 'bG': 17, 'bH': 15}`, `good = {'gA': 19, 'gB': 36, 'gC': 9, 'gD': 15, 'gE': 24, 'gF': 23, 'gG': 24, 'gH': 24}`. We find *bA* both in team 1 and team two, so there could be a chance that a proposed pairing has the same player in two teams. What I thought about doing is in `make_pairs` to also make sure that I have the correct amount of unique players for each combination to be valid. Maybe there is a better way, I'll see if I can find it. – dearn44 Jul 03 '18 at 11:01
  • Nope, no bugs this time :D – dearn44 Jul 04 '18 at 10:24
  • @dearn44 Phew! :) If you want to do more with this, feel free to ask a new question, possibly linking back to this one. And ping me here so I'll find out about it. Also drop by [the SO Python chat room](https://chat.stackoverflow.com/rooms/6/python) some time. – PM 2Ring Jul 04 '18 at 11:21
  • Definitely, Im full of ideas, if anything comes up I'll post – dearn44 Jul 04 '18 at 11:23
3

The following code will produce a valid pairing partitioning:

from itertools import permutations

def find_pairing_partitioning(scores_g, scores_b):
    keys_g = sorted(scores_g.keys())
    keys_b = sorted(scores_b.keys())

    # Some initialization that we can do once
    all_sum = sum(scores_g.values()) - sum(scores_b.values())
    if all_sum % 2 == 1:
        raise RuntimeError("Can't evenly partition an odd sum")

    bin_siz = len(scores_g) // 2                    # How many elements we want in each bin

    for perm_b in permutations(keys_b):
        diff = {}
        for (kg,kb) in zip(keys_g, perm_b):
            kr = '%s-%s' % (kg,kb)
            diff[kr] = scores_g[kg] - scores_b[kb]

        for perm_d in permutations(diff.keys()):    # Note: Inefficient
            bin1_keys = perm_d[:bin_siz]
            bin2_keys = perm_d[bin_siz:]
            bin1_sum  = sum(diff[k] for k in bin1_keys)
            bin2_sum  = sum(diff[k] for k in bin2_keys)

            if bin1_sum == bin2_sum:
                yield bin1_keys, bin2_keys

scores_g = {'X':25, 'Y':16, 'Z':17, 'K':10}     # Scores, good players
scores_b = {'A':13, 'B':6,  'C':9,  'D':2}      # Scores, bad players

bin1_keys, bin2_keys = next(find_pairing_partitioning(scores_g, scores_b))
print("Found even partitioning:", bin1_keys, "vs", bin2_keys)

Example output: Found even partitioning: ('K-B', 'Z-D') vs ('X-A', 'Y-C')

The inner permutations call is inefficient, because it tests what are effectively the same partitionings multiple times, for example:

    A,B  vs.  C,D
    B,A  vs.  C,D
    C,D  vs.  A,B
    ...

So you might look into cleaning that up, or trade runtime efficiency for space efficiency and do something like this.

jedwards
  • 29,432
  • 3
  • 65
  • 92