1

I'm basically trying to do this Monte Carlo kind of analysis where I randomly reassign the participants in my experiment to new groups, and then reanalyze the data given the random new groups. So here's what I want to do:

Participants are originally grouped into eight groups of four participants each. I want to randomly reassign each participant to a new group, but I don't want any participants to end up in a new group with another participant from their same original group.

Here is how far I got with this:

import random
import pandas as pd
import itertools as it

data = list(it.product(range(8),range(4)))
test_df = pd.DataFrame(data=data,columns=['group','partid'])
test_df['new_group'] = None

for idx, row in test_df.iterrows():
    start_group = row['group']
    takens      = test_df.query('group == @start_group')['new_group'].values
    fulls       = test_df.groupby('new_group').count().query('partid >= 4').index.values
    possibles   = [x for x in test_df['group'].unique() if (x not in takens)
                                                      and (x not in fulls)]
    test_df.loc[idx,'new_group'] = random.choice(possibles)

The basic idea here is that I randomly reassign a participant to a new group with the constraints that (a) the new group doesn't have one of their original group partners in, and (b) the new group doesn't have 4 or more participants already reassigned to it.

The problem with this approach is that, many times, by the time we try to reassign the last group, the only remaining group slots are in that same group. I could also just try to re-randomize when it fails until it succeeds, but that feels silly. Also, I want to make 100 random reassignments, so that approach could get very slow....

So there must be a smarter way to do this. I also feel like there should be a simpler way to solve this, given how simple the goal feels (but I realize that can be misleading...)

Russell Richie
  • 421
  • 1
  • 5
  • 15

1 Answers1

1

EDIT: Better solution


After sleeping on it I've found a significantly better solution that's in ~ Big O of numGroups.

Sample data

import random
import numpy as np
import pandas as pd
import itertools as it

np.random.seed(0)
numGroups=4
numMembers=4

data = list(it.product(range(numGroups),range(numMembers)))
df = pd.DataFrame(data=data,columns=['group','partid'])

Solution

g = np.repeat(range(numGroups),numMembers).reshape((numGroups,numMembers))
In [95]: g
Out[95]: 
array([[0, 0, 0, 0],
       [1, 1, 1, 1],
       [2, 2, 2, 2],
       [3, 3, 3, 3]])

g = np.random.permutation(g)
In [102]: g
Out[102]: 
array([[2, 2, 2, 2],
       [3, 3, 3, 3],
       [1, 1, 1, 1],
       [0, 0, 0, 0]])

g = np.tile(g,(2,1))
In [104]: g
Out[104]: 
array([[2, 2, 2, 2],
       [3, 3, 3, 3],
       [1, 1, 1, 1],
       [0, 0, 0, 0],
       [2, 2, 2, 2],
       [3, 3, 3, 3],
       [1, 1, 1, 1],
       [0, 0, 0, 0]])

Notice the diagonals.

array([[2, -, -, -],
       [3, 3, -, -],
       [1, 1, 1, -],
       [0, 0, 0, 0],
       [-, 2, 2, 2],
       [-, -, 3, 3],
       [-, -, -, 1],
       [-, -, -, -]])

Take the diagonals from top to bottom.

newGroups = []
for i in range(numGroups):
    newGroups.append(np.diagonal(g[i:i+numMembers]))

In [106]: newGroups
Out[106]: 
[array([2, 3, 1, 0]),
 array([3, 1, 0, 2]),
 array([1, 0, 2, 3]),
 array([0, 2, 3, 1])]

newGroups = np.ravel(newGroups)
df["newGroups"] = newGroups

In [110]: df
Out[110]: 
    group  partid  newGroups
0       0       0          2
1       0       1          3
2       0       2          1
3       0       3          0
4       1       0          3
5       1       1          1
6       1       2          0
7       1       3          2
8       2       0          1
9       2       1          0
10      2       2          2
11      2       3          3
12      3       0          0
13      3       1          2
14      3       2          3
15      3       3          1

Old Solution: Brute Force Method


Turned out to be a lot harder than I thought...

I have a brute force method that basically guesses different permutations of groups until it finally gets one where everyone ends up in a different group. The benefit of this approach vs. what you've shown is that it doesn't suffer from "running out of groups at the end".

It can potentially get slow - but for 8 groups and 4 members per group it's fast.

Sample data

import random
import numpy as np
import pandas as pd
import itertools as it

random.seed(0)
numGroups=4
numMembers=4

data = list(it.product(range(numGroups),range(numMembers)))
df = pd.DataFrame(data=data,columns=['group','partid'])

Solution

g = np.repeat(range(numGroups),numMembers).reshape((numGroups,numMembers))

In [4]: g
Out[4]: 
array([[0, 0, 0, 0],
       [1, 1, 1, 1],
       [2, 2, 2, 2],
       [3, 3, 3, 3]])

def reArrange(g):
    g = np.transpose(g)
    g = [np.random.permutation(x) for x in g]
    return np.transpose(g)

# check to see if any members in each old group have duplicate new groups
# if so repeat
while np.any(np.apply_along_axis(lambda x: len(np.unique(x))<numMembers,1,g)):
    g = reArrange(g)

df["newGroup"] = g.ravel()

In [7]: df
Out[7]: 
    group  partid  newGroup
0       0       0         2
1       0       1         3
2       0       2         1
3       0       3         0
4       1       0         0
5       1       1         1
6       1       2         2
7       1       3         3
8       2       0         1
9       2       1         0
10      2       2         3
11      2       3         2
12      3       0         3
13      3       1         2
14      3       2         0
15      3       3         1
Filip Kilibarda
  • 2,484
  • 2
  • 20
  • 31
  • Can't one simply choose 4 random groups and select one random guy from each one of those to create a new group and so on? – Peaceful Mar 05 '17 at 10:00
  • @peaceful yes, then I found out that there's a non-zero probability that when you get to for final iteration, i.e., choose your final 4 groups, you're left with no choice but to choose members that were previously in a group together. After that I decided I'd just settle with the brute force method and wait for someone else to come along with an elegant solution. – Filip Kilibarda Mar 05 '17 at 17:41
  • In addition, you would have to ensure that when a member is randomly chosen, they're withdrawn from the pool of possible future choices. On top of dealing with the case I described above – Filip Kilibarda Mar 05 '17 at 17:44
  • Yes, I think I had pursued something like you suggest, where I iterated over each member of the experiment, and swapped them with another member, keeping track of who had yet to be swapped. It got complicated, so here we are. =) – Russell Richie Mar 05 '17 at 18:12
  • 1
    Yeah, it does get complicated when you try to do it analytically. Very interesting problem though. – Filip Kilibarda Mar 05 '17 at 18:23
  • 1
    Hey @RussellRichie, I believe I've found a significantly better solution that actually scales. Edited my answer, check it out. – Filip Kilibarda Mar 05 '17 at 19:55
  • Wow, very interesting. Will look at this more closely. I think `g = np.tile(g,(2,1))` can replace `g = np.tile(np.ravel(g),2).reshape((numGroups*2,numMembers))`? Might be other ways to simplify. Thanks! – Russell Richie Mar 05 '17 at 21:50