Block randomization using python?

Question

I have the following input table (df):

ColumnA	ColumnB	Blocks
A	12	1
B	32	1
C	44	1
D	76	2
E	99	2
F	123	2
G	65	2
H	87	3
I	76	3
J	231	3
k	80	4
l	55	4
m	27	5
n	67	5
o	34	5

I would like to perform block randomization such that, it pick one value from each blocks ( one value from 1,2,3,4,5) and create that as a separate table.

The output should look something like the following:

ColumnA	ColumnB	Blocks	Groups
B	32	1	A1
E	99	2	A1
I	76	3	A1
l	55	4	A1
m	27	5	A1
A	12	1	A2
F	123	2	A2
k	80	3	A2
m	27	4	A2
n	67	5	A2
C	44	1	A3
H	87	2	A3
J	231	3	A3
n	67	4	A3
o	34	5	A4
D	76	1	A4
G	65	2	A4

Randomly selected rows such that each group has all the blocks (evenly distributed).

What I tried so far?


df = df.groupby('blocks').apply(lambda x: x.sample(frac=1,random_state=1234)).reset_index(drop=True)
treatment_groups = [f"A{i}" for i in range(1, n+1)]
df['Groups'] = (df.index // n).map(dict(zip(idx, treatment_groups)))

This doesn't randomize according to the blocks column. How do I do that?

score 2 · Accepted Answer · answered Mar 23 '21 at 08:49

Let us try by defining a function to generate random samples from each block:

def random_samples(n):
    for i in range(1, n+1):
        for _, g in df.groupby('Blocks'):
            yield g.sample(n=1).assign(Groups=f'A{i}')

sampled = pd.concat(random_samples(4), ignore_index=True)

>>> sampled

   ColumnA  ColumnB  Blocks Groups
0        A       12       1     A1
1        D       76       2     A1
2        I       76       3     A1
3        k       80       4     A1
4        n       67       5     A1
5        C       44       1     A2
6        G       65       2     A2
7        J      231       3     A2
8        l       55       4     A2
9        m       27       5     A2
10       B       32       1     A3
11       G       65       2     A3
12       H       87       3     A3
13       l       55       4     A3
14       m       27       5     A3
15       B       32       1     A4
16       F      123       2     A4
17       I       76       3     A4
18       l       55       4     A4
19       m       27       5     A4

Thank you very much! But, I get the following error "ValueError: No objects to concatenate". How do I deal with that? — MuSu18, Mar 23 '21 at 08:55
Not sure may be something related to pandas version. What is your pandas version? Can you please check `sampled = pd.concat(list(random_samples(4)), ignore_index=True)` ? — Shubham Sharma, Mar 23 '21 at 08:58

score 0 · Answer 2 · answered Mar 23 '21 at 08:56

In your code .sample(frac = 1) returns 100% of samples which is all of them. You want the version with .sample(n=1), for example if you just want one group you would do

df.groupby('Blocks').apply(lambda x: x.sample(n=1, random_state=1234))

to get


        ColumnA ColumnB Blocks
Blocks              
1   0   A       12      1
2   3   D       76      2
3   7   H       87      3
4   10  k       80      4
5   12  m       27      5

to get, say, 5 groups you would use n=5 with replace=True (as some groups have fewer than 5 elements) and then some re-arranging:

df = df.groupby('Blocks').apply(lambda x: x.sample(n=5, replace = True, random_state=1234)).reset_index(drop = True)
df['Groups'] = 'A' + df.groupby('Blocks').cumcount().astype(str)
df.sort_values('Groups')

produces

    ColumnA      ColumnB    Blocks  Groups
--  ---------  ---------  --------  --------
 0  C                 44         1  A0
20  o                 34         5  A0
 5  G                 65         2  A0
15  l                 55         4  A0
10  J                231         3  A0
 1  B                 32         1  A1
21  n                 67         5  A1
 6  G                 65         2  A1
16  l                 55         4  A1
11  I                 76         3  A1
22  m                 27         5  A2
17  k                 80         4  A2
12  H                 87         3  A2
 7  F                123         2  A2
 2  A                 12         1  A2
13  H                 87         3  A3
 8  E                 99         2  A3
18  l                 55         4  A3
 3  A                 12         1  A3
23  m                 27         5  A3
14  H                 87         3  A4
 9  D                 76         2  A4
19  k                 80         4  A4
 4  A                 12         1  A4
24  m                 27         5  A4

Thank you very much! Is it possible to start the groups from A1 and not A0? — MuSu18, Mar 23 '21 at 09:03
of course, replace the relevant line with `df['Groups'] = 'A' + (df.groupby('Blocks').cumcount()+1).astype(str)` — piterbarg, Mar 23 '21 at 09:06
Not sure why I get duplicate rows in each groups. For me C is present in two groups A1 and A3. — MuSu18, Mar 23 '21 at 09:16
there are only three rows in block 1 so some will have to be repeated if you want 5 groups? — piterbarg, Mar 23 '21 at 09:23
It is not possible to allocate three items to five groups without repeats. If you had enough rows in each block yes you can -- just replace `replace = True` with `replace = False` in `sample(...)` — piterbarg, Mar 23 '21 at 09:37
That works! Whle sorting the column if I have more than 10 groups the column A11 gets sorted after A1 and A12 after A11. I.e, sorting looks like the following A1, A11, A12, A2, A3,... How can I change that? Thanks again! — MuSu18, Mar 23 '21 at 13:41
something like this: https://stackoverflow.com/questions/23493374/sort-dataframe-index-that-has-a-string-and-number — piterbarg, Mar 23 '21 at 14:17

score 0 · Answer 3 · answered Mar 23 '21 at 08:59

0

it's truly not random, but will group as you want
order of source dataframe will influence order that blocks are assigned to a Group

df = pd.DataFrame({'ColumnA': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'k', 'l', 'm', 'n', 'o'], 
              'ColumnB': [12, 32, 44, 76, 99, 123, 65, 87, 76, 231, 80, 55, 27, 67, 34], 
              'Blocks': [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5]})

# generate a series that is "A{n}" where "n" is occurrence# of "Block"
sg = df.groupby("Blocks").apply(lambda g: pd.Series([f"A{i+1}" 
                                                for i in range(len(g))],name="Group")).explode().reset_index(drop=True)

df.join(sg)

	ColumnA	ColumnB	Blocks	Group
0	A	12	1	A1
1	B	32	1	A2
2	C	44	1	A3
3	D	76	2	A1
4	E	99	2	A2
5	F	123	2	A3
6	G	65	2	A4
7	H	87	3	A1
8	I	76	3	A2
9	J	231	3	A3
10	k	80	4	A1
11	l	55	4	A2
12	m	27	5	A1
13	n	67	5	A2
14	o	34	5	A3

answered Mar 23 '21 at 08:59

Rob Raymond

29,118
3
14
30

Thank you very much! But, I get the following error: 'TypeError: explode() missing 1 required positional argument: 'column'' – MuSu18 Mar 23 '21 at 09:22
what version of pandas are you using> `pd.__version__` – Rob Raymond Mar 23 '21 at 09:50
version '1.1.2' – MuSu18 Mar 23 '21 at 09:52
mostly unto date, I'm using 1.2.1. https://pandas.pydata.org/docs/reference/api/pandas.Series.explode.html was changed in 1.1.0, haven't checked regression bugs – Rob Raymond Mar 23 '21 at 09:55

ColumnA	ColumnB	Blocks
A	12	1
B	32	1
C	44	1
D	76	2
E	99	2
F	123	2
G	65	2
H	87	3
I	76	3
J	231	3
k	80	4
l	55	4
m	27	5
n	67	5
o	34	5

ColumnA	ColumnB	Blocks	Groups
B	32	1	A1
E	99	2	A1
I	76	3	A1
l	55	4	A1
m	27	5	A1
A	12	1	A2
F	123	2	A2
k	80	3	A2
m	27	4	A2
n	67	5	A2
C	44	1	A3
H	87	2	A3
J	231	3	A3
n	67	4	A3
o	34	5	A4
D	76	1	A4
G	65	2	A4

	ColumnA	ColumnB	Blocks	Group
0	A	12	1	A1
1	B	32	1	A2
2	C	44	1	A3
3	D	76	2	A1
4	E	99	2	A2
5	F	123	2	A3
6	G	65	2	A4
7	H	87	3	A1
8	I	76	3	A2
9	J	231	3	A3
10	k	80	4	A1
11	l	55	4	A2
12	m	27	5	A1
13	n	67	5	A2
14	o	34	5	A3

ColumnA	ColumnB	Blocks
A	12	1
B	32	1
C	44	1
D	76	2
E	99	2
F	123	2
G	65	2
H	87	3
I	76	3
J	231	3
k	80	4
l	55	4
m	27	5
n	67	5
o	34	5

ColumnA	ColumnB	Blocks	Groups
B	32	1	A1
E	99	2	A1
I	76	3	A1
l	55	4	A1
m	27	5	A1
A	12	1	A2
F	123	2	A2
k	80	3	A2
m	27	4	A2
n	67	5	A2
C	44	1	A3
H	87	2	A3
J	231	3	A3
n	67	4	A3
o	34	5	A4
D	76	1	A4
G	65	2	A4

	ColumnA	ColumnB	Blocks	Group
0	A	12	1	A1
1	B	32	1	A2
2	C	44	1	A3
3	D	76	2	A1
4	E	99	2	A2
5	F	123	2	A3
6	G	65	2	A4
7	H	87	3	A1
8	I	76	3	A2
9	J	231	3	A3
10	k	80	4	A1
11	l	55	4	A2
12	m	27	5	A1
13	n	67	5	A2
14	o	34	5	A3

Block randomization using python?

3 Answers3

ColumnA	ColumnB	Blocks
A	12	1
B	32	1
C	44	1
D	76	2
E	99	2
F	123	2
G	65	2
H	87	3
I	76	3
J	231	3
k	80	4
l	55	4
m	27	5
n	67	5
o	34	5

ColumnA	ColumnB	Blocks	Groups
B	32	1	A1
E	99	2	A1
I	76	3	A1
l	55	4	A1
m	27	5	A1
A	12	1	A2
F	123	2	A2
k	80	3	A2
m	27	4	A2
n	67	5	A2
C	44	1	A3
H	87	2	A3
J	231	3	A3
n	67	4	A3
o	34	5	A4
D	76	1	A4
G	65	2	A4

	ColumnA	ColumnB	Blocks	Group
0	A	12	1	A1
1	B	32	1	A2
2	C	44	1	A3
3	D	76	2	A1
4	E	99	2	A2
5	F	123	2	A3
6	G	65	2	A4
7	H	87	3	A1
8	I	76	3	A2
9	J	231	3	A3
10	k	80	4	A1
11	l	55	4	A2
12	m	27	5	A1
13	n	67	5	A2
14	o	34	5	A3