I am trying to implement a permutation test on a large Pandas dataframe. The dataframe looks like the following:
    group  some_value  label
0       1           8      1
1       1           7      0
2       1           6      2
3       1           5      2
4       2           1      0
5       2           2      0
6       2           3      1
7       2           4      2
8       3           2      1
9       3           4      1
10      3           2      1
11      3           4      2
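For reproducibility, the example frame can be rebuilt with a snippet along these lines. The values are copied from the table above; the variable name df matches the code further down.

```python
import pandas as pd

# Rebuild the 12-row example frame shown in the table above.
df = pd.DataFrame(
    {
        "group": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
        "some_value": [8, 7, 6, 5, 1, 2, 3, 4, 2, 4, 2, 4],
        "label": [1, 0, 2, 2, 0, 0, 1, 2, 1, 1, 1, 2],
    }
)
print(df)
```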
I want to group by the column group, shuffle the label column within each group, and write the result back to the data frame, preferably in place. The some_value column should remain intact. The result should look something like the following:
    group  some_value  label
0       1           8      1
1       1           7      2
2       1           6      2
3       1           5      0
4       2           1      1
5       2           2      0
6       2           3      0
7       2           4      2
8       3           2      1
9       3           4      2
10      3           2      1
11      3           4      1
I used np.random.permutation, but found it was very slow:
df["label"] = df.groupby("group")["label"].transform(np.random.permutation)
It seems that df.sample is much faster. How can I solve this problem using df.sample() instead of np.random.permutation, and in place?
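To make the question concrete, here is a minimal sketch of what a sample-based version might look like. It rests on two assumptions that are not part of the code above: pandas 1.1 or later, where groupby objects have a sample method, and rows that are already ordered by group, so that the shuffled labels can be written back by position. The fixed random_state is only there to make the sketch repeatable. Whether this is actually faster on a large frame would still need to be timed.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "group": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
        "some_value": [8, 7, 6, 5, 1, 2, 3, 4, 2, 4, 2, 4],
        "label": [1, 0, 2, 2, 0, 0, 1, 2, 1, 1, 1, 2],
    }
)
before = df.copy()  # kept only to check the result afterwards

# Assumption: rows are ordered by group, so the positional write-back
# below lines up each shuffled label with a row of the same group.
assert df["group"].is_monotonic_increasing

# Shuffle the labels within each group (frac=1 draws every row once,
# without replacement). The result comes back group by group.
shuffled = df.groupby("group")["label"].sample(frac=1, random_state=0)

# to_numpy() drops the shuffled index, so the assignment is by position
# rather than by index alignment (which would undo the shuffle).
df["label"] = shuffled.to_numpy()
print(df)
```

If the frame is not ordered by group, this positional assignment would mix labels across groups, so it would need to be sorted first or handled differently.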