Permute/randomize rows within a column independently

Question

I have a dataframe like so:

> df1
    a   b   c
1 0.5 0.3 0.0
2 0.2 0.0 0.0
3 0.0 0.6 0.0
4 0.0 0.0 0.4

I would like to permute the rows within each column, with replacement, 1000 times, but independently for each column (like the reels of a Las Vegas slot machine).

I noticed that the `sample` function in R doesn't do this directly. For example, sampling the rows gives:

> df2 <- df1[sample(nrow(df1)),]
> df2
    a   b   c
3 0.0 0.6 0.0
4 0.0 0.0 0.4
2 0.2 0.0 0.0
1 0.5 0.3 0.0

But notice that each row moves as a single chunk: values stay beside the other values from their original row (e.g. 0.5 is always beside 0.3).
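A quick way to confirm this coupling, as a sketch: the code below rebuilds `df1` from the printout above and repeats the row-wise shuffle 100 times (the count and the seed are arbitrary choices). In every shuffle, the row that holds `a == 0.5` still holds `b == 0.3`, because a single index vector from `sample()` reorders all columns at once.

```r
# df1 rebuilt from the printout in the question
df1 <- data.frame(a = c(0.5, 0.2, 0, 0),
                  b = c(0.3, 0, 0.6, 0),
                  c = c(0, 0, 0, 0.4))

set.seed(123)
# Row-wise shuffle: one index vector reorders every column together
shuffles <- replicate(100, df1[sample(nrow(df1)), ], simplify = FALSE)

# For each shuffle, look up b in the row where a == 0.5
paired <- vapply(shuffles, function(d) d$b[d$a == 0.5] == 0.3, logical(1))
all(paired)  # TRUE: 0.5 and 0.3 never separate
```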

I don't think shuffling both column-wise and row-wise is the correct answer either, because that permutes values both horizontally and vertically (i.e. not like a slot machine in Vegas).


Maybe `lapply(df1, sample)` – akrun May 09 '15 at 17:07

score 3 · Answer 1 · answered May 09 '15 at 17:21


Here's one way:

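df2 <- df1
n   <- nrow(df1)

set.seed(1)
# give each column its own random index vector, then assign back into df2
df2[] <- lapply(df1, function(x) x[sample.int(n)])
df2
# output from 2015; R >= 3.6.0 changed sample()'s default method, so values may differ
#     a   b   c
# 1 0.2 0.3 0.0
# 2 0.0 0.6 0.0
# 3 0.0 0.0 0.4
# 4 0.5 0.0 0.0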

Or just `lapply(df1, sample)`, as @akrun said.
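The question asks for 1000 repetitions, while the answer shows a single shuffle. A minimal sketch, assuming a list of 1000 shuffled data frames is a convenient result, wraps the per-column shuffle in `replicate()`. `shuffle_columns()` is a hypothetical helper name, and its `replace` argument is an assumption meant to cover the question's "with replacement" wording: `replace = FALSE` gives true permutations, while `replace = TRUE` resamples each column with replacement (a bootstrap), so values can repeat or drop out.

```r
df1 <- data.frame(a = c(0.5, 0.2, 0, 0),
                  b = c(0.3, 0, 0.6, 0),
                  c = c(0, 0, 0, 0.4))

# Hypothetical helper: reorder each column with its own random index vector.
# Indexing with sample.int(n) avoids a sample() pitfall: sample(x) on a single
# number x >= 1 draws from 1:x instead of returning x itself.
shuffle_columns <- function(df, replace = FALSE) {
  n <- nrow(df)
  out <- df
  out[] <- lapply(df, function(x) x[sample.int(n, replace = replace)])
  out
}

set.seed(1)
perms <- replicate(1000, shuffle_columns(df1), simplify = FALSE)
boots <- replicate(1000, shuffle_columns(df1, replace = TRUE), simplify = FALSE)
```

`perms[[1]]` is the first shuffled data frame; `do.call(rbind, perms)` would stack all 1000 if one long table is easier to summarise.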

Frank

Is there a way to get `df2` as a data frame rather than a list? – GabrielMontenegro Feb 11 '22 at 14:22
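Two ways to keep the result a data frame, as a sketch: assign into a copy of an existing data frame with `df2[] <- ...` (the approach used in the answer above), or wrap the list with `as.data.frame()`. The `[]` assignment replaces only the column contents, so `df1`'s column names, row names and class carry over.

```r
df1 <- data.frame(a = c(0.5, 0.2, 0, 0),
                  b = c(0.3, 0, 0.6, 0),
                  c = c(0, 0, 0, 0.4))
set.seed(1)

shuffled <- lapply(df1, sample)   # a plain list of three shuffled vectors

# Option 1: replace the columns of a copy, keeping the data.frame structure
df2 <- df1
df2[] <- shuffled

# Option 2: convert the list directly
df3 <- as.data.frame(shuffled)

class(shuffled)  # "list"
class(df2)       # "data.frame"
class(df3)       # "data.frame"
```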

score 0 · Answer 2 · answered May 09 '15 at 17:30


The answer options above return a list, which may be fine for your purposes. Here's another option:

set.seed(1)
matrix(sample(unlist(df1)), ncol = 3, dimnames = list(NULL, letters[1:3]))

       a   b   c
[1,] 0.0 0.2 0.0
[2,] 0.3 0.6 0.5
[3,] 0.0 0.0 0.0
[4,] 0.0 0.4 0.0

Chase


This does not permute vectors: a=(0,.3,0,0) is not a permutation of a=(.5,.2,0,0) – Frank May 09 '15 at 17:52
Also, while `lapply` returns a list, if you assign it into a `data.frame`, like `df[] <- some_list`, the result stays a `data.frame`. – Frank May 09 '15 at 17:54

@Frank: I guess I misread the requirement that each permutation must stay within its own column. In that case, your solution is the most appropriate. My solution essentially samples all values together and reshapes them back into the original dimensions. – Chase May 10 '15 at 23:07
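The difference discussed in these comments can be checked directly. The sketch below (with an arbitrary seed and 1000 trials) tests whether each column of a shuffled result holds the same values as the original column. The per-column shuffle always passes. The pooled `sample(unlist(df1))` passes only when every nonzero value happens to land back in its own column, which for this data is about 0.6% of the time (576 of the 95,040 equally likely placements of the five nonzero values). `same_column_values()` is a hypothetical helper name used only here.

```r
df1 <- data.frame(a = c(0.5, 0.2, 0, 0),
                  b = c(0.3, 0, 0.6, 0),
                  c = c(0, 0, 0, 0.4))

# Hypothetical helper: TRUE if every column of `shuffled` holds exactly the
# same multiset of values as the matching column of `orig`
same_column_values <- function(shuffled, orig) {
  all(mapply(function(x, y) identical(sort(unname(x)), sort(unname(y))),
             shuffled, orig))
}

set.seed(2015)
trials <- 1000

# Per-column shuffle: every column is a permutation of itself
percol_ok <- replicate(trials, {
  d <- df1
  d[] <- lapply(df1, function(x) x[sample.int(nrow(df1))])
  same_column_values(d, df1)
})

# Pooled shuffle: values can migrate between columns
pooled_ok <- replicate(trials, {
  m <- matrix(sample(unlist(df1)), ncol = 3, dimnames = list(NULL, letters[1:3]))
  same_column_values(as.data.frame(m), df1)
})

all(percol_ok)   # TRUE
mean(pooled_ok)  # fraction of pooled shuffles that happen to keep columns intact
```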
