Not sure title is clear or not, but I want to shuffle a column in a dataframe, but not for every individual row, which is very simple to do using sample()
, but for pairs of observations from the same sample.
For instance, I have the following dataframe df1:
>df1
sampleID groupID A B C D E F
438 1 1 0 0 0 0 0
438 1 0 0 0 0 1 1
386 1 1 1 1 0 0 0
386 1 0 0 0 1 0 0
438 2 1 0 0 0 1 1
438 2 0 1 1 0 0 0
582 2 0 0 0 0 0 0
582 2 1 0 0 0 1 0
597 1 0 1 0 0 0 1
597 1 0 0 0 0 0 0
I want to randomly shuffle the labels here for groupID for each sample, not observation, so that the result looks like:
>df2
sampleID groupID A B C D E F
438 1 1 0 0 0 0 0
438 1 0 0 0 0 1 1
386 2 1 1 1 0 0 0
386 2 0 0 0 1 0 0
438 1 1 0 0 0 1 1
438 1 0 1 1 0 0 0
582 1 0 0 0 0 0 0
582 1 1 0 0 0 1 0
597 2 0 1 0 0 0 1
597 2 0 0 0 0 0 0
Notice that in column 2 (groupID), sample 386 is now 2 (for both observations).
I have searched around but haven't found anything that works the way I want. What I have now is just shuffling the second column. I tried to use dplyr as follows:
df2 <- df1 %>%
group_by(sampleID) %>%
mutate(groupID = sample(df1$groupID, size=2))
But of course that only takes all the group IDs and randomly selects 2.
Any tips or suggestions would be appreciated!