R: Random values from one column in 5 columns

Question

I have a dataframe (df) containing approximately 100 soccer player numbers (the number increases if more players sign up). Each player_number consists of 6 digits (e.g. 178530).

Every player should review 5 other players, so eventually all players are reviewed by 5 others. Therefore I would like to randomly assign 5 different player numbers (from the player_number column) to each player_number. To prevent players from reviewing themselves or reviewing the same player more than once, each player_number should occur only once in every column and in every row. The dataframe should look like this:

player_number  review1  review2  review3  review4  review5
178530         207145    655600   443274   604060   804226
245678         947821    214525   332324   174589   868954      
…

Player 178530 needs to review players 207145, 655600 etc.

For the review1 column, I have used:

set.seed(1)
df$review1 <- sample(df$player_number, nrow(df), replace=F)

This works for review1, but applying it to the other review columns leads to duplicate player_number in several rows. Can anyone help me out so each player_number only occurs once in every column and in every row? Thanks in advance.

Edit: in a previous version I simplified the player_number too much (1:100)

So, do you want 100 players each assigned 5 values between 1 and 100 to them; or split a sequence of 1 to 100 in 20 parts? — milan, Aug 07 '18 at 14:25
I've edited the post to make it more clear what I'm looking for: each player_number (approximately 100; the exact number depends on the number of sign-ups) should be assigned 5 random player_numbers from the player_number column. Each player_number should only occur once in every column and in every row — Roalda, Aug 07 '18 at 14:41
Do you in addition want each player to *be reviewed* 5 times? You didn't specify this but it seems like a natural constraint to me. — Michael Lugo, Aug 07 '18 at 15:09
Yes, you're right. Each player needs to be reviewed 5 times (5 different players review a player once). — Roalda, Aug 07 '18 at 15:18
My code does that. If you check `table(table(unlist(c(df))))` it gives that all numbers are 6 times in df. — Lennyy, Aug 07 '18 at 16:32
Thanks for the answers. They are however based on my initial post that simplified the player_number column too much by stating that it consisted of values 1:100 (in the way I treated the problem it didn't matter, but now I see your solutions it clearly does). Each player_number actually consists of 6 digits (see table in post). — Roalda, Aug 07 '18 at 17:12
An option would be to convert all columns to `factor`s. Please see my updated code below. — Lennyy, Aug 07 '18 at 20:30

milan · Accepted Answer · 2018-08-09T14:00:40.847

You could write a function for that. The idea is to take your vector of 100 IDs (player numbers), randomly sample 5 distinct starting positions, and build 5 new vectors by rotating the original vector so that it begins at each of those positions. Binding these vectors together as columns gives a result in which no ID appears more than once in any row or column.

For example, suppose you have the numbers 1 to 5 (in that order) and want to assign 3 of them to each number, with no number appearing more than once in any row or column.

This is the function that does that.

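play <- function(v, i){
  # pick i distinct starting positions; starting at 2 means no column equals v itself
  starts <- sample(2:length(v), i, replace=FALSE)
  v2 <- v
  for(m in 1:i){
    # rotate v so that it begins at position starts[m]
    v2 <- cbind(v2, c(v[starts[m]:length(v)], v[seq_len(starts[m]-1)]) )
  }
  colnames(v2) <- c('id', paste0('R', 1:i))
  return(v2)
}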

Try it.

play(1:5, 3)
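
To make the rotation idea concrete, here is a small sketch with fixed starting positions instead of random ones, so the output is reproducible. The helper name rotate_from and the offsets c(2, 4, 5) are made up for this illustration; they are not part of the function above.

```r
# Hypothetical helper: rotate v so that it begins at position `start`.
rotate_from <- function(v, start) {
  c(v[start:length(v)], v[seq_len(start - 1)])
}

v <- 1:5
starts <- c(2, 4, 5)  # fixed offsets, chosen only to make the result reproducible

# One rotated copy of v per offset, bound next to the original ids.
m <- cbind(v, sapply(starts, function(s) rotate_from(v, s)))
colnames(m) <- c("id", paste0("R", seq_along(starts)))
m
#      id R1 R2 R3
# [1,]  1  2  4  5
# [2,]  2  3  5  1
# [3,]  3  4  1  2
# [4,]  4  5  2  3
# [5,]  5  1  3  4
```

Because every column is the same sequence shifted by a different amount, two columns can never agree in any row, and each column still contains every number exactly once.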

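This is a similar function that takes a dataframe, since that is what the question asks for.

playDF <- function(df, i){
  # df is assumed to have a single column (player_number)
  # i+1 distinct starting rows: one for player_number, one per review column
  starts <- sample(1:nrow(df), i+1, replace=FALSE)
  sq2 <- NULL
  for(m in 1:(i+1)){
    # rotate the column so that it begins at row starts[m]
    sq2 <- cbind(sq2, c(df[starts[m]:nrow(df),], df[seq_len(starts[m]-1),]) )
  }
  sq2 <- as.data.frame(sq2)
  colnames(sq2) <- c('player_number', paste0('review', 1:(i)))
  return(sq2)
}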

I've added example data for your problem. Run the function and apply it to the data.

df <- data.frame(player_number=c(sample(111111:999999, 100, replace=F)))
playDF(df, 5)
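
If you want to confirm that a result meets the constraints from the question and comments (no repeats in any row or column, and every player reviewed 5 times), a check along these lines might help. This is only a sketch: check_assignment is a made-up name, and the small test table is built by hand from the example numbers in the question rather than by the function above.

```r
# Hypothetical checker: TRUE only if no value repeats within any row or any
# column, and every player appears exactly n_reviews times in the review columns.
check_assignment <- function(tab, n_reviews = ncol(tab) - 1) {
  m <- as.matrix(tab)
  rows_ok <- all(apply(m, 1, function(r) !anyDuplicated(r)))
  cols_ok <- all(apply(m, 2, function(col) !anyDuplicated(col)))
  reviewed <- table(m[, -1])  # how often each player is reviewed
  counts_ok <- setequal(names(reviewed), as.character(m[, 1])) &&
    all(reviewed == n_reviews)
  rows_ok && cols_ok && counts_ok
}

# Small hand-made example: 7 players, each column is the ids shifted by k rows.
ids <- c(178530, 207145, 655600, 443274, 604060, 804226, 245678)
n <- length(ids)
good <- data.frame(player_number = ids)
for (k in 1:5) good[[paste0("review", k)]] <- ids[(seq_len(n) + k - 1) %% n + 1]

check_assignment(good)  # TRUE

# Copying one review column into another creates duplicates within rows.
bad <- good
bad$review2 <- bad$review1
check_assignment(bad)   # FALSE
```

The same call should work on the output of playDF(df, 5), since that is also a data frame with the player number in the first column.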

Lennyy · Answer 2 · 2018-08-07T21:02:28.727

This might not be the most efficient approach, but it is a solution using just base R. Here I sample one number at a time from 1:100, excluding the numbers already present in the current row and the current column.

For row 100 this would mean numbers are sampled from a vector of length 1, which causes the sample function to behave differently: it then samples from 1:x instead of returning x. To prevent this unexpected behaviour, I borrowed the custom sample.vec function from the question "Sampling in R from vector of varying length".

df <- data.frame(player_number = c(1:100))
df <- cbind(df, matrix(NA, 100, 5))

sample.vec <- function(x, ...) x[sample(length(x), ...)]

for(i in 1:100){
  for(j in 2:6){
    df[i,j] <- sample.vec(setdiff(c(1:100),c(df[i,], df[,j])), 1)
  }
}
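
The length-1 behaviour of sample mentioned above can be seen in a small sketch. The value 57 is arbitrary and only stands in for a candidate vector that has shrunk to a single element.

```r
sample.vec <- function(x, ...) x[sample(length(x), ...)]

x <- 57  # a candidate vector of length 1

set.seed(1)
# sample(57, 1) is treated as sample(1:57, 1), so it can return any value in 1:57.
draws_plain <- replicate(200, sample(x, 1))
# sample.vec samples a position in x, so it can only ever return 57.
draws_safe <- replicate(200, sample.vec(x, 1))

range(draws_plain)   # spans values other than 57
unique(draws_safe)   # 57
```

Sampling positions rather than values is what makes the wrapper safe for vectors of any length greater than zero.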

UPDATE after change in question: If you would like to use those custom 6-digit player numbers, an option could be to convert all columns to factors, using 1:100 as the levels and the actual player numbers as labels. So after the code above, you could do something like this:

set.seed(1); player_number = sort(sample(100000:999999, 100)) # in your data, just create this vector beforehand using the actual player numbers
df[] <- lapply(df, function(x) {factor(x, levels = c(1:100), labels = player_number)})

Proof:

head(df)
  player_number      1      2      3      4      5
1        112050 400373 466123 666197 888560 332198
2        120997 887728 917384 701596 682327 189514
3        153035 332198 315644 745845 469035 800949
4        155607 544171 759047 992698 450960 799685
5        163607 908546 338957 694713 267589 406304
6        175816 469035 120997 459962 875044 447493


table(apply(df, 1, function(x) {length(unique(x))}))
  6 
100 

table(apply(df, 2, function(x) {length(unique(x))}))
100 
  6
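
If you would rather keep the columns numeric instead of turning them into factors, one possible alternative is to use the values 1:n as positions into the vector of real player numbers. This is a sketch under the assumption that the table holds indices as produced by the loop above; the tiny 5-player table idx is made up for illustration.

```r
set.seed(1)
# Stand-in for the real player numbers; replace with your own vector.
player_number <- sort(sample(100000:999999, 5))

# Made-up index table in the same shape as df above (values are positions 1..5).
idx <- data.frame(player_number = 1:5,
                  r1 = c(2, 3, 4, 5, 1),
                  r2 = c(3, 4, 5, 1, 2))

# Replace every index by the player number stored at that position.
out <- idx
out[] <- lapply(idx, function(col) player_number[col])
out
```

The result has the same layout as the factor version, but every column stays numeric, which can be convenient if the numbers are joined against other tables later.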

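jyjek · Answer 3 · score 0 · Aug 07 '18 at 13:57

library(tidyverse)
df <- data.frame(x = 1:100)

# For each row, draw 5 distinct numbers from the whole column x, join them
# with commas, then split the string into the columns review1..review5.
# Rows are sampled independently, so a row can include its own number and
# a column can contain repeats.
# paste() is used in place of glue::collapse(), which later glue releases
# renamed to glue_collapse().
df %>%
  mutate(number = map_chr(x, ~ paste(sample(x, 5, replace = FALSE), collapse = ","))) %>%
  separate(number, into = paste0("review", 1:5))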

You might want to add some explanation to this code block; that would make this a much better answer. – workabyte Aug 07 '18 at 19:19
