create random subsets in R without duplicates

Question

My task is to divide a dataset of 32 rows into 8 groups without having duplicated entries. I am trying to do this with a loop, creating a new dataset after each cycle.

The data:

  year pos country  elo fifa          cont hcountry  hcont
1  2010         FRA 1851 1044        Europe      RSA Africa
2  2010         MEX 1872  895 South America      RSA Africa
3  2010         URU 1819  899 South America      RSA Africa
4  2010         RSA 1569  392        Africa      RSA Africa
5  2010         GRE 1726  964        Europe      RSA Africa
6  2010         KOR 1766  632          Asia      RSA Africa
8  2010         ARG 1899 1076 South America      RSA Africa
9  2010         USA 1749  957 North America      RSA Africa
10 2010         SVN 1648  860        Europe      RSA Africa
11 2010         ALG 1531  821        Africa      RSA Africa

...

My solution so far:

for (i in 1:8) {
  assign(paste("group", i, sep = ""), droplevels(subset(wc2010[sample(nrow(wc2010), 4),])))
  wc2010 <- subset(wc2010, !(country %in% group[i]$country))
}

The problem, of course: I don't know how to use the loop variable to refer to the group created in each cycle. :-(

Help would be deeply appreciated! Thanks, Bob

Your sample data, if we look at all columns, does not contain any duplicates. Because yeah, there were 32 *different* countries at the WC... So what subset of fields do you consider when calling things duplicates? — flodel, Nov 24 '13 at 13:18
Or maybe I get it now... By avoiding duplicates, you mean each country should go into one and exactly one group. That's called a *partition*. — flodel, Nov 24 '13 at 13:25
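For reference, the loop from the question can be made to work by keeping the groups in a list and indexing it with `[[i]]`, instead of creating `group1`, `group2`, ... with `assign()`. This is a minimal sketch, assuming the data has a `country` column with one row per team as in the sample above; `wc_demo` is a hypothetical 32-row stand-in for `wc2010` so the snippet runs on its own.

```r
# Hypothetical stand-in for wc2010: 32 rows, one unique country code each
wc_demo <- data.frame(country = sprintf("T%02d", 1:32), stringsAsFactors = FALSE)

remaining <- wc_demo
groups <- vector("list", 8)  # one slot per group, filled inside the loop

for (i in 1:8) {
  # draw 4 of the rows that have not been assigned to a group yet
  groups[[i]] <- remaining[sample(nrow(remaining), 4), , drop = FALSE]
  # remove the drawn countries so they cannot be drawn again
  remaining <- remaining[!(remaining$country %in% groups[[i]]$country), , drop = FALSE]
}
```

If separate objects named `group1` to `group8` are really wanted, `get(paste0("group", i))` retrieves them by name, but a list is usually easier to loop over and pass around.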

flodel · Accepted Answer · 2013-11-24T14:09:01.473

Here is one way to create a random partition:

random.groups <- function(n.items = 32L, n.groups = 8L)
  1L + (sample.int(n.items) %% n.groups)
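Why this yields equal-sized groups: `sample.int(n.items)` is a random permutation of `1:n.items`, and reducing it modulo `n.groups` sends exactly `n.items / n.groups` items to each residue when `n.items` is a multiple of `n.groups` (otherwise the group sizes differ by at most one). A quick check with the function above:

```r
random.groups <- function(n.items = 32L, n.groups = 8L)
  1L + (sample.int(n.items) %% n.groups)

g <- random.groups(32L, 8L)
# each of the groups 1..8 should appear exactly 4 times
table(g)
```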

So then you just have to do:

wc2010$group <- random.groups(nrow(wc2010), n.groups = 8L)

Then you might also be interested in doing:

groups <- split(wc2010, wc2010$group)
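Putting the two steps together, `split()` returns a list of 8 data frames, named `"1"` to `"8"`, with 4 teams each and no team in two groups. A self-contained sketch, where `wc_demo` is a hypothetical stand-in for `wc2010` since the full data is not reproduced here:

```r
random.groups <- function(n.items = 32L, n.groups = 8L)
  1L + (sample.int(n.items) %% n.groups)

# Hypothetical stand-in for wc2010: 32 distinct teams
wc_demo <- data.frame(country = sprintf("T%02d", 1:32), stringsAsFactors = FALSE)
wc_demo$group <- random.groups(nrow(wc_demo), n.groups = 8L)

# list of data frames, one per group
groups <- split(wc_demo, wc_demo$group)
sapply(groups, nrow)
```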

Edit: this was not asked by the OP, but I realize that soccer draws for big tournaments usually involve hats: before the draw, teams are grouped by regions and/or rankings. Then groups are formed by randomly picking one team from each hat, so that two teams from the same hat cannot end up in the same group.

Here is a modification to my function so it can also take hats as an input:

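random.groups <- function(n.items = 32L, n.groups = 8L,
                          hats = rep(1L, n.items)) {

  # indices of the items in each hat
  splitted.items  <- split(seq.int(n.items), hats)

  # shuffle within each hat; x[sample.int(length(x))] avoids sample()'s
  # special case for a length-1 numeric x (a hat holding a single team)
  shuffled <- lapply(splitted.items, function(x) x[sample.int(length(x))])

  # position of each item in the hat-by-hat shuffled order, recycled over groups
  1L + (order(unlist(shuffled)) %% n.groups)
}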
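How the last line works: `unlist(shuffled)` lists the item indices hat by hat, each hat in random order, and `order()` applied to a permutation returns its inverse, i.e. for every item the position at which it appears in that list. Members of one hat occupy consecutive positions, so `%% n.groups` sends them to different groups as long as no hat holds more than `n.groups` teams. A tiny illustration of the inverse-permutation step:

```r
x <- c(3L, 1L, 4L, 2L)  # a permutation: item 3 comes first, item 1 second, ...
pos <- order(x)         # position of each item 1..4 within x
pos                     # 2 4 1 3
```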

Here is an example where, say, the first 8 teams are in hat #1, the next 8 in hat #2, and so on:

# set.seed(123)
random.groups(32, 8, c(rep(1, 8), rep(2, 8), rep(3, 8), rep(4, 8)))
# [1] 7 8 2 6 5 3 1 4 8 7 5 3 2 4 1 6 3 2 7 6 5 8 1 4 7 6 5 4 3 2 1 8
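One way to confirm the hat constraint is to cross-tabulate hats against groups: with 4 hats of 8 teams and 8 groups, every cell should be exactly 1, meaning each group gets one team from each hat. A sketch using the hat-aware function as written above (`rep(1:4, each = 8)` builds the same hats as the example):

```r
random.groups <- function(n.items = 32L, n.groups = 8L,
                          hats = rep(1L, n.items)) {
  splitted.items <- split(seq.int(n.items), hats)
  shuffled <- lapply(splitted.items, sample)
  1L + (order(unlist(shuffled)) %% n.groups)
}

hats <- rep(1:4, each = 8)  # teams 1-8 in hat 1, 9-16 in hat 2, ...
g <- random.groups(32, 8, hats)

# rows = hats, columns = groups; every cell should be 1
table(hats, g)
```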

That's even better! :-) If you are interested in the data set: it is of course from a (rather old) Kaggle competition [link](http://www.kaggle.com/c/worldcup2010) — user3027205, Nov 24 '13 at 14:24
