
I have a 30 x 2 data frame (df): one column contains the names of 30 individuals and the other contains their ID numbers. I want to write a function in R that randomly splits the 30 individuals into groups whose sizes are as even as possible, and that works whether or not the number of people divides evenly by the number of groups.

To clarify, this function would:

• Take 2 arguments: the df and an integer giving the number of groups
• Return the original df with an additional column holding the group number each person is randomly assigned to
• If the number of people (rows) is not divisible by the given integer, split the remaining rows as evenly as possible between the groups

For example:

• If I want the 30 people in 1 group, the function should return df with a new column "group_no" containing 1 for every person (everyone is assigned to the same group).

• If I want 4 groups, I want to see 10 people in each of 2 groups and 5 people in each of the other 2 groups.

• If I want 8 groups, the function should give me 6 groups of 4 people and 2 groups of 3, and so on.
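"As evenly as possible" pins the sizes down: with N people and k groups, every group gets N %/% k people and N %% k of the groups get one extra, so no two groups differ by more than one. Under that rule, 30 people in 4 groups come out as 8, 8, 7, 7 (rather than the 10, 10, 5, 5 listed above), and 7 groups as 5, 5, 4, 4, 4, 4, 4. A minimal sketch of that arithmetic; `group_sizes` is a hypothetical helper name, not a base R function.

```r
# Hypothetical helper: target group sizes when n_people are split into
# n_groups groups whose sizes differ by at most one.
group_sizes <- function(n_people, n_groups) {
  base  <- n_people %/% n_groups   # size every group is guaranteed
  extra <- n_people %% n_groups    # people left over after the equal split
  sizes <- rep(base, n_groups)
  # the first `extra` groups each take one of the leftover people
  sizes[seq_len(extra)] <- sizes[seq_len(extra)] + 1
  sizes
}

group_sizes(30, 4)  # 8 8 7 7
group_sizes(30, 7)  # 5 5 4 4 4 4 4
group_sizes(30, 8)  # 4 4 4 4 4 4 3 3
```

Which groups receive the extra person does not matter once the labels are shuffled.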

I've written some code that roughly does what I need, but I'm entering the groups manually, so I'm not sure how random or correct it is. I'd like to wrap all of this in a function that performs these steps automatically:

#My code so far
#For 1 group of 30 people

people=1:30
groups=1
df$group_no <- print(sample(groups))

#For 4 groups (2 groups of 10 people and 2 groups of 5 people)
groups=c(rep(1,5), rep(2,5), rep(3,10), rep(4,10))
df$group_no <- print(sample(groups))

#For 7 groups (3 groups of 6 people and 4 groups of 3 people)
groups=c(rep(1,6), rep(2,6), rep(3,6), rep(4,3), rep(5,3), rep(6,3), rep(7,3))
df$group_no <- print(sample(groups))

#For 8 groups (6 groups of 4 people and 2 groups of 3 people)
groups=c(rep(1,4), rep(2,4), rep(3,4), rep(4,4), rep(5,4), rep(6,4), rep(7,3), rep(8,3))
df$group_no <- print(sample(groups))


#For 10 groups of 3 people each
groups=c(rep(1,3), rep(2,3), rep(3,3), rep(4,3), rep(5,3), rep(6,3), rep(7,3), rep(8,3), rep(9,3), rep(10,3))
df$group_no <- print(sample(groups))


fct_grouping <- function(df, nr_groups) {
 ????? 
}
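The hand-typed `c(rep(1, 4), rep(2, 4), ...)` vectors above can be generated from a vector of group sizes with `rep(seq_along(sizes), times = sizes)`. A minimal sketch; `shuffled_labels` is a hypothetical helper name. It shuffles by indexing with `sample.int()` instead of calling `sample()` on the labels, because `sample(x)` on a single number x >= 1 draws from `1:x` (documented in `?sample`). With `groups=1` that happens to return 1, but it is an easy trap once the vector is built programmatically.

```r
# Hypothetical helper: turn a vector of group sizes into a shuffled vector of
# group labels, replacing the hand-written c(rep(1, 4), rep(2, 4), ...) lines.
shuffled_labels <- function(sizes) {
  # label 1 repeated sizes[1] times, label 2 repeated sizes[2] times, ...
  labels <- rep(seq_along(sizes), times = sizes)
  # permute by index: safe even when labels has length 1
  labels[sample.int(length(labels))]
}

# The 8-group example: six groups of 4 and two groups of 3
labels <- shuffled_labels(c(4, 4, 4, 4, 4, 4, 3, 3))
table(labels)  # labels 1 to 6 appear 4 times, labels 7 and 8 appear 3 times
```

Assuming df has 30 rows, `df$group_no <- labels` then replaces one of the manual blocks.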
R. Simian
  • Possible duplicate: https://stackoverflow.com/questions/6104836/splitting-a-continuous-variable-into-equal-sized-groups – MrFlick Sep 06 '19 at 19:05
  • You're absolutely right about the 7-7-8-8; I just realised my error and was in the process of correcting it. In fact, for the 7-groups example I should have 5 groups of 4 and 2 groups of 5. As for the 6-6-9-9, I wouldn't want that, because I'm trying to assign people to groups as evenly as possible, so that every group contains almost the same number of people. Hope that makes sense. – R. Simian Sep 06 '19 at 19:53

3 Answers

This function makes the group sizes as close to even as possible and randomizes group assignment.


grouper <- function(df, n) {

  # give every row a distinct random rank from 1 to nrow(df)
  random <- sample.int(nrow(df))

  # send rank r to group ceiling(r * n / nrow(df)); multiplying before dividing
  # keeps the arithmetic exact, and group sizes differ by at most one
  df$group_number <- ceiling(random * n / nrow(df))

  return(df)
}
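Why the sizes come out balanced: the random ranks are a permutation of 1..N, so the group sizes depend only on N and n, and the rule "rank r goes to group ceiling(r * n / N)" can be checked on the ranks directly. A small sketch of that check; `rank_group_sizes` is a hypothetical name, and N = 30 mirrors the question.

```r
# Sketch of the rule grouper() relies on: with N rows and n groups, the row
# holding rank r lands in group ceiling(r * n / N).
rank_group_sizes <- function(N, n) {
  groups <- ceiling(seq_len(N) * n / N)
  as.vector(table(groups))  # number of ranks landing in each group
}

rank_group_sizes(30, 4)  # 7 8 7 8

# largest minus smallest group size, for every n from 1 to 30
spread <- sapply(1:30, function(n) diff(range(rank_group_sizes(30, n))))
all(spread <= 1)  # TRUE
```

The larger groups are interleaved rather than first, but after the shuffle that makes no difference to who ends up where.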
NelsonGon
Lief Esbenshade

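The following code builds a vector with the groupings (every group gets the base size, and the first `rest` groups get one extra person), shuffles it, and returns df with the groupings in a new group_no column.

fct_grouping <- function(df, nr_groups) {
    base_number <- nrow(df) %/% nr_groups  # people every group gets
    rest <- nrow(df) %% nr_groups          # leftover people: groups 1..rest get one more
    groupings <- c(rep(seq_len(nr_groups), base_number), seq_len(rest))
    # shuffle the labels so the assignment is random, then attach them to df
    df$group_no <- groupings[sample.int(length(groupings))]
    return(df)
}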
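A more compact variant of the same idea, offered as a sketch rather than as the answer's own code: `rep_len()` recycles the labels 1..nr_groups until there is one per row, which hands the first `nrow(df) %% nr_groups` labels one extra copy, and the shuffle then makes the assignment random. `assign_groups` and the sample data frame (name and id columns standing in for the question's data) are hypothetical.

```r
# Hypothetical alternative: cycle through 1..nr_groups until every row has a
# label, then shuffle. Group sizes differ by at most one.
assign_groups <- function(df, nr_groups) {
  labels <- rep_len(seq_len(nr_groups), nrow(df))
  df$group_no <- labels[sample.int(length(labels))]
  df
}

# made-up data shaped like the question's 30 x 2 data frame
df <- data.frame(name = paste("person", 1:30), id = 1:30)
table(assign_groups(df, 8)$group_no)  # six groups of 4, two groups of 3
```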
apeqqut

I'm sure what you're looking for can be programmed in R, but it's difficult to model the case where the number of people is not a multiple of the number of groups, because there is more than one way to assign the leftover cases (think of 10 or more groups). Also, the examples you give don't meet your own condition (group sizes as similar as possible). This is the closest thing I can think of:

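df <- data.frame(people = 1:30)

fct_grouping <- function(df, nr_groups) {
  if (nrow(df) %% nr_groups == 0) {
    # sample(nr_groups) is a random permutation of 1..nr_groups; cbind()
    # recycles it down the rows, so the same pattern repeats every nr_groups rows
    print(cbind(df, sample(nr_groups)))
  } else {
    print("the number of people is not a multiple of nr_groups")
  }
}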

df2 <- fct_grouping(df, 5)

#         people sample(nr_groups)
# 1       1                 1
# 2       2                 3
# 3       3                 2
# 4       4                 5
# 5       5                 4
# 6       6                 1
# 7       7                 3
# 8       8                 2
# 9       9                 5
# 10     10                 4
# 11     11                 1
# 12     12                 3
# 13     13                 2
# 14     14                 5
# 15     15                 4
# 16     16                 1
# 17     17                 3
# 18     18                 2
# 19     19                 5
# 20     20                 4
# 21     21                 1
# 22     22                 3
# 23     23                 2
# 24     24                 5
# 25     25                 4
# 26     26                 1
# 27     27                 3
# 28     28                 2
# 29     29                 5
# 30     30                 4
David Jorquera
  • Thank you, David Jorquera. The example code by Lief Esbenshade did exactly what I was trying to achieve! I think I was overcomplicating things by thinking too much about the mathematical details. – R. Simian Sep 06 '19 at 22:10
  • Great answer indeed! Please remember to accept it as the correct answer. – David Jorquera Sep 07 '19 at 17:40