R function that evenly splits observations into groups

Question

I have a 30 x 2 data frame (df) with one column containing the names of 30 individuals and the second column containing their ID#. I want to create a function in R that randomly and most evenly splits the 30 individuals into groups and can handle division with and without remainders.

To clarify, this function would:

• Take 2 parameters as arguments: the df and an integer representing the number of groups
• Give me back the original df but with an additional column having the group number that each person gets assigned to randomly
• If the number of people (rows) cannot be divided by the integer given, the remaining rows should be split as evenly as possible between the groups

For example:

• If I want the 30 people split into 1 group, my function should return df with a new column "group_no" that has 1 for every person (each person would be assigned to the same group)

• If I want 4 groups, I want to see 10 people assigned to 2 groups and the remaining 5 people assigned to another 2 groups.

• If I want 8 groups, then the function should give me 6 groups of 4 people and 2 groups of 3 and so on.

I've written some code that kind of does what I need, but I'm just manually entering the groups, so I'm not sure how random or correct it is. I want to instead write all this in a function that can automatically perform these tasks:

#My code so far
#For 1 group of 30 people

people=1:30
groups=1
df$group_no <- print(sample(groups))

#For 4 groups (2 groups of 10 people and 2 groups of 5 people)
groups=c(rep(1,5), rep(2,5), rep(3,10), rep(4,10))
df$group_no <- print(sample(groups))

#For 7 groups (3 groups of 6 people and 4 groups of 3 people)
groups=c(rep(1,6), rep(2,6), rep(3,6), rep(4,3), rep(5,3), rep(6,3), rep(7,3))
df$group_no <- print(sample(groups))

#For 8 groups (6 groups of 4 people and 2 groups of 3 people)
groups=c(rep(1,4), rep(2,4), rep(3,4), rep(4,4), rep(5,4), rep(6,4), rep(7,3), rep(8,3))
df$group_no <- print(sample(groups))


#For 10 groups of 3 people each
groups=c(rep(1,3), rep(2,3), rep(3,3), rep(4,3), rep(5,3), rep(6,3), rep(7,3), rep(8,3), rep(9,3), rep(10,3))
df$group_no <- print(sample(groups))


fct_grouping <- function(df, nr_groups) {
 ????? 
}

Possible duplicate: https://stackoverflow.com/questions/6104836/splitting-a-continuous-variable-into-equal-sized-groups — MrFlick, Sep 06 '19 at 19:05
You're absolutely right about the 7-7-8-8, I actually just realised my error with that and was in the process of correcting it. And in fact for the 7 groups example, I should have 5 groups of 4 and 2 groups of 5. But for the 6-6-9-9, I guess I wouldn't want that because I'm trying to assign the people to groups as most evenly as possible.. So, I'm trying to form the groups such that they contain almost equal number of people in them. Hope that makes sense. — R. Simian, Sep 06 '19 at 19:53

score 2 · Accepted Answer · edited Sep 07 '19 at 02:51

This function makes the group sizes as close to even as possible and randomizes group assignment.


grouper <- function(df, n) {

  # give each row a distinct random rank from 1 to nrow(df)
  random <- sample(nrow(df))

  # divide each rank by the average group size and round up to get the group number
  df$group_number <- ceiling(random / (nrow(df) / n))

  return(df)
}
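A quick way to check the group sizes this produces, as a sketch only: the `people` data frame and its column names are made up for illustration, and `grouper()` is restated so the snippet runs on its own.

```r
# grouper() from the answer above, restated so this snippet is self-contained
grouper <- function(df, n) {
  random <- sample(nrow(df))
  df$group_number <- ceiling(random / (nrow(df) / n))
  df
}

# hypothetical stand-in for the asker's 30 x 2 data frame
people <- data.frame(name = paste0("person_", 1:30), id = 1001:1030)

set.seed(42)  # only so the shuffle is reproducible

# count how many people land in each group for a few group counts
sizes_4 <- table(grouper(people, 4)$group_number)
sizes_7 <- table(grouper(people, 7)$group_number)
sizes_8 <- table(grouper(people, 8)$group_number)

print(sizes_4)
print(sizes_7)
print(sizes_8)
```

With 30 rows this should give groups of 7 and 8 people for n = 4, five groups of 4 and two of 5 for n = 7, and six groups of 4 and two of 3 for n = 8. Who ends up in which group changes with the seed; the sizes do not.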

edited Sep 07 '19 at 02:51

NelsonGon

answered Sep 06 '19 at 20:34

Lief Esbenshade

Lief Esbenshade, thank you so much for your help, this actually works PERFECTLY for what I was trying to do! :) – R. Simian Sep 06 '19 at 22:07
Thanks, feel free to accept this answer if it's the best solution to your question. – Lief Esbenshade Sep 06 '19 at 22:32

score 1 · Answer 2 · answered Sep 06 '19 at 20:22

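The following code returns a sorted vector of group numbers whose group sizes are as even as possible. It does not shuffle the numbers or add them to df, so pass the result through sample() and assign it to a new column to get a random assignment.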

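fct_grouping <- function(df, nr_groups) {
    # minimum number of rows every group receives
    base_number <- floor(nrow(df) / nr_groups)
    # rows left over after giving each group base_number rows
    rest <- nrow(df) - base_number * nr_groups
    # the first `rest` groups each take one extra row
    groupings <- sort(c(rep(seq(nr_groups), base_number), seq_len(rest)))
    return(groupings)
}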
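Since the asker wanted a random assignment stored in the data frame, one way to use this vector is to shuffle it and attach it as a column. This is a minimal sketch: `assign_groups()` and the `people` data frame are hypothetical names introduced here, and `fct_grouping()` is restated so the snippet runs on its own.

```r
# fct_grouping() from this answer, restated so the snippet is self-contained
fct_grouping <- function(df, nr_groups) {
  base_number <- floor(nrow(df) / nr_groups)
  rest <- nrow(df) - base_number * nr_groups
  sort(c(rep(seq(nr_groups), base_number), seq_len(rest)))
}

# hypothetical wrapper: shuffle the sorted labels and store them in a new column
assign_groups <- function(df, nr_groups) {
  labels <- fct_grouping(df, nr_groups)
  df$group_no <- sample(labels)  # random permutation of the labels
  df
}

# hypothetical stand-in for the asker's 30 x 2 data frame
people <- data.frame(name = paste0("person_", 1:30), id = 1001:1030)

set.seed(1)  # only so the shuffle is reproducible
grouped <- assign_groups(people, 8)
group_sizes <- table(grouped$group_no)
print(group_sizes)
```

For 30 rows and 8 groups, groups 1 to 6 should each get 4 people and groups 7 and 8 should get 3, because the leftover rows go to the lowest-numbered groups.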

answered Sep 06 '19 at 20:22

apeqqut

Thank you apeqqut! – R. Simian Sep 06 '19 at 22:08

score 1 · Answer 3 · answered Sep 06 '19 at 20:49

I'm sure that what you are looking for can be programmed in R, but it is harder to model when the number of people is not evenly divisible by the number of groups, because there is then more than one way to assign the leftover cases (think of asking for 10 or more groups). Also, the examples you give don't meet the condition you require (group sizes as similar as possible). This is the closest thing I can think of:

df <- data.frame(people = 1:30)

fct_grouping <- function(df, nr_groups) {
  if (nrow(df) %% nr_groups == 0) {
    # sample(nr_groups) has length nr_groups; cbind() recycles it down the rows
    print(cbind(df, sample(nr_groups)))
  } else {
    print("number of people is not a multiple of n")
  }
}

df2 <- fct_grouping(df, 5)

#         people sample(nr_groups)
# 1       1                 1
# 2       2                 3
# 3       3                 2
# 4       4                 5
# 5       5                 4
# 6       6                 1
# 7       7                 3
# 8       8                 2
# 9       9                 5
# 10     10                 4
# 11     11                 1
# 12     12                 3
# 13     13                 2
# 14     14                 5
# 15     15                 4
# 16     16                 1
# 17     17                 3
# 18     18                 2
# 19     19                 5
# 20     20                 4
# 21     21                 1
# 22     22                 3
# 23     23                 2
# 24     24                 5
# 25     25                 4
# 26     26                 1
# 27     27                 3
# 28     28                 2
# 29     29                 5
# 30     30                 4
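The case with a remainder can also be covered by cycling the labels 1..nr_groups over all rows and shuffling the result. The sketch below is an alternative to the code in this answer, not a restatement of it, and `fct_grouping_any()` is a hypothetical name chosen here.

```r
# sketch: cycle the labels 1..nr_groups over every row, then shuffle them
fct_grouping_any <- function(df, nr_groups) {
  # rep_len() repeats 1..nr_groups until there is one label per row,
  # so group sizes differ by at most one
  labels <- rep_len(seq_len(nr_groups), nrow(df))
  df$group_no <- sample(labels)  # random permutation of the labels
  df
}

df <- data.frame(people = 1:30)

set.seed(123)  # only so the shuffle is reproducible
df7 <- fct_grouping_any(df, 7)
sizes_7 <- table(df7$group_no)
print(sizes_7)
```

For 30 people and 7 groups this should give two groups of 5 and five groups of 4, which is the split the asker describes in the comments. Unlike the recycled `sample(nr_groups)` above, the assignment does not repeat the same pattern every `nr_groups` rows.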

Thank you David Jorquera. The last example code below by Lief Esbenshade did the trick perfectly for what I was trying to achieve! I think I was overcomplicating myself by thinking too much about the mathematical intricacies — R. Simian, Sep 06 '19 at 22:10
Great answer indeed! Please remember to accept as correct answer. — David Jorquera, Sep 07 '19 at 17:40
