
Thanks for the feedback. Below is a reproducible example with my desired output:

# Example data frame that will hold my output
N=24 
school.assignment = matrix(NA, ncol = 3, nrow = N)
school.assignment = as.data.frame(school.assignment)
colnames(school.assignment)  <- c("ID","Group","Assignment")

# Number of groups and assignments per group
groups = 6 
Assignment = 4 
school.assignment$Group <- rep(1:groups, each = Assignment)
school.assignment$Assignment <- rep(1:Assignment, times = groups)


# IDs with number of repeats (i.e repeated students)
Data = matrix(0, ncol = 2, nrow = N/2) # ~half with repeated samples
Data = as.data.frame(Data)
colnames(Data) <- c("ID","Repeats")
Data$ID <-1:(N/2)
length(unique(Data$ID)) # unique IDs
ID=rep(seq(1:8),3)

# Generate random repeats for each ID
Data$Repeats<-{set.seed(55)
               sapply(1:(N/2), 
                      function(x) sample(1:5,1))
}
Data=Data[-1,] #take out first row to match N=24
sum(Data$Repeats) # 24 total IDs for all assignments


# List of IDs to use, each repeated Repeats times
IDs <- vector("list",dim(Data)[1]) #
for (i in 1:dim(Data)[1])
{
  IDs[[i]]<-rep(Data$ID[i], times=Data$Repeats[i])
}
head(IDs)


# Object with number of repeated IDs
sample.per.ID <- vector("list", length(IDs))
for (i in 1:length(IDs))
{
  sample.per.ID[[i]] <- length(IDs[[i]])
}
total <- sum(unlist(sample.per.ID)); total # 24 total IDs (including repeats)

## Unlist vector with random sequence of samples
SRS.ID.order = unlist(IDs) # order of IDs with repeats
for (i in 1:length(SRS.ID.order))
{
  school.assignment$ID[i] <- SRS.ID.order[i]
}

My last loop is where I attempt to fill school.assignment$ID. However, as you can see, some IDs cross group boundaries. I want the IDs taken from SRS.ID.order to stay within a single group (i.e. a constant school.assignment$Group for each ID). Below you can see that this is not the case: for example, ID 4 is in both group 1 and group 2.

> head(school.assignment)
  ID Group Assignment
1  2     1          1
2  2     1          2
3  3     1          3
4  4     1          4
5  4     2          1
6  4     2          2
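A quick way to spot these cases is to count how many distinct groups each ID lands in. A minimal sketch on a hypothetical copy of the six rows printed above (the name `sa` is only for illustration):

```r
# Hypothetical copy of the six rows printed above
sa <- data.frame(ID = c(2, 2, 3, 4, 4, 4),
                 Group = c(1, 1, 1, 1, 2, 2),
                 Assignment = c(1, 2, 3, 4, 1, 2))

# For each ID, count the distinct groups it appears in
groups.per.id <- tapply(sa$Group, sa$ID, function(g) length(unique(g)))
groups.per.id

# IDs that straddle a group boundary
split.ids <- names(groups.per.id)[groups.per.id > 1]
split.ids  # "4"
```

Any ID with a count above 1 violates the one-group condition.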

I would like the loop to leave a slot unassigned (i.e. NA) when the next ID in SRS.ID.order needs more slots than remain in the current group, so that the ID starts in the next group instead:

  ID Group Assignment
1  2     1          1
2  2     1          2
3  3     1          3
4 NA     1          4
5  4     2          1
6  4     2          2
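One way to get that behaviour without nested loops is to walk through the IDs in order, keeping a running count of the slots used in the current group; when the next ID's repeats would overflow the group, the rest of the group stays NA and filling resumes at the next group. A minimal sketch, assuming every group has the same number of slots; `pack_ids` and its arguments are hypothetical names:

```r
# Sketch: fill groups in order so that an ID's repeats never straddle two groups.
# Assumes every group has the same number of slots (group.size).
pack_ids <- function(ids, repeats, n.groups, group.size) {
  out <- data.frame(ID = NA,
                    Group = rep(seq_len(n.groups), each = group.size),
                    Assignment = rep(seq_len(group.size), times = n.groups))
  g <- 1     # group currently being filled
  used <- 0  # slots already taken in group g
  for (k in seq_along(ids)) {
    r <- repeats[k]
    if (r > group.size) {
      stop("ID ", ids[k], " needs ", r, " slots but a group only has ", group.size)
    }
    if (used + r > group.size) {  # does not fit: leave the rest of g as NA
      g <- g + 1
      used <- 0
    }
    if (g > n.groups) {
      warning("ran out of groups; the remaining IDs were not assigned")
      break
    }
    rows <- (g - 1) * group.size + used + seq_len(r)
    out$ID[rows] <- ids[k]
    used <- used + r
  }
  out
}

res <- pack_ids(ids = c(2, 3, 4, 5), repeats = c(2, 1, 2, 3),
                n.groups = 3, group.size = 4)
head(res)  # first six rows match the desired output above
```

With the objects from the question this would be called as `pack_ids(Data$ID, Data$Repeats, groups, Assignment)`. Because the repeats are drawn with `sample(1:5, 1)`, an ID can need 5 slots while a group only has 4; the sketch stops in that case rather than silently splitting the ID, so you may want to cap the repeats at the group size.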

I was thinking that I need some kind of index j for the group, like the code below:

########################################
for (i in 1:length(school.assignment$ID))
{
  for (j in 1:length(unique(school.assignment$Group)))
  {
    school.assignment$ID[i]<-ifelse(sum(is.na(school.assignment$ID[i,j]))>=sample.per.ID[i],SRS.ID.order[i],NA)
  }
}
Error in school.assignment$ID[i, j] : incorrect number of dimensions
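The error comes from `school.assignment$ID[i, j]`: extracting a column with `$` gives a plain vector, which accepts only one subscript, so `[i, j]` asks for a second dimension that does not exist. To count the free (NA) slots in group j, subset the vector with a logical condition on Group instead. A sketch on a small hypothetical frame `sa`:

```r
# Hypothetical toy frame: two groups of four slots, partly filled
sa <- data.frame(ID = c(2, 2, 3, NA, 4, 4, NA, NA),
                 Group = rep(1:2, each = 4),
                 Assignment = rep(1:4, times = 2))

# Two subscripts on a vector fail, as in the question
res <- try(sa$ID[1, 1], silent = TRUE)
cat("Error message:", conditionMessage(attr(res, "condition")), "\n")

# Free slots in one group j: subset the vector with a logical condition
j <- 1
free.in.j <- sum(is.na(sa$ID[sa$Group == j]))
free.in.j  # 1

# Free slots in every group at once
free.per.group <- tapply(is.na(sa$ID), sa$Group, sum)
free.per.group  # group 1: 1, group 2: 2
```

Comparing `free.in.j` with an ID's repeat count is the check the `ifelse()` above was aiming for.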

Any help is very much appreciated!

Thanks


OLD POST

I'm currently trying to write a loop in R with a condition. My data structure is below:

> head(school.assignment)
   ID   Group    Assignment
1  NA    1            1
2  NA    1            2
3  NA    1            3
4  NA    1            4
5  NA    2            1
6  NA    2            2

I would like to fill the ID variable with a vector of IDs of the same length as school.assignment, shown below:

head(IDs)
[1] 519 519 519 343 251 251...

Not all IDs repeat the same number of times; some appear 1, 2 or even 3 times, as shown above. I have an object with the number of repeats per ID, for example:

> head(repeats)
[1] 3 1 2...

This indicates that ID=519 repeats 3 times, ID=343 only once, ID=251 2 times, etc.
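Going between the two representations does not need a loop: `rep()` with a vector `times` argument repeats each element its own number of times, and `rle()` reverses it as long as each ID's copies are contiguous. A sketch using the three values quoted above; the names `ids` and `repeats` are assumptions:

```r
ids <- c(519, 343, 251)
repeats <- c(3, 1, 2)

# Expand: each ID repeated its own number of times
IDs <- rep(ids, times = repeats)
IDs  # 519 519 519 343 251 251

# Collapse: run lengths recover the IDs and their repeat counts
runs <- rle(IDs)
runs$values   # 519 343 251
runs$lengths  # 3 1 2
```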

There is one condition that I would like to meet:

1) I would like every ID to stay in a single group whenever possible. For example, if there is only one spot (NA) left in group 1 of "school.assignment", the ID should go to the next group that has enough space (i.e. where the number of NAs in school.assignment$ID is >= the number of repeats for that ID).
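If "whenever possible" also allows going back to a gap left in an earlier group, the condition becomes a first-fit placement: each ID goes to the lowest-numbered group that still has at least as many empty slots as its repeats. This is a swapped-in variant, not something the question specifies, and it does not preserve the original ID order. A sketch; `first_fit` and its arguments are hypothetical names:

```r
# Sketch of first-fit placement: each ID goes to the lowest-numbered group
# that still has at least repeats[k] empty slots (earlier gaps get reused).
first_fit <- function(ids, repeats, n.groups, group.size) {
  free <- rep(group.size, n.groups)          # empty slots left per group
  group.of <- rep(NA_integer_, length(ids))  # chosen group for each ID
  for (k in seq_along(ids)) {
    fits <- which(free >= repeats[k])
    if (length(fits) == 0) next              # no group can hold it: leave NA
    g <- fits[1]
    group.of[k] <- g
    free[g] <- free[g] - repeats[k]
  }
  data.frame(ID = ids, Repeats = repeats, Group = group.of)
}

ff <- first_fit(ids = c(2, 3, 4, 5, 6), repeats = c(2, 1, 2, 3, 1),
                n.groups = 3, group.size = 4)
ff
```

Here ID 6 fills the single gap left in group 1, whereas filling strictly in order would have put it at the end of group 3.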

My idea was to use a loop, but the code below is not working:

########################################
  for (i in 1:length(school.assignment$ID))
    {
    for (j in 1:length(unique(school.assignment$Group)))
    {
  school.assignment$ID[i]<-ifelse(sum(is.na(school.assignment$ID[i,j]))>=repeats[i],ID[i],NA)
  }
}

Is there a way to write this loop so that it respects my condition of assigning each ID to only one group?

Thank you!

  • It is not clear. Please show a small reproducible example and expected output. – akrun Dec 12 '15 at 16:39
  • Welcome to StackOverflow. A [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) would be helpful – polka Dec 12 '15 at 16:40

1 Answer


Consider using merge() to assign random group IDs to the data frame; there is no need for nested for loops. The code below creates a data frame of unique groups, assigns a random number to each group, and then merges it with school.assignment:

# CREATE UNIQUE GROUP DATA FRAME 
Group <- unique(school.assignment$Group)
grp.ids <- as.data.frame(Group)

# CREATE RANDOM ID FIELD (THREE DIGITS BETWEEN 100 AND 999)
grp.ids$RandomID <- sample(100:999, size = nrow(grp.ids), replace = TRUE)

# MERGE DATA FRAMES
school.assignment <- merge(school.assignment, grp.ids, by="Group", all=TRUE)
# ASSIGN ID COLUMN
school.assignment$ID <- school.assignment$RandomID
# RESTRUCTURE FINAL DATA FRAME
school.assignment <- school.assignment[c("ID", "Group", "Assignment")]

OUTPUT

 ID     Group   Assignment
977         1            1
977         1            2
977         1            3
977         1            4
368         2            1
368         2            2
Parfait
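For the one-random-ID-per-group idea in the answer, a lookup with match() is an alternative to merge(): merge() returns rows sorted by the key and puts the key column first (which is why the answer reselects columns at the end), whereas match() fills the ID column in place. A self-contained sketch on a 6-group, 4-slot frame like the question's; `set.seed(1)` is only there for reproducibility:

```r
set.seed(1)  # only so the random IDs are reproducible
school.assignment <- data.frame(ID = NA,
                                Group = rep(1:6, each = 4),
                                Assignment = rep(1:4, times = 6))

grp.ids <- data.frame(Group = unique(school.assignment$Group))
grp.ids$RandomID <- sample(100:999, size = nrow(grp.ids), replace = TRUE)

# Look up each row's group in grp.ids; row and column order are kept
school.assignment$ID <- grp.ids$RandomID[match(school.assignment$Group, grp.ids$Group)]
head(school.assignment)
```

Like the merge() version, this gives every row of a group the same ID, so it does not by itself handle IDs with different repeat counts.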
  • Thank you for your answer, this is useful. However, this assumes an equal number of IDs per group. My problem is that IDs take up different numbers of "slots" in a group. In the example above each group has 4 potential slots for IDs, but some IDs repeat 1, 2 or 3 times. I want to keep the repeated IDs in a single group. I know this will lead to some groups not being balanced, but I'm OK with that (i.e. some groups might end up with 1 or 2 NAs). Should I still pursue the loop? – StatsStacker Dec 12 '15 at 23:20