R: for loop per individual obtaining the start- and enddate

Question

I have a question regarding the following code:

Subjects <- unique(Dataset$ID)
for (i in Subjects){
  startdate <- head(Dataset$DATE[i])
  enddate <- tail(Dataset$DATE[i])
  seq_date <- seq(as.Date(startdate), as.Date(enddate), "days") 
}

With this code I want to obtain a startdate, enddate and seq_date for each unique individual. However, I only get one startdate (from the first individual) and one enddate (from the last individual). I have also tried the following code:

Subjects <- unique(Dataset$ID)
for (i in Subjects){
  startdate[i] <- head(Dataset$DATE[i])
  enddate[i] <- tail(Dataset$DATE[i])
  seq_date[i] <- seq(as.Date(startdate[i]), as.Date(enddate[i]), "days") 
}

But this results in the error: Error in seq.int(0, to0 - from, by) : 'to' must be a finite number

How can I make this for loop work so that I get a startdate, enddate and seq_date for each individual?

Hi SFKR, welcome to Stack Overflow! We’ll need your data, or at least an example, to understand your issue and test solutions. You can share it in copy-pasteable form by running `dput(Dataset)` or `dput(head(Dataset, 20))` in R then pasting the result into your question. Take a look at [How to make a great R reproducible example](https://stackoverflow.com/q/5963269/17303805) for more details. Thanks! — zephryl, Jan 14 '23 at 14:08

score 0 · Answer 1 · answered Jan 14 '23 at 15:46

Subjects <- unique(Dataset$ID)

This returns a vector of the unique IDs. You can't use these values to index the data.frame, because they are ID values and not row positions. For example, with the following IDs, Subjects will be:

ID <- c(10,20,10,20)
Subjects <- unique(ID) # equal to c(10, 20)

So when you do ID[Subjects[1]], you ask for the 10th element of ID, because Subjects[1] equals 10, and not for the records belonging to the subject with ID equal to 10. For that you need

ID[ID == Subjects[i]]
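To make the difference concrete, here is a minimal sketch using the same toy vector (the names by_position and by_value are only for illustration). It likely also explains the error in the question: indexing DATE with an ID value that is larger than the number of rows yields NA, and seq() cannot build a sequence to an NA end point.

```r
ID <- c(10, 20, 10, 20)
Subjects <- unique(ID)               # c(10, 20)

# Positional indexing: asks for element 10 of a 4-element vector, which is NA
by_position <- ID[Subjects[1]]

# Logical indexing: keeps every element whose value equals 10
by_value <- ID[ID == Subjects[1]]    # c(10, 10)
```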

So your example becomes something like

Subjects <- unique(Dataset$ID)
for (i in Subjects) {
  # i is an ID value (not a position), so compare against it directly
  selection <- Dataset$ID == i
  startdate <- head(Dataset$DATE[selection], 1)
  enddate <- tail(Dataset$DATE[selection], 1)
  # note: these variables are overwritten on every iteration
  seq_date <- seq(as.Date(startdate), as.Date(enddate), "days")
}

I also added 1 as the second argument to the head and tail calls so that they return only the first and last element, respectively.
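Because a loop that assigns to plain variables keeps only the values from the last iteration, one way to keep a result per individual is to store each sequence in a named list. The following is a sketch under assumptions: the toy Dataset is made up, the column names ID and DATE follow the question, and seq_dates is a hypothetical name.

```r
# Hypothetical toy data; column names ID and DATE follow the question
Dataset <- data.frame(
  ID   = c(10, 10, 10, 20, 20),
  DATE = as.Date(c("2023-01-01", "2023-01-03", "2023-01-05",
                   "2023-02-01", "2023-02-02"))
)

Subjects <- unique(Dataset$ID)

# One list slot per subject, named by the ID so it can be looked up later
seq_dates <- vector("list", length(Subjects))
names(seq_dates) <- as.character(Subjects)

for (i in Subjects) {
  dates <- Dataset$DATE[Dataset$ID == i]   # all dates of this subject
  # min/max do not depend on the row order, unlike head/tail
  seq_dates[[as.character(i)]] <- seq(min(dates), max(dates), by = "day")
}
```

Afterwards seq_dates[["10"]] holds the daily sequence for subject 10. Using min() and max() instead of head() and tail() is a design choice that avoids relying on the data being sorted by date.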

Hi! This method did not solve the problem I was trying to solve! I have added an extra answer as I have found out how to do it! — SFKR, Apr 04 '23 at 11:53

score 0 · Accepted Answer · answered Apr 04 '23 at 12:02

The answer to the question I mentioned above is the following code:

(I converted the date to numeric time before running the loop below. If you want to convert it back to a date afterwards, keep the start date for each ID, for example in a new column: df <- df %>% group_by(ID) %>% mutate(new_column = first(date)).)

un <- unique(data$ID) #the subjects in the dataset
for (i in un){
  startdate <- min(data$TIME[data$ID == i]) 
  enddate <- max(data$TIME[data$ID == i]) 
  time <- seq(startdate,enddate, by = 1) #1 would be your preferred time step here 
  merged_frame <- data.frame(ID = i, TIME = time)
  data <- rbind(data,merged_frame)
}
data <- data[!duplicated(data),] #remove rows with the same date/time stamp
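The loop above appears to assume that data contains only the columns ID and TIME, since rbind() on data frames requires matching columns. If the data has further columns, a possible variation is to build the full ID/TIME grid first and then left-join the observations onto it. This is only a sketch: the toy data, the VALUE column and the names grid and filled are assumptions for illustration.

```r
# Hypothetical toy data: TIME is already numeric, VALUE is an extra column
data <- data.frame(
  ID    = c(1, 1, 2, 2),
  TIME  = c(1, 4, 2, 3),
  VALUE = c(5, 6, 7, 8)
)

# One row per ID and time step between that ID's first and last TIME
grid <- do.call(rbind, lapply(unique(data$ID), function(i) {
  t <- data$TIME[data$ID == i]
  data.frame(ID = i, TIME = seq(min(t), max(t), by = 1))
}))

# Left join: keep every grid row; VALUE is NA where no observation exists
filled <- merge(grid, data, by = c("ID", "TIME"), all.x = TRUE)
```

Here filled has one row per ID and time step, and the time steps that were missing in the original data carry NA in VALUE.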
