
I have a question regarding the following code:

Subjects <- unique(Dataset$ID)
for (i in Subjects){
  startdate <- head(Dataset$DATE[i])
  enddate <- tail(Dataset$DATE[i])
  seq_date <- seq(as.Date(startdate), as.Date(enddate), "days") 
}

With this code I want to obtain a startdate, enddate and seq_date for each unique individual. However, I only get one startdate (from the first individual) and one enddate (from the last individual). In addition to the code above, I have also tried the following:

Subjects <- unique(Dataset$ID)
for (i in Subjects){
  startdate[i] <- head(Dataset$DATE[i])
  enddate[i] <- tail(Dataset$DATE[i])
  seq_date[i] <- seq(as.Date(startdate[i]), as.Date(enddate[i]), "days") 
}

But this results in the error: Error in seq.int(0, to0 - from, by) : 'to' must be a finite number

How can I make this for loop work so that I get a startdate, enddate and seq_date for each individual?

SFKR
  • Hi SFKR, welcome to Stack Overflow! We’ll need your data, or at least an example, to understand your issue and test solutions. You can share it in copy-pasteable form by running `dput(Dataset)` or `dput(head(Dataset, 20))` in R then pasting the result into your question. Take a look at [How to make a great R reproducible example](https://stackoverflow.com/q/5963269/17303805) for more details. Thanks! – zephryl Jan 14 '23 at 14:08

2 Answers

Subjects <- unique(Dataset$ID)

This returns a vector of the unique IDs. You can't use these values to index the data.frame directly, because R treats a number inside `[ ]` as a position. For example, if your IDs are the following, the subjects will be:

ID <- c(10,20,10,20)
Subjects <- unique(ID) # equal to c(10, 20)

So when you do `ID[Subjects[1]]`, you get the 10th element of `ID` (because `Subjects[1]` equals 10), not the records belonging to the subject with ID 10. Since `ID` has only four elements, the result here is `NA`. To select the records of a given subject you need a logical comparison:

ID[ID == Subjects[i]]
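A minimal sketch contrasting the two kinds of indexing on the toy `ID` vector above; `by_position` and `by_value` are hypothetical names used only for illustration. An out-of-range position returns `NA`, and an `NA` passed on to `as.Date()` and `seq()` can make `seq()` fail with a "'to' must be a finite number" error like the one in the question.

```r
# Toy IDs from the example above
ID <- c(10, 20, 10, 20)
Subjects <- unique(ID)  # c(10, 20)

# Using the ID *value* as an index asks for element number 10,
# which does not exist in a length-4 vector, so the result is NA
by_position <- ID[Subjects[1]]

# A logical comparison selects every element belonging to that subject
by_value <- ID[ID == Subjects[1]]  # c(10, 10)

print(by_position)
print(by_value)
```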

So your example becomes something like

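Subjects <- unique(Dataset$ID)
seq_dates <- vector("list", length(Subjects))  # one result per subject
names(seq_dates) <- Subjects
for (i in seq_along(Subjects)) {          # i is a position, not an ID value
  selection <- Dataset$ID == Subjects[i]  # rows of the i-th subject
  startdate <- head(Dataset$DATE[selection], 1)
  enddate <- tail(Dataset$DATE[selection], 1)
  seq_dates[[i]] <- seq(as.Date(startdate), as.Date(enddate), "days")
}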

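I also added `, 1` to the `head` and `tail` calls so that they return only the first and the last element, respectively (by default they return six elements). This assumes the rows are sorted by `DATE` within each subject. Inside the loop, `i` now runs over positions (`seq_along(Subjects)`) and each sequence is stored in the list `seq_dates`, so results are no longer overwritten by the next subject.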
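A loop-free sketch of the same idea, assuming `DATE` is stored as text in `YYYY-MM-DD` format. The `Dataset` below is hypothetical, since the question does not include data. It splits the dates by `ID` and keeps one daily sequence per subject in a named list; `min()`/`max()` replace `head()`/`tail()` so the row order does not matter.

```r
# Hypothetical example data (the question does not include any):
# two subjects, with dates not sorted within subject 1
Dataset <- data.frame(
  ID   = c(1, 1, 2, 2, 1),
  DATE = c("2023-01-03", "2023-01-01", "2023-02-01", "2023-02-02", "2023-01-02")
)

# Split the dates by subject; the result is a named list of Date vectors
dates_by_id <- split(as.Date(Dataset$DATE), Dataset$ID)

# One daily sequence per subject; min()/max() make row order irrelevant
seq_dates <- lapply(dates_by_id, function(d) seq(min(d), max(d), by = "day"))

# Start and end date per subject in a small table
# (sapply() drops the Date class, so as.Date() restores it)
summary_df <- data.frame(
  ID        = names(dates_by_id),
  startdate = as.Date(sapply(dates_by_id, min), origin = "1970-01-01"),
  enddate   = as.Date(sapply(dates_by_id, max), origin = "1970-01-01")
)

print(seq_dates)
print(summary_df)
```

`seq_dates[["1"]]` then holds the three days from 2023-01-01 to 2023-01-03, and `summary_df` has one row per subject.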

Jan van der Laan
  • Hi! This method did not solve the problem I was trying to solve! I have added an extra answer as I have found out how to do it! – SFKR Apr 04 '23 at 11:53

The following code solves the problem described in my question:

Note: I converted the dates to numeric time before running the loop below. If you want to convert the numbers back to dates afterwards, keep the start date of each ID, for example in a new column created with dplyr: `df <- df %>% group_by(ID) %>% mutate(new_column = first(date))`.

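un <- unique(data$ID)  # the subjects in the dataset
# Note: rbind() below only works if data contains just the ID and TIME columns
for (i in un) {
  startdate <- min(data$TIME[data$ID == i])  # first time point of subject i
  enddate <- max(data$TIME[data$ID == i])    # last time point of subject i
  time <- seq(startdate, enddate, by = 1)    # 1 is the time step; change as needed
  merged_frame <- data.frame(ID = i, TIME = time)
  data <- rbind(data, merged_frame)          # append the complete time grid
}
data <- data[!duplicated(data), ]           # remove duplicate rows (same ID and TIME)
data <- data[order(data$ID, data$TIME), ]   # optional: sort by subject and time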
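The loop above grows `data` with `rbind()` on every iteration, and `rbind()` requires `merged_frame` to have the same columns as `data`. The following is a sketch of the same gap-filling idea that also keeps extra columns. It assumes numeric `ID` and numeric `TIME`, as in this answer; `grid`, `filled` and the `VALUE` column are hypothetical. Instead of appending and de-duplicating, it builds the full per-ID time grid first and then left-joins the original rows onto it with base R `merge(all.x = TRUE)`.

```r
# Hypothetical example data: TIME is numeric (e.g. days), VALUE is an extra column
data <- data.frame(
  ID    = c(1, 1, 2, 2),
  TIME  = c(0, 3, 5, 7),
  VALUE = c(1.5, 2.0, 0.7, 0.9)
)

# One complete TIME grid per ID, from that ID's first to last time point
grid_list <- lapply(split(data$TIME, data$ID),
                    function(t) seq(min(t), max(t), by = 1))
grid <- data.frame(
  # split() turns IDs into names; convert back because these IDs are numeric
  ID   = rep(as.numeric(names(grid_list)), lengths(grid_list)),
  TIME = unlist(grid_list, use.names = FALSE)
)

# Left join: every grid row is kept, new time points get NA in VALUE
filled <- merge(grid, data, by = c("ID", "TIME"), all.x = TRUE)
filled <- filled[order(filled$ID, filled$TIME), ]
print(filled)
```

The result has one row per ID and time step, with `NA` in `VALUE` for the time points that were missing from the original data.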
SFKR