Following up from a question I asked yesterday here, I am trying to design a loop that will subset events from data df1
based on unique combinations of matching date, time, and ID in a second dataset df2
. The output for each iteration will be multiple rows long, and each iteration will have either differing numbers of rows, or might be empty. In the end I need to combine all the iteration outputs into 1 data frame that show the dates, times, and ID numbers for each event on each date. Allocating an empty matrix and running a regular FOR loop or nested loop isn't getting me anywhere. I don't know if I need to start with a different type of structure, or if my dimensions are wrong. Maybe there is an easier way all together.
Here is a sample of the data structure (although the original data is much longer).
dput(df1)
structure(list(Date = c("12-31-2008", "12-31-2008", "12-31-2008",
"12-31-2008", "12-31-2008", "12-31-2008", "01-01-2009", "01-01-2009",
"01-01-2009", "01-01-2009", "01-10-2009", "01-10-2009", "01-10-2009",
"01-10-2009", "01-10-2009", "01-10-2009", "01-10-2009", "01-10-2009",
"01-10-2009", "01-10-2009", "01-10-2009", "01-10-2009", "01-10-2009",
"01-10-2009", "01-10-2009", "01-10-2009", "01-10-2009", "01-10-2009",
"01-10-2009", "01-11-2009", "01-11-2009", "01-17-2009", "01-17-2009",
"01-18-2009", "01-18-2009", "01-18-2009", "01-18-2009", "01-18-2009",
"01-18-2009", "01-18-2009", "01-18-2009", "01-18-2009", "01-18-2009",
"01-18-2009", "01-18-2009", "01-19-2009", "01-19-2009", "01-19-2009",
"01-19-2009", "01-19-2009"), IDNum = c("534198", "534198", "534198",
"534198", "534198", "534198", "534198", "534198", "534198", "534198",
"534198", "534198", "534198", "534198", "534198", "534198", "534198",
"534198", "534198", "534198", "534198", "534198", "534198", "534198",
"534198", "534198", "534198", "534198", "534198", "534198", "534198",
"534198", "534198", "534198", "534198", "534198", "534198", "534198",
"534198", "534198", "534198", "534198", "534198", "534198", "534198",
"534198", "534198", "534198", "534198", "534198"), Time = c("19:01",
"19:53", "20:55", "22:03", "23:04", "23:55", "00:45", "01:48",
"02:50", "03:50", "02:35", "03:42", "04:49", "05:53", "06:55",
"07:55", "08:43", "10:23", "10:31", "11:41", "15:27", "16:33",
"17:41", "18:46", "19:46", "20:48", "21:48", "22:48", "23:48",
"01:49", "02:49", "21:49", "22:49", "12:04", "13:04", "15:05",
"16:05", "17:05", "18:07", "18:49", "19:49", "20:49", "21:49",
"22:50", "23:50", "00:50", "01:50", "03:02", "04:22", "05:25"
)), .Names = c("Date", "IDNum", "Time"), row.names = 8643:8692, class = "data.frame")
dput(df2)
structure(list(Date = c("01-04-2009", "01-05-2009", "01-05-2009",
"01-06-2009", "01-06-2009", "01-07-2009", "01-07-2009", "01-08-2009",
"01-08-2009", "01-09-2009", "01-09-2009", "01-10-2009", "01-11-2009",
"01-12-2009", "01-12-2009", "01-13-2009", "01-14-2009", "01-14-2009",
"01-21-2009", "01-21-2009", "01-22-2009", "01-22-2009", "01-23-2009",
"01-23-2009", "01-24-2009", "01-24-2009", "01-25-2009", "01-25-2009",
"01-26-2009", "01-26-2009", "01-27-2009", "01-28-2009", "01-28-2009",
"01-28-2009", "01-28-2009", "01-29-2009", "01-29-2009", "01-29-2009",
"01-29-2009", "02-05-2009", "02-05-2009", "02-05-2009", "02-06-2009",
"02-06-2009", "02-06-2009", "02-07-2009", "02-07-2009", "02-07-2009",
"02-08-2009", "02-08-2009"), IDNum = c("599091", "599091", "599091",
"599091", "599091", "599091", "599091", "599091", "599091", "599091",
"599091", "599091", "599091", "599091", "599091", "599091", "599091",
"599091", "534198", "534198", "534198", "534198", "534198", "534198",
"534198", "534198", "534198", "534198", "534198", "534198", "534198",
"697345", "697345", "534198", "534198", "697345", "697345", "697345",
"534198", "697345", "697345", "697345", "697345", "697345", "697345",
"697345", "697345", "697345", "697345", "697345"), Trip = c("GL0229",
"GL0229", "GL0229", "GL0229", "GL0229", "GL0229", "GL0229", "GL0229",
"GL0229", "GL0229", "GL0229", "GL0229", "GL0229", "GL0229", "GL0229",
"GL0229", "GL0229", "GL0229", "GL0230", "GL0230", "GL0230", "GL0230",
"GL0230", "GL0230", "GL0230", "GL0230", "GL0230", "GL0230", "GL0230",
"GL0230", "GL0230", "GL0233", "GL0233", "GL0230", "GL0230", "GL0233",
"GL0233", "GL0233", "GL0230", "GL0234", "GL0234", "GL0234", "GL0234",
"GL0234", "GL0234", "GL0234", "GL0234", "GL0234", "GL0234", "GL0234"
), Replicate = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L,
12L, 13L, 14L, 15L, 16L, 17L, 18L, 1L, 2L, 3L, 4L, 5L, 6L, 7L,
8L, 9L, 10L, 11L, 12L, 13L, 1L, 2L, 14L, 15L, 3L, 4L, 5L, 16L,
1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), Start = c("12:00",
"08:35", "15:33", "08:30", "15:51", "10:02", "23:04", "11:17",
"21:31", "11:16", "20:07", "11:28", "07:37", "08:40", "16:32",
"09:14", "08:04", "15:15", "07:16", "16:17", "07:10", "16:40",
"07:00", "16:25", "07:17", "16:50", "07:20", "16:18", "07:20",
"15:40", "07:10", "09:34", "11:07", "07:55", "16:38", "07:01",
"08:26", "14:47", "07:18", "07:47", "09:17", "14:58", "07:48",
"08:59", "14:53", "07:30", "09:12", "13:47", "08:56", "09:53"
), End = c("17:21", "15:08", "22:44", "15:12", "09:06", "19:16",
"10:28", "20:12", "10:14", "18:48", "10:53", "20:23", "14:07",
"15:02", "22:27", "18:03", "15:07", "21:19", "16:04", "22:04",
"16:31", "23:01", "16:15", "22:07", "16:33", "22:37", "16:05",
"22:17", "15:22", "22:31", "16:05", "16:41", "19:01", "16:20",
"21:56", "14:31", "19:46", "00:30", "15:10", "14:21", "19:27",
"23:45", "14:31", "19:20", "23:05", "14:51", "20:15", "00:17",
"14:31", "18:07")), .Names = c("Date", "IDNum", "Trip", "Replicate",
"Start", "End"), row.names = 506:555, class = "data.frame")
First I found the dates that match between the 2 data sets and created a new variable records
to display information from df2
based on the matching dates. In this example, I'm just using the 2nd matching date:
match_dates <- as.character(intersect(df1$Date, df2$Date))
records <- df2[which(df2$Date == match_dates[2]),]
print(records)
Date IDNum Trip Replicate Start End
518 01-11-2009 599091 GL0229 13 07:37 14:07
In the original much larger data set, records
ends up being something more like this:
records <- df2[which(df2$Date == match_dates[25]),]
print(records)
# Date IDNum Trip Replicate Start End
# 659 04-02-2009 507646 GL0247 10 09:43 05:19
# 660 04-02-2009 680845 GL0249 4 05:37 11:29
# 661 04-02-2009 680845 GL0249 5 11:59 16:47
The event(s) of interest for each iteration of records
is then defined as df1
times between Start
and End
like this (I did it this way to preserve the unique combinations of date-time-ID-replicate):
event1 <- subset(df1, Date==records[1,"Date"] & IDNum==records[1,"IDNum"] & Time >= records[1,"Start"] & Time <= records[1,"End"])
event2 <- subset(df1, Date==records[2,"Date"] & IDNum==records[2,"IDNum"] & Time >= records[2,"Start"] & Time <= records[2,"End"])
event3 <- subset(df1, Date==records[3,"Date"] & IDNum==records[3,"IDNum"] & Time >= records[3,"Start"] & Time <= records[3,"End"])
The results for each of these events looks like this:
print(event1) #This result is empty
[1] NewRecNum Date IDNum Time Speed
<0 rows> (or 0-length row.names)
print(event2)
Date IDNum Time
80620 04-02-2009 680845 06:35
80621 04-02-2009 680845 07:35
80622 04-02-2009 680845 08:35
80623 04-02-2009 680845 09:35
80624 04-02-2009 680845 10:35
print(event3)
Date IDNum Time
80626 04-02-2009 680845 12:35
80627 04-02-2009 680845 13:35
80628 04-02-2009 680845 14:35
80629 04-02-2009 680845 15:35
80630 04-02-2009 680845 16:35
I'm aiming for a loop that will take every instance of matching dates from match_dates
(in this case 147), create 147 corresponding records
from df2
, and then use the Date, IDNum, Start, and End times in each records
to subset df1
and output the df1
events. What I have so far (that isn't working):
records <- matrix(ncol=6, nrow=nrow(df1)) # Create an empty matrix to start
event=NULL
for (i in 1:length(match_dates))
{ records[i] <- df2[which(df2$Date == match_dates[i]), ]
for (j in 1:nrow(records[i]))
{ event[j] <- subset(df1, Date==records[i,"Date"] & IDNum==records[i,"IDNum"] & Time >= records[i,"Start"] & Time <= records[i,"End"])
}
}
print(event)
Error in 1:nrow(records[i]) : argument of length 0
In addition: Warning message:
In records[i] <- df2[which(df2$Date == match_dates[i]), ] :
number of items to replace is not a multiple of replacement length
> print(event)
NULL
Thanks in advance for any help! I'm banging my head against the wall on this.
EDIT/UPDATE:
I changed records
to
records <- subset(df2, Date %in% df1$Date)
Then wrote a function to subset the matching rows from df1
as
event_func <- function(df,records,i){
event_int <- subset(df, Date==records[i,"Date"] & IDNum==records[i,"IDNum"] & Time >= records[i,"Start"] & Time <= records[i,"End"])
return(event_int)
}
This function works, and outputs what I need. But I'm still having trouble with a loop that will take the 686 rows of records
, match them to df1
, and output a final data frame of all the df1
rows that match. I also tried using lapply
Here's what I have (neither of which is working):
# First option using a loop
final <- data.frame()
event_int <- data.frame()
for (i in 1:nrow(records)) {
event_int[i] <- event_func(df1, records,i)
final <- rbind(event_int, event_int[i])
}
# Second option using lapply
lapply(records, event_func(df1,records,1:nrow(records)))
Thanks again for any help!