I am working with GPS locations that occurred in a given time period "dateperiod". I want to use the value in one row (dateperiod), look in the column for that dateperiod, and extract the value (distance to disturbance) for whatever row I am working in. But I also am doing this within a loop that goes through multiple disturbance dataframes. Dummy data set:
Example Basic data (data_basic_DT):
structure(list(EndId = 1:9, dateperiod = c(141101L, 141101L,
141101L, 141101L, 141101L, 141101L, 141101L, 141101L, 141101L
)), .Names = c("EndId", "dateperiod"), row.names = c(NA, -9L), class = "data.frame")
Example disturbance data 1 (low_roads):
structure(list(EndId = 1:9, dateperiod = c(141101L, 141101L,
141101L, 141101L, 141101L, 141101L, 141101L, 141101L, 141101L
), `151101` = c(710.211, 684.471, 676.831, 762.955, 704.06, 674.685,
682.495, 686.586, 696.348), `150501` = c(710.211, 684.471, 676.831,
762.955, 704.06, 674.685, 682.495, 686.586, 696.348), `141101` = c(710.211,
684.471, 676.831, 762.955, 704.06, 674.685, 682.495, 686.586,
696.348), `140501` = c(710.211, 684.471, 676.831, 762.955, 704.06,
674.685, 682.495, 686.586, 696.348), `131101` = c(710.211, 684.471,
676.831, 762.955, 704.06, 674.685, 682.495, 686.586, 696.348),
`130501` = c(710.211, 684.471, 676.831, 762.955, 704.06,
674.685, 682.495, 686.586, 696.348), `121101` = c(710.211,
684.471, 676.831, 762.955, 704.06, 674.685, 682.495, 686.586,
696.348)), .Names = c("EndId", "dateperiod", "151101", "150501",
"141101", "140501", "131101", "130501", "121101"), row.names = c(NA,
-9L), class = "data.frame")
Ex disturbance data 2 (high_roads):
structure(list(EndId = 1:9, dateperiod = c(141101L, 141101L,
141101L, 141101L, 141101L, 141101L, 141101L, 141101L, 141101L
), `151101` = c(806.415, 802.56, 502.35, 1234.2, 704.06, 685.23,
682.495, 1002.3, 696.348), `150501` = c(710.211, 684.471, 676.831,
762.955, 704.06, 802.56, 502.35, 1234.2, 696.348), `141101` = c(710.211,
130.25, 453.25, 762.955, 704.06, 674.685, 682.495, 686.586, 696.348
), `140501` = c(710.211, 684.471, 802.56, 502.35, 1234.2, 674.685,
682.495, 686.586, 696.348), `131101` = c(710.211, 684.471, 676.831,
762.955, 704.06, 674.685, 502.35, 1234.2, 704.06), `130501` = c(710.211,
684.471, 676.831, 762.955, 704.06, 674.685, 682.495, 686.586,
696.348), `121101` = c(502.35, 1234.2, 704.06, 762.955, 704.06,
674.685, 682.495, 686.586, 696.348)), .Names = c("EndId", "dateperiod",
"151101", "150501", "141101", "140501", "131101", "130501", "121101"
), row.names = c(NA, -9L), class = c("data.table", "data.frame"
), .internal.selfref = <pointer: 0x0000000006640788>)
So, for each EndId, I want it to look at the dateperiod, see that it is 141101 in this example, look at column "141101", extract the value, and put it in a new column. Within a loop that goes through low_roads and high_roads.
Thanks to some help (below), I have it working much faster than before, with this:
disturbancelist <- list(low_roads=low_roads, high_roads=high_roads) #Lists all the disturbance dataframes
for (d in disturbancelist){
##Create a column named by the current disturbance class
Class<-d$Class[2] ##calls the disturbance type
##Merge basic data and each disturbance dateframe to get the right distance values
mergeex<-merge(data_basic_DT, d, by.x = "EndId", by.y = "EndId", all.y == FALSE)
mergeexdf<-as.data.frame(mergeex)
col.names<-names(mergeexdf)
mergeexdf$distance <- mergeexdf[cbind(1:nrow(mergeexdf), fmatch(mergeexdf$dateperiod, col.names))]
names(data_basic_DT)[names(data_basic_DT)=="distance"] <- Class ##Change name of column to current disturbance class
print(Class)
}
Now, I would like to change this code to work in data.tables to make it run faster. It works outside of the loop as data.tables, but not within. Any help appreciated!