0

I am working with GPS locations that occurred in a given time period "dateperiod". I want to use the value in one row (dateperiod), look in the column for that dateperiod, and extract the value (distance to disturbance) for whatever row I am working in. But I also am doing this within a loop that goes through multiple disturbance dataframes. Dummy data set:

Example Basic data (data_basic_DT):

structure(list(EndId = 1:9, dateperiod = c(141101L, 141101L, 
141101L, 141101L, 141101L, 141101L, 141101L, 141101L, 141101L
)), .Names = c("EndId", "dateperiod"), row.names = c(NA, -9L), class = "data.frame")

Example disturbance data 1 (low_roads):

structure(list(EndId = 1:9, dateperiod = c(141101L, 141101L, 
141101L, 141101L, 141101L, 141101L, 141101L, 141101L, 141101L
), `151101` = c(710.211, 684.471, 676.831, 762.955, 704.06, 674.685, 
682.495, 686.586, 696.348), `150501` = c(710.211, 684.471, 676.831, 
762.955, 704.06, 674.685, 682.495, 686.586, 696.348), `141101` = c(710.211, 
684.471, 676.831, 762.955, 704.06, 674.685, 682.495, 686.586, 
696.348), `140501` = c(710.211, 684.471, 676.831, 762.955, 704.06, 
674.685, 682.495, 686.586, 696.348), `131101` = c(710.211, 684.471, 
676.831, 762.955, 704.06, 674.685, 682.495, 686.586, 696.348), 
    `130501` = c(710.211, 684.471, 676.831, 762.955, 704.06, 
    674.685, 682.495, 686.586, 696.348), `121101` = c(710.211, 
    684.471, 676.831, 762.955, 704.06, 674.685, 682.495, 686.586, 
    696.348)), .Names = c("EndId", "dateperiod", "151101", "150501", 
"141101", "140501", "131101", "130501", "121101"), row.names = c(NA, 
-9L), class = "data.frame")

Ex disturbance data 2 (high_roads):

structure(list(EndId = 1:9, dateperiod = c(141101L, 141101L, 
141101L, 141101L, 141101L, 141101L, 141101L, 141101L, 141101L
), `151101` = c(806.415, 802.56, 502.35, 1234.2, 704.06, 685.23, 
682.495, 1002.3, 696.348), `150501` = c(710.211, 684.471, 676.831, 
762.955, 704.06, 802.56, 502.35, 1234.2, 696.348), `141101` = c(710.211, 
130.25, 453.25, 762.955, 704.06, 674.685, 682.495, 686.586, 696.348
), `140501` = c(710.211, 684.471, 802.56, 502.35, 1234.2, 674.685, 
682.495, 686.586, 696.348), `131101` = c(710.211, 684.471, 676.831, 
762.955, 704.06, 674.685, 502.35, 1234.2, 704.06), `130501` = c(710.211, 
684.471, 676.831, 762.955, 704.06, 674.685, 682.495, 686.586, 
696.348), `121101` = c(502.35, 1234.2, 704.06, 762.955, 704.06, 
674.685, 682.495, 686.586, 696.348)), .Names = c("EndId", "dateperiod", 
"151101", "150501", "141101", "140501", "131101", "130501", "121101"
), row.names = c(NA, -9L), class = c("data.table", "data.frame"
), .internal.selfref = <pointer: 0x0000000006640788>)

So, for each EndId, I want it to look at the dateperiod, see that it is 141101 in this example, look at column "141101", extract the value, and put it in a new column. Within a loop that goes through low_roads and high_roads.

Thanks to some help (below), I have it working much faster than before, with this:

disturbancelist <- list(low_roads=low_roads, high_roads=high_roads) #Lists all the disturbance dataframes
for (d in disturbancelist){ 
  ##Create a column named by the current disturbance class
     Class<-d$Class[2] ##calls the disturbance type
  ##Merge basic data and each disturbance dateframe to get the right distance values
  mergeex<-merge(data_basic_DT, d, by.x = "EndId", by.y = "EndId", all.y == FALSE)
  mergeexdf<-as.data.frame(mergeex)
  col.names<-names(mergeexdf)
  mergeexdf$distance <- mergeexdf[cbind(1:nrow(mergeexdf), fmatch(mergeexdf$dateperiod, col.names))]
  names(data_basic_DT)[names(data_basic_DT)=="distance"] <- Class ##Change name of column to current disturbance class
  print(Class)
}

Now, I would like to change this code to work in data.tables to make it run faster. It works outside of the loop as data.tables, but not within. Any help appreciated!

Mel
  • 43
  • 6
  • 2
    This post is too long and lacking in reproducibility for me to read, but I think (based on Brian's answer), this might be related: http://stackoverflow.com/questions/33310179/select-values-from-different-columns-based-on-a-variable-containing-column-names – Frank Aug 16 '16 at 17:35
  • You need to provide sample data in a format that people can copy-paste in. (Use `dput` to get correct format with subset of data). Posting tab/space deliminated tables doesn't help anyone. You also can't post example code that refers to variables that aren't defined for people to actually execute your sample code. – Dean MacGregor Aug 16 '16 at 22:43
  • Thanks, I have edited the question to be more explicate, concise (hopefully), and provide reproducible examples. Note I also updated based on an answer provided below. – Mel Aug 17 '16 at 13:07

1 Answers1

0

If I understand you right, this sounds similar to a question I asked a while back: R data.frame get value from variable which is selected by another variable, vectorized. Although that question applies to data.frames in general, I think it is still a good solution for a data.table. EDIT: Maybe not, based on responses, but it does work well on data.frames at least.

The idea is to use match with the names attribute to get the numeric indices of the columns for each row, and then use that to get the value. Something like this, for a data.frame called df:

df$newvar <- df[cbind(1:nrow(df), match(df$dateperiod, names(df)))]

Where the first number 1:nrow(df) basically takes the place of your for loop and the second number match(df$dateperiod, names(df)) identifies the column whose name matches the value contained in dateperiod for each row. It works because match operates on the entire column vector df$dateperiod and returns a column of the same length.

Hope that helps.

Community
  • 1
  • 1
Brian Stamper
  • 2,143
  • 1
  • 18
  • 41
  • 2
    If you have a data.table you want to use the fast joins that package offers and not the rather slow (in comparison) base function match. – Roland Aug 16 '16 at 17:45
  • Thank you, I got that working nicely and quickly with dataframes but not data.tables. It messes up when calling in the different data.tables while looping, i.e. names(dt). I edited the question to reflect this improvement. – Mel Aug 17 '16 at 13:09