I have a data.table (call it df) in which each row is an observation and each of several columns holds that observation's value at one point in time (one variable = one point in time): v1 is the value at time = 1, v5 the value at time = 5, and so on.

Artificial df example:

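library(data.table)

df <- data.table(
  id = seq_len(4),
  start = c(2, 3, 1, 3),   # first time point of interest
  length = c(2, 3, 4, 1),  # number of time points to keep, counted from start
  # scalar values are recycled to all four rows
  v1 = 0.9, v2 = 0.8, v3 = 0.7, v4 = 0.6, v5 = 0.5)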

The table also contains two more variables: start, the start time of interest, and length, the number of time points in the interval counted from start.

From this, I want to create a new table (call it newdf) by the following rule: if the first row of df has start = 2 and length = 2, then the first row of newdf contains the values of v2 and v3, with all other variables in newdf left empty (NA). If the second row of df has start = 3 and length = 3, then the second row of newdf contains the values of v3, v4 and v5, and so on.

My desired output would look like this (constructed manually):

newdf <- data.table(
id = seq_len(4),
t1 = c(0.8, 0.7, 0.9, 0.7),
t2 = c(0.7, 0.6, 0.8, NA),
t3 = c(NA, 0.5, 0.7, NA),
t4 = c(NA, NA, 0.6, NA))

In other words, df holds the measurements in absolute time, while newdf would hold the same measurements in time relative to each observation's start.

The obvious but inefficient way is to construct the new table by looping over rows manually, but I would really like to solve this within data.table itself. Code such as

newdf <- df[, .SD, by=id, .SDcols=(column numbers I need)]

comes close to what I want (especially if put inside a loop), except that I can't figure out how to extract the column numbers, which change with each row of df, and pass them to .SDcols without looping, if that is possible at all (even ignoring the fact that the number of selected columns may vary).

The closest existing question I've found is "Selecting different numbers of columns on each row of a data frame", but the outcomes there are not really what I want here.

I've also tried creating a selection function and then running lapply over .SD in j, along the lines of

newdf <- df[, (new.names):=lapply(.SD, fff) , by=id, .SDcols=-1]

but I again arrive at the same problem: I cannot extract the values of start and length from df to use for column selection.
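
A minimal sketch of one way this kind of per-row window selection could be expressed without an explicit row loop, by going through long format with melt and dcast. It assumes the value columns are named v1 ... v5 as in the example above, so that the time index can be read off the column name; the intermediate names long, time, value and rel are introduced here for illustration only and are not part of the original tables.

```r
library(data.table)

# Example input, as in the question.
df <- data.table(
  id = seq_len(4),
  start = c(2, 3, 1, 3),
  length = c(2, 3, 4, 1),
  v1 = 0.9, v2 = 0.8, v3 = 0.7, v4 = 0.6, v5 = 0.5
)

# Long format: one row per (id, absolute time) pair.
long <- melt(df, id.vars = c("id", "start", "length"),
             variable.name = "time", value.name = "value")

# Absolute time index taken from the column name, e.g. "v3" -> 3
# (assumption: value columns follow the v<number> naming of the example).
long[, time := as.integer(sub("^v", "", time))]

# Keep only the window start .. start + length - 1 for each row,
# then renumber it so that each window begins at relative time 1.
long <- long[time >= start & time < start + length]
long[, rel := time - start + 1]

# Back to wide format; relative times outside a row's window become NA.
newdf <- dcast(long, id ~ rel, value.var = "value")
setnames(newdf, c("id", paste0("t", setdiff(names(newdf), "id"))))

print(newdf)
```

Under these assumptions the result has columns id, t1, t2, t3, t4. Two caveats of this sketch: an observation with length = 0 would drop out of the result entirely, and a window that runs past the last available column is silently truncated.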

bobbers
  • "call it df" -- Instead, you should *show* us a table that illustrates the problem. See https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/28481250#28481250 for guidance. – Frank Nov 08 '17 at 15:26
  • Thanks, edited with input and output examples. – bobbers Nov 08 '17 at 16:51

0 Answers