Storing vectors in a dataframe element

Question

I am trying to store a vector of variable length in a new column of an existing data frame.

My initial dataframe-

data - 
job_id usetime
abc    2345
abc1   4353
jsdf   34985

I have a numeric vector (indices_excluded) containing row indices of the data frame. Using these indices, I have to extract the usetimes of the corresponding rows.

I want to store all of these usetimes in a new column called "runtime_excluded".

To do so, I tried running this code in a loop (over i):
data[i, "runtime_excluded"]<-I(list(data[indices_excluded, "USETIME"]))
The numeric vector "indices_excluded" keeps changing on each iteration.

This gives me a warning saying

value = list( : replacement element 1 has 2 rows to replace 1 rows

It is storing only the first element of the list.
I want to store all the usetimes in that dataframe element.
Desired output-

data - 
job_id   usetime   runtime_excluded
abc      2345      234,4325
abcd     4353      2435
abcde     34985     2134, 234234, 34223

I came across a few relevant questions like one, two, three but could not find an answer to my problem.

EDIT-

My initial dataframe-

data - 
job_id starttime  endtime  endtime_modified  usetime
abc    1          23       20                22
abc1   2          15       13                13
jsdf   30         40       39                10

The code that I'm running -

k=nrow(data)
for(i in 1:k)
{
        indices_peak<-which((data[i,"endtime"] >= data$starttime) 
                             & (data[i,"endtime"] <= data$endtime))

        indices_peak95<-which((data[i,"endtime_modified"] >= data$starttime) 
                               & (data[i,"endtime_modified"] <= data$endtime_modified))

        indices_excluded<-indices_peak[!indices_peak %in% indices_peak95]
        data[i,"peak"]<-length(indices_peak)
        data[i,"peak_95"]<-length(indices_peak95)
        data$runtime_excluded[i]<-data[indices_excluded, "USETIME"]

}

Desired output-

job_id starttime  endtime  endtime_modified  usetime  peak  peak_95  runtime_excluded
abc    1          24       22                22       2     2       20
abc1   2          24       20                22       2     3       -
jsdf   3          23       23                 9       3     1       22,20

Start times and end times are in seconds, measured relative to a particular reference time.

Can you show the head of the data frame that includes the values you want to capture in `runtime_excluded`, or of an example that mimics its structure? — ulfelder, Aug 18 '15 at 17:31
Hi @ulfelder, the values will be a vector of values from the column usetime. The length of this vector varies for each row (i.e. "i") and depends on the vector "indices_excluded". The length could be anywhere from 0 to 20. Please let me know if this is not clear enough. — JAS, Aug 18 '15 at 17:57

score 0 · Accepted Answer · edited May 23 '17 at 10:26

Not sure if I understood you correctly. Anyway, here's an example very similar to the one suggested here:

# your initial data.frame 
data <- data.frame(job_id = c('abc','abc1','jsdf'), usetime = c(2345,4353,34985))

# initialize runtime_excluded with an empty list
data$runtime_excluded <- vector(mode = "list",length=nrow(data)) 

# > data
#   job_id usetime runtime_excluded
# 1    abc    2345             NULL
# 2   abc1    4353             NULL
# 3   jsdf   34985             NULL

# example of initialization in a for-loop
for(i in 1:3){
  data$runtime_excluded[[i]] <- 1:i
  # or, similarly :
  # data[['runtime_excluded']][[i]] <- 1:i
}

# > data
#   job_id usetime runtime_excluded
# 1    abc    2345                1
# 2   abc1    4353             1, 2
# 3   jsdf   34985          1, 2, 3
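
As a side note, the same kind of list column can probably be filled without an explicit loop. The following is only a sketch under assumptions: `idx` is a hypothetical list of index vectors (one per row) standing in for the `indices_excluded` computed on each iteration, and is not part of the original question.

```r
# Same toy data as above
data <- data.frame(job_id = c("abc", "abc1", "jsdf"),
                   usetime = c(2345, 4353, 34985))

# Hypothetical index vectors, one per row (stand-ins for indices_excluded)
idx <- list(c(2, 3), integer(0), 1)

# lapply() returns a list with one element per row,
# which is the shape a list column needs
data$runtime_excluded <- lapply(idx, function(ix) data$usetime[ix])

# Inspect the stored vectors
str(data$runtime_excluded)
```

Each element of the column is an ordinary numeric vector, so a row with nothing to store simply holds a zero-length vector.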

EDIT :

Here's a working version of your code :

data <- data.frame(job_id = c('abc','abc1','jsdf'), 
                   starttime = c(1,2,3),
                   endtime = c(24,24,23),
                   endtime_modified = c(22,20,23),
                   usetime = c(22,22,9)
                   )
# > data
#   job_id starttime endtime endtime_modified usetime
# 1    abc         1      24               22      22
# 2   abc1         2      24               20      22
# 3   jsdf         3      23               23       9


# initialize runtime_excluded with an empty list
data$runtime_excluded <- vector(mode = "list",length=nrow(data)) 

k=nrow(data)
for(i in 1:k)
{
  indices_peak<-which((data[i,"endtime"] >= data$starttime) & (data[i,"endtime"] <= data$endtime))
  indices_peak95<-which((data[i,"endtime_modified"] >= data$starttime) & (data[i,"endtime_modified"] <= data$endtime_modified))

  indices_excluded<-indices_peak[!indices_peak %in% indices_peak95]
  data[i,"peak"]<-length(indices_peak)
  data[i,"peak_95"]<-length(indices_peak95)
  vect <- data[indices_excluded, "usetime"] # zero-length when nothing is excluded; the if-statement below only guards against NULL
  if(!is.null(vect)){
    data$runtime_excluded[[i]] <- vect
  }
}

# > data
#   job_id starttime endtime endtime_modified usetime runtime_excluded peak peak_95
# 1    abc         1      24               22      22               22    2       2
# 2   abc1         2      24               20      22                     2       3
# 3   jsdf         3      23               23       9           22, 22    3       1
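
A short sketch of why a guard against NULL can matter when filling a list element by element. It uses a plain list rather than the data frame above, and the names `x` and `y` are only for illustration: a zero-length vector is stored like any other value, whereas assigning NULL with `[[<-` removes the element.

```r
x <- vector(mode = "list", length = 3)

# A zero-length vector is stored like any other value
x[[2]] <- numeric(0)
length(x)   # still 3

# Assigning NULL with [[<- removes the element instead
y <- vector(mode = "list", length = 3)
y[[2]] <- NULL
length(y)   # now 2
```

In a data frame the list column must keep one element per row, so dropping an element this way would leave the column with the wrong length.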

Seems to be working fine, except for the integer(0) when the vector length is 0. Also, the values are being populated as c(133,345,465). Can I remove both of these cases? Also, could you please explain the logic behind the answer you provided? — JAS, Aug 18 '15 at 18:06
Mmh, to be honest I am a bit confused... I think my code is pretty self-explanatory, I mean, once you create `runtime_excluded` column, you can easily manipulate each single vector inside this column in this way: `data$runtime_excluded[[rowIndex]] <- newVector` as shown in my code... also, `integer(0)` works fine... please, provide a [small complete and reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) (modify the original post) to allow us to understand your problem — digEmAll, Aug 18 '15 at 19:38
I came up with a different method `data[i, "runtime_excluded"]<- paste(data[indices_excluded, "USETIME"], collapse=",")` This seems to be working fine even for the rows where indices are null ! — JAS, Aug 19 '15 at 11:07
@RakeshJasti: please note that in that way, you are not storing a vector inside a data.frame cell, but you are storing a string like `"2345,546,789"` so this is different from your initial question... — digEmAll, Aug 19 '15 at 12:25
@RakeshJasti: anyway, I modified your code using my strategy if you want to check — digEmAll, Aug 19 '15 at 12:42
Could you please explain why it is [[rowIndex]] instead of [rowIndex] ? — JAS, Aug 19 '15 at 13:29
Because the runtime_excluded column is a list, not an atomic vector (like the other columns), and in R you access the elements of a list with double square brackets `[[`. — digEmAll, Aug 19 '15 at 13:40
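
To make the `[[` versus `[` distinction from the comments concrete, here is a minimal sketch on a small made-up data frame (the values are illustrative, not the asker's data):

```r
data <- data.frame(job_id = c("abc", "abc1", "jsdf"))
data$runtime_excluded <- list(1, c(1, 2), c(1, 2, 3))

# Single brackets return a sub-list (still a list, of length 1)
one <- data$runtime_excluded[2]
class(one)   # "list"

# Double brackets return the element itself (the numeric vector)
two <- data$runtime_excluded[[2]]
class(two)   # "numeric"
```

So `[[i]]` is the form to use when reading or replacing the vector stored in row `i`.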

score 0 · Answer 2 · answered Aug 19 '15 at 11:14

This worked out for me.
data[i, "runtime_excluded"]<- paste(data[indices_excluded, "USETIME"], collapse=",")
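
A small sketch of what this approach stores, assuming a plain numeric vector `usetime` and a made-up `indices_excluded`: the result is a single character string, not a numeric vector, so it has to be split and converted again before doing arithmetic on it.

```r
usetime <- c(2345, 4353, 34985)
indices_excluded <- c(1, 3)   # hypothetical indices for illustration

# paste() with collapse produces one string per row
as_string <- paste(usetime[indices_excluded], collapse = ",")
as_string   # "2345,34985"

# A zero-length selection collapses to an empty string
empty <- paste(usetime[integer(0)], collapse = ",")
empty       # ""

# Recovering the numbers requires splitting and converting
back <- as.numeric(strsplit(as_string, ",", fixed = TRUE)[[1]])
```

This is convenient for display or export, while a list column keeps the values numeric.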

JAS
