
I am creating a fictitious dataset whose duration values (below) are sampled from a known discrete distribution (basic MC sampling). Each duration is assigned to a sequential id number. A trivial example using rnorm() might look like the following:

library(data.table)
set.seed(135813) # whimsical seed
id_dt <- data.table(id = 1:6) # Six "id" numbers
duration_dt <- data.table(duration = abs(rnorm(6, mean = 20, sd = 10))) # Sample of six arbitrary positive values
id_durs <- id_dt[, .(id = id, duration = round(duration_dt$duration))] # combine the above DTs; round values to ints

For each duration value in the id_durs data table, I need to express the value as a sum of ones - that is, assigning a value of one (mapped to the id and original duration) in new rows until the number of ones created equals the original value. In this example we would start with:

    id    duration
    --    --------
     1       7
     2      34
     3      33
     4       2
     5      40
     6      27

And the desired result is:

    id    duration    count
    --    --------    -----
     1       7          1
     1       7          1
     1       7          1
     1       7          1
     1       7          1
     1       7          1
     1       7          1      <== duration = 7, Rows = 7
     2      34          1
     2      34          1
     2      34          1
     2      34          1
     2      34          1
     2      34          1
     2      34          1
     2      34          1     
    ...    ...        ...     <== duration = 34, Rows = 34
     3      33          1     
    ...    ...        ...     <== duration = 33, Rows = 33
     4       2          1
     4       2          1     <== duration = 2, Rows = 2
     5      40          1
    ...    ...        ...     <== duration = 40, Rows = 40
     6      27          1
    ...    ...        ...     <== duration = 27, Rows = 27

One way I know to decompose a single value (verbose) is:

stuff = 50.4
decomp <- lapply(1:round(stuff), function(i) i <- 1)
result <- data.table(count = unlist(decomp))

But when trying to map this to id and original value, I'm hitting walls. I broke down and tried a for loop as a crutch. Applied to the above:

for (i in 1:length(id_durs))
     {
       id_dur_val <- data.table(id = id_durs$id, 
                                duration = id_durs$duration,  
                                count = rep(1, each = id_durs$duration[i]))
      }

But this just gives me a repetition equal to the number of elements in the original data. I also tried using expand.grid(), but only the first element (as expected) was used as the iterator - so all row counts were the same for each value of duration.

This feels like such a trivial problem, so I know I'm overlooking something.

Thank you for any advice.

phillipm

1 Answer


You could do the following:

library(data.table)

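# Use id_durs (it holds both id and duration); duration_dt has no id column.
# Within each id group, .N is 1, so numeric(.N) + 1 is a single 1 repeated duration times.
id_durs[, .(duration, count = rep(numeric(.N) + 1, duration)), by = id]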


     id duration count
  1:  1        7     1
  2:  1        7     1
  3:  1        7     1
  4:  1        7     1
  5:  1        7     1
 ---                  
139:  6       27     1
140:  6       27     1
141:  6       27     1
142:  6       27     1
143:  6       27     1
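
As an alternative sketch (not part of the approach above), the same expansion can be done by repeating row indices rather than grouping by id. To keep it self-contained, the table here is built directly from the six durations shown in the question instead of re-running the seeded rnorm() call; the names expanded and check are just illustrative.

```r
library(data.table)

# Durations copied from the example table in the question
id_durs <- data.table(id = 1:6, duration = c(7, 34, 33, 2, 40, 27))

# Repeat each row index `duration` times, then add a constant count column
expanded <- id_durs[rep(seq_len(.N), times = duration)][, count := 1]

# Sanity checks: one row per unit of duration, and the ones sum back to duration
stopifnot(nrow(expanded) == sum(id_durs$duration))
check <- expanded[, .(total = sum(count)), by = .(id, duration)]
stopifnot(all(check$total == check$duration))
```

This version does not rely on id being unique per row, and a duration of 0 simply contributes no rows.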
Sotos
Onyambu