
I'm trying to resolve the problem that is explored in this question and continues into this one.

I have a data frame of 32,285 observations with the variables year, epiweek, and count. The data frame is not "flat": count is greater than one in many rows, i.e., each such row lumps several observations together. I've tried all the solutions on both pages and always get the error: invalid 'times' argument.

The solution on the second page ("Strange error when expanding data.table") seemed to be the consensus on the best way to do it, and they seem to have resolved the error there, so I created a small reprex. But the reprex works, with no error:

    library(data.table)
    year1 <- c('2010', '2011', '2010', '2012', '2010')
    epiweek1 <- c(1, 1, 2, 2, 3)
    count1 <- c(1, 5, 38, 13, 1)
    mydt1 <- data.table(year1, epiweek1, count1)
    newdt1 <- mydt1[ ,.(rep(rep(1,.N),count1)), by=.(year1,epiweek1)]

The large df is ordered by epiweek: all the observations in epiweek 1 for all years, then all the observations in epiweek 2 for all years, etc. The first reprex is consistent with the sequence in the df, as is the following one:

    year2 <- c('2010', '2011', '2011', '2012', '2013')
    epiweek2 <- c(1, 1, 2, 2, 3)
    count2 <- c(1, 5, 38, 13, 1)
    mydt2 <- data.table(year2, epiweek2, count2)
    newdt2 <- mydt2[ ,.(rep(rep(1,.N),count2)), by=.(year2,epiweek2)]

Reprex 3 is reprex 1 with one of the epiweeks out of sequence. It gives the error when using the data.table method to expand the table:

    year3 <- c('2010', '2011', '2010', '2012', '2010')
    epiweek3 <- c(1, 2, 1, 2, 3)
    count3 <- c(1, 5, 38, 13, 1)
    newdt3 <- mydt1[ ,.(rep(rep(1,.N),count3)), by=.(year3,epiweek3)]
    # Error in rep(rep(1, .N), count3) : invalid 'times' argument

I also get the error if I use count1 in reprex 2. The counts are equal to one another, so I'm not sure why that happens.

    count1 == count2
    count1 == count3

All three are consistent with the piece of tabling code in the last comment on the second page by @vonjd:

    mydt1[,.N,by=.(year1,epiweek1)][,table(N)]
    mydt2[,.N,by=.(year2,epiweek2)][,table(N)]
    mydt3[,.N,by=.(year3,epiweek3)][,table(N)]

So my large df seems to be structured like reprex 1 and 2, which expand the table without giving the error. The largest value in any count row is 166, so I'm perplexed as to why the expanding data.table method is not working with my large data frame. Yes, I did convert it to a data.table first.

Using R version 3.6.1

JuanTamad
  • within the scope of `mydt1`, there is no year3, epiweek3 and count3. `rep` is expecting a `times` to be of the same length as `rep(1, .N)`, and for the first group, instead of passing in length 1, `count3` (from the global environment) is of length 5, causing the error. – chinsoon12 Dec 23 '19 at 01:02
  • Oh, missed that. So that one is ok too when mydt1 is replaced with mydt3, and still no clue as to why my large dataset isn't working. I can't create a reprex. – JuanTamad Dec 23 '19 at 07:44
  • The lengths are the same in the large dataset - 32285 – JuanTamad Dec 23 '19 at 07:57
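
The scoping point in the first comment can be illustrated with a small sketch. It is a variation on the reprexes above, not the original code: it groups by the columns of mydt1 and compares the per-group length of the column count1 with the length of the global vector count3. The names lens and err are introduced here only for illustration.

    library(data.table)

    mydt1 <- data.table(year1    = c('2010', '2011', '2010', '2012', '2010'),
                        epiweek1 = c(1, 1, 2, 2, 3),
                        count1   = c(1, 5, 38, 13, 1))
    count3 <- c(1, 5, 38, 13, 1)  # global vector, NOT a column of mydt1

    # Inside j, a column is subset per group; a global vector is not.
    lens <- mydt1[, .(n          = .N,
                      len_col    = length(count1),
                      len_global = length(count3)),
                  by = .(year1, epiweek1)]

    # rep() needs 'times' of length 1 or length(x); count3 has length 5
    # while each group has a single row, so the call fails.
    err <- tryCatch(
      mydt1[, .(rep(rep(1, .N), count3)), by = .(year1, epiweek1)],
      error = function(e) conditionMessage(e)
    )

Here every group has one row, so len_col is 1 while len_global is 5, and err holds the message from the failed rep() call.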

1 Answer

This produced exactly what I needed:

    dt.0742long <- dt.0742[ , 
      .(tryCatch(rep(rep(1,.N),count), error=browser)), by=.(year,epiwk)]

In the result, V1 takes the place of count: each row is now a single observation.

    > glimpse(dt.0742long)
    Observations: 113,898
    Variables: 3
    $ year  <dbl> 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 201…
    $ epiwk <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
    $ V1    <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …


    > glimpse(dt.0742)
    Observations: 32,285
    Variables: 4
    $ year          <dbl> 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010,…
    $ epiwk         <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
    $ count         <dbl> 1, 2, 2, 1, 1, 15, 14, 12, 15, 8, 4, 5, 11, 7, 3, 4, 2, 1, 2, 4, 1, 1, 1, 2, …
    $ Case_Category <dbl> 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
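
For anyone hitting the same message on a large table, here is a minimal sketch of two things that may be worth trying. It assumes one possible cause, namely NA or negative values in count, which rep() rejects as 'times'; whether that applied to dt.0742 is not known. The table dt and its planted bad values are hypothetical stand-ins, not the real data.

    library(data.table)

    # Hypothetical stand-in for dt.0742; the NA and the -1 are planted on purpose.
    dt <- data.table(year  = c(2010, 2010, 2011, 2011),
                     epiwk = c(1, 1, 1, 2),
                     count = c(2, NA, 3, -1))

    # Rows whose count cannot be used as 'times' in rep()
    bad <- dt[is.na(count) | count < 0]

    # Expand the usable rows without grouping: repeat each row index
    # 'count' times, then subset by those indices.
    good <- dt[!is.na(count) & count >= 0]
    long <- good[rep(seq_len(.N), count)]

In this sketch bad holds the two planted rows, and long has one row per observation (2 + 3 = 5 rows) while keeping every original column, so no by= grouping is needed.
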
Frank 2