I'm trying to resolve the problem that is explored in this question and continues into this one.
I have a dataframe of 32,285 observations with variables year, epiweek, count. The dataframe is not "flat" because the count column is greater than one in many rows, i.e., many rows lump together more than one observation. I've been through all the solutions on both pages and always get the error: invalid 'times' argument.
The solution on the second page ("Strange error when expanding data.table") seemed to be the consensus on the best way to do it, and it appears to have resolved the problem there, so I created a small reprex. The reprex works, though, with no error:
library(data.table)
year1 <- c('2010', '2011', '2010', '2012', '2010')
epiweek1 <- c(1, 1, 2, 2, 3)
count1 <- c(1, 5, 38, 13, 1)
mydt1 <- data.table(year1, epiweek1, count1)
newdt1 <- mydt1[ ,.(rep(rep(1,.N),count1)), by=.(year1,epiweek1)]
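As a sanity check on reprex 1, the expanded table should contain one row per counted observation, i.e. `sum(count1)` rows. The following is a minimal sketch, assuming the data.table package is installed; it only restates reprex 1 and adds the row-count comparison.

```r
library(data.table)

year1 <- c('2010', '2011', '2010', '2012', '2010')
epiweek1 <- c(1, 1, 2, 2, 3)
count1 <- c(1, 5, 38, 13, 1)
mydt1 <- data.table(year1, epiweek1, count1)

# Every (year1, epiweek1) pair is unique here, so .N is 1 in each group and
# count1 inside j is the single column value belonging to that group.
newdt1 <- mydt1[, .(rep(rep(1, .N), count1)), by = .(year1, epiweek1)]

# One output row per counted observation.
nrow(newdt1) == sum(count1)
```

If the comparison is `TRUE`, the expansion did what was intended for this small case.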
The large df is ordered by epiweek: all the observations in epiweek 1 for all years, then all the observations in epiweek 2 for all years, etc. The first reprex is consistent with that sequence, as is the following one:
year2 <- c('2010', '2011', '2011', '2012', '2013')
epiweek2 <- c(1, 1, 2, 2, 3)
count2 <- c(1, 5, 38, 13, 1)
mydt2 <- data.table(year2, epiweek2, count2)
newdt2 <- mydt2[ ,.(rep(rep(1,.N),count2)), by=.(year2,epiweek2)]
Reprex 3 is reprex 1 with one of the epiweeks out of sequence. It gives the error when using the data.table method to expand the table:
year3 <- c('2010', '2011', '2010', '2012', '2010')
epiweek3 <- c(1, 2, 1, 2, 3)
count3 <- c(1, 5, 38, 13, 1)
newdt3 <- mydt1[ ,.(rep(rep(1,.N),count3)), by=.(year3,epiweek3)]
# Error in rep(rep(1, .N), count3) : invalid 'times' argument
I also get the error if I use count1 in reprex 2. The count vectors are equal to one another, so I'm not sure why that happens:
count1 == count2, count1 == count3
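It may be relevant how base `rep()` treats its `times` argument: a vector `times` has to have length 1 or the same length as `x`, and it must not contain `NA` or negative values. The sketch below uses base R only and illustrates the length rule with the count values from the reprexes. Whether this is what is happening in the large dataframe is an assumption. One possible connection: a vector that is not a column of the table being queried (for example `count3` when the table is `mydt1`) is looked up outside the table, so it would reach `rep()` at its full length of 5 for every group.

```r
# What rep(1, .N) returns for a group that contains two rows.
x <- rep(1, 2)

# Works: 'times' has the same length as x (one repeat count per element).
ok <- rep(x, c(1, 38))
length(ok)

# Fails: 'times' has length 5, which is neither 1 nor length(x).
res <- tryCatch(
  rep(x, c(1, 5, 38, 13, 1)),
  error = function(e) conditionMessage(e)
)
res
```

So two count vectors can be element-wise equal and still behave differently, depending on whether the name resolves to a per-group column or to a full-length vector.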
All three are consistent with the piece of tabling code in the last comment on the second page by @vonjd:
mydt1[,.N,by=.(year1,epiweek1)][,table(N)]
mydt2[,.N,by=.(year2,epiweek2)][,table(N)]
mydt3[,.N,by=.(year3,epiweek3)][,table(N)]
So, my large df seems to be structured like reprexes 1 and 2, which expand the table and do not give the error. The largest value in a single count row is 166. So, I'm perplexed as to why the expanding data.table method is not working with my large dataframe. Yes, I did convert it to a data.table first.
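A couple of checks that might narrow this down on the large table, written as a sketch under stated assumptions: `bigdt` and its values are a hypothetical stand-in for the real 32,285-row table, and dropping unusable rows before expanding is only one possible way to handle them. The first check looks for counts that `rep()` cannot use as `times` (`NA` or negative). The second part shows an alternative expansion by row index, which does not depend on the grouping at all.

```r
library(data.table)

# Hypothetical stand-in for the large table (made-up values).
bigdt <- data.table(
  year    = c('2010', '2011', '2010', '2012'),
  epiweek = c(1, 1, 2, 2),
  count   = c(1, NA, 38, -1)
)

# Rows whose count cannot be used as 'times' in rep(): NA or negative.
bad <- bigdt[is.na(count) | count < 0]
nrow(bad)

# Keep only usable rows (an assumption about what is wanted here).
good <- bigdt[!is.na(count) & count >= 0]

# Row-index expansion: repeat each row number 'count' times, then subset.
expanded <- good[rep(seq_len(.N), count)]
nrow(expanded) == sum(good$count)
```

If `nrow(bad)` is greater than zero on the real data, those rows would be natural candidates for the `invalid 'times' argument` error.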
Using R version 3.6.1