
I am new to data.table in R, and am running into an issue where some values of a column called "coverage" are not recognized. I have created the data table as follows:

dt <- as.data.table(expand.grid(coverage = c(seq(0, 0.9, 0.1), 0.99),
                            year     = seq(0, 15, 1)),
                cum_inf = numeric())

I then would like to fill in the cum_inf column by reading in .RData files, and pulling the appropriate information from them:

for(i in 1:length(files)) {
  load(files[i])
  model <- eval(parse(text = file_names[i]))
  cov   <- (model$param$perc_vaccinated*3*365)/(1 + model$param$perc_vaccinated*3*365)
  for(j in 0:15) {
    dt[coverage == cov & year == j, cum_inf := mean(sapply(model$popsumm[[1]], function(x) {
      if(j == 0) { 0 } else {
        sum(x[1]:x[(365/5)*j])
      }
    }))]
  }
  rm(list=ls(pattern="sens"))
}

However, coverage values of 0.3, 0.6, and 0.7 aren't recognized, and so the corresponding values of cum_inf are not filled in. As an example, if I type dt[coverage == 0.2], R prints to the console:

  coverage year cum_inf
 1:      0.2    0    0.00
 2:      0.2    1   16.05
 3:      0.2    2   20.40
 4:      0.2    3   11.50
 5:      0.2    4   17.45
 6:      0.2    5   11.25
 7:      0.2    6   14.70
 8:      0.2    7   10.90
 9:      0.2    8    8.35
10:      0.2    9    7.50
11:      0.2   10    5.90
12:      0.2   11    3.60
13:      0.2   12    4.50
14:      0.2   13    3.05
15:      0.2   14    4.70
16:      0.2   15    3.35

However, dt[coverage == 0.3] returns Empty data.table (0 rows) of 3 cols: coverage,year,cum_inf. I know that the fourth row of the data table has coverage value of 0.3, so I tried dt[4,] to see what value is stored for coverage of 0.3, and it looks like 0.3:

   coverage year cum_inf
1:      0.3    0      NA

Similarly, dt[coverage == dt[4, coverage]] prints to the console:

    coverage year cum_inf
 1:      0.3    0      NA
 2:      0.3    1      NA
 3:      0.3    2      NA
 4:      0.3    3      NA
 5:      0.3    4      NA
 6:      0.3    5      NA
 7:      0.3    6      NA
 8:      0.3    7      NA
 9:      0.3    8      NA
10:      0.3    9      NA
11:      0.3   10      NA
12:      0.3   11      NA
13:      0.3   12      NA
14:      0.3   13      NA
15:      0.3   14      NA
16:      0.3   15      NA

Any help in understanding why these three values in the coverage column are not recognized in the same way as other values is much appreciated.

kapeeb
    This is a question of numerical precision. A nice answer to this on SO is in [this post](https://stackoverflow.com/questions/9508518/why-are-these-numbers-not-equal). You might also take a look at the help page of `?all.equal`. – lmo Jun 27 '17 at 18:59
  • Aha, so not related to data.table at all. Thank you! – kapeeb Jun 27 '17 at 19:24

1 Answer


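This is floating-point rounding error. The values produced by seq() differ from the literals 0.3, 0.6, and 0.7 at around the 17th significant digit, which shows up when you print more digits: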

print(dt$coverage,digits=20)
  [1] 0.00000000000000000 0.10000000000000001 0.20000000000000001 0.30000000000000004 0.40000000000000002 0.50000000000000000 0.60000000000000009
  [8] 0.70000000000000007 0.80000000000000004 0.90000000000000002 1.00000000000000000 0.00000000000000000 0.10000000000000001 0.20000000000000001
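The mismatch can be reproduced in base R without data.table. A minimal sketch; the names grid, literals, and mismatch are illustrative only and do not come from the question:

```r
# Grid values are built by arithmetic, so some of them carry rounding error
grid     <- seq(0, 0.9, 0.1)
literals <- c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)

grid[4] == 0.3                     # FALSE: the computed value is not the double closest to 0.3
isTRUE(all.equal(grid[4], 0.3))    # TRUE: all.equal() compares within a tolerance
mismatch <- which(grid != literals)
mismatch                           # 4 7 8, i.e. the entries for 0.3, 0.6 and 0.7
```

Those three positions are exactly the coverage values the question reports as "not recognized".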

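Round the coverage values when you generate the table, and round cov the same way before comparing, so that both sides of == hold the same double: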

dt <- as.data.table(expand.grid(coverage = round(c(seq(0, 0.9, 0.1), .99),2),
                                year     = seq(0, 15, 1)),
                    cum_inf = numeric())

> dt[coverage == 0.3]

    coverage year
 1:      0.3    0
 2:      0.3    1
 3:      0.3    2
 4:      0.3    3
 5:      0.3    4
 6:      0.3    5
 7:      0.3    6
 8:      0.3    7
 9:      0.3    8
10:      0.3    9
11:      0.3   10
12:      0.3   11
13:      0.3   12
14:      0.3   13
15:      0.3   14
16:      0.3   15
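Because cov in the question is itself the result of floating-point arithmetic, it may help to make the comparison inside the loop tolerant as well. The following is only a sketch under assumptions: the tolerance tol, the stand-in value of cov, and the values assigned to cum_inf are made up for illustration and are not taken from the question's models.

```r
library(data.table)

dt <- as.data.table(expand.grid(coverage = c(seq(0, 0.9, 0.1), 0.99),
                                year     = seq(0, 15, 1)))
dt[, cum_inf := NA_real_]          # create the result column explicitly

cov <- 0.3                         # stand-in for the value computed from each model
tol <- 1e-8                        # assumed tolerance, far smaller than the grid spacing

# Option 1: compare within a tolerance instead of using ==
dt[abs(coverage - cov) < tol & year == 2, cum_inf := 1.5]

# Option 2: round both sides to the same number of decimals
dt[round(coverage, 2) == round(cov, 2) & year == 3, cum_inf := 2.5]

filled <- dt[!is.na(cum_inf)]      # the two rows updated above
filled
```

Either option avoids depending on the exact bit pattern of the stored coverage values. Rounding is simpler when the grid has a known number of decimals, while a tolerance is more general.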
akaDrHouse