Why does rep produce one fewer repeat for some values than the times vector states in R?

Question

I am trying to repeat the elements of one column of a data.frame using another column that holds the number of times to repeat each element. The count vector I was given had been multiplied by 0.9, so I divided it by 0.9, which gives what appear to be integers when I print it out:

> release$actrel
  [1]    3    8    1    3    7  232  273   18    1   12   10    3    1   23  111
 [16]   57   43  578  546   14    2    2    9   97  506  361 1486  661   46  268
 [31]   27    3    3    1   11  187  758  168  382 1300  633   64    4    2  354
 [46]  813   82  147 1710 9331  728   31   46   64   21    1  255 1624 1744  772
 [61] 3702 3933   75    2    1    1   35  917 2881 1642  746  981  133    5    5
 [76]   27    5  168  575  184  136  533  345   38   21   26   12    6   14   21
 [91]  100  300   32    2   93 1116   49  235  356   54  178  203   24   13    9
[106]    4    7    4    9   19 1045  952  273   72 3512 1892   10    1    3    5
[121]    2

However, when I repeat using the above vector, the length of the result does not equal the sum of the counts:

> sum(release$actrel)
[1] 54392
> tg=rep(release$tg,release$actrel)
> length(tg)
[1] 54372

The first element that produced the wrong length was the 6th. This can be corrected if I apply the ceiling function to that value:

> release$actrel[6]
[1] 232
> length(rep(release$tg[6],release$actrel[6]))
[1] 231
> length(rep(release$tg[6],ceiling(release$actrel[6])))
[1] 232

I thought that it might somehow relate to the position in the vector, so I found the offending indices and their associated values:

> c(6,15,18,25,26,27,38,46,51,55,59,60,76,78,86,90,99,101,110,112)
 [1]   6  15  18  25  26  27  38  46  51  55  59  60  76  78  86  90  99 101 110
[20] 112
> release$actrel[c(6,15,18,25,26,27,38,46,51,55,59,60,76,78,86,90,99,101,110,112)]
 [1]  232  111  578  506  361 1486  168  813  728   21 1744  772   27  168   26
[16]   21  356  178   19  952

I could not find any pattern in the indices. I then thought it might depend on the value of actrel, but the value 21 occurs three times and only two of those produced an incorrect number of repeats. I have therefore been unable to work out what causes this unexpected behavior in the rep function. I think it is some sort of floating point issue, but it does not seem to act predictably and is difficult to reproduce:

> test=c(232,111,578,506,361,1486,168,813,728, 21, 1744,772, 27,168, 26, 21,356,178, 19,952)
> test=test*.9
> test=test/.9
> length(rep(1:20,test))
[1] 9267
> sum(test)
[1] 9267
> test=test*.333
> test=test/.333
> length(rep(1:20,test))
[1] 9266
> sum(test)
[1] 9267

For rep to produce the correct length in my example, I had to round the values that give the number of times to repeat each element:

> sum(release$actrel)
[1] 54392
> tg=rep(release$tg,ceiling(release$actrel))
> length(tg)
[1] 54396
> tg=rep(release$tg,round(release$actrel))
> length(tg)
[1] 54392

To me this seems like a bug in R but I am not sure where to post this other than here. Below is the session information if it is helpful for attempting to reproduce these errors.

> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ggmap_3.0.0             rgeos_0.5-8             sp_1.4-5               
 [4] mapdata_2.3.0           maps_3.3.0              stringr_1.4.0          
 [7] data.table_1.14.0       magrittr_2.0.1          tidyr_1.1.3            
[10] ggplot2_3.3.5           dplyr_1.0.6             R4MFCL_0.4.2.2019.06.25
[13] frqit_0.0.1             FLR4MFCL_1.2.7          FLCore_2.6.16          
[16] iterators_1.0.13        lattice_0.20-44        

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6          plyr_1.8.6          pillar_1.6.1       
 [4] compiler_4.1.1      bitops_1.0-7        tools_4.1.1        
 [7] lifecycle_1.0.0     tibble_3.1.2        gtable_0.3.0       
[10] png_0.1-7           pkgconfig_2.0.3     rlang_0.4.11       
[13] Matrix_1.3-4        httr_1.4.2          withr_2.4.2        
[16] RgoogleMaps_1.4.5.3 generics_0.1.0      vctrs_0.3.8        
[19] stats4_4.1.1        grid_4.1.1          tidyselect_1.1.1   
[22] glue_1.4.2          R6_2.5.0            jpeg_0.1-8.1       
[25] fansi_0.4.2         farver_2.1.0        purrr_0.3.4        
[28] scales_1.1.1        ellipsis_0.3.2      MASS_7.3-54        
[31] colorspace_2.0-1    geosphere_1.5-14    utf8_1.2.1         
[34] stringi_1.6.2       munsell_0.5.0       rjson_0.2.20       
[37] crayon_1.4.1

Sometimes the full precision is not shown. Try checking `print(release$actrel, digits = 16)` – akrun, Dec 17 '21 at 16:33
I see what you are saying; it does show some values as .99999999999999. However, I would expect the function to account for such small floating point errors, since even simple calculations on a set of numbers will produce them: `print(29/7*7,digits=22)` gives `[1] 29.000000000000004` – user3653085, Dec 17 '21 at 16:47
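
Following the suggestion above, here is a minimal sketch of how the affected elements could be located. The `actrel` values are made up for illustration (two of them are deliberately set a hair below an integer, which is the kind of error a `* 0.9` then `/ 0.9` round trip can leave behind); they are not the asker's data.

```r
# Hypothetical counts: elements 1 and 4 sit just below a whole number
actrel <- c(232 - 1e-10, 111, 21, 21 - 1e-10)

print(actrel)               # default precision hides the difference
print(actrel, digits = 16)  # more digits reveal the values below 232 and 21

# Elements that rep() would shorten: truncating lands below the nearest integer
short <- which(trunc(actrel) < round(actrel))
short
#> [1] 1 4

# Shortfall between the intended and the actual length
sum(round(actrel)) - length(rep(seq_along(actrel), actrel))
#> [1] 2
```

This would also be consistent with the observation that only some of the 21s misbehave: two values can both print as 21 while only one of them is stored slightly below 21.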

Andy Baxter · Answer 1 · 2021-12-17T16:44:53.963

1

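Rather than relying on values that only appear to be integers, it would probably be better to convert them explicitly: as.integer(round(release$actrel)). R won't 'complete' a repeat if the count is even slightly less than the whole number it prints as, so a value just below 232 yields 231 repeats. ceiling is not a safe fix either, because it rounds up even when there is only a minuscule fraction above the whole number.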

x <- c(1.999999999, 2.00000001)
# Values "look" like round integers, but aren't
x
#> [1] 2 2

# Incorrect output: 1.999999999 is cut down to 1, so the first value appears once
rep(1:2, x)
#> [1] 1 2 2

# Incorrect output: ceiling() pushes 2.00000001 up to 3, so the second value appears three times
rep(1:2, as.integer(ceiling(x)))
#> [1] 1 1 2 2 2

# Correct output
rep(1:2, as.integer(round(x)))
#> [1] 1 1 2 2

Created on 2021-12-17 by the reprex package (v2.0.1)
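
If silently rounding feels too permissive, one option is to check that every count is close to a whole number before converting. The sketch below is only an illustration: `as_count` and its `tol` argument are hypothetical names, not part of base R, and the tolerance of 1e-6 is an arbitrary assumption that should be chosen to suit the data. It does not handle `NA` values.

```r
# Hypothetical helper (not in base R): convert near-integer doubles to integer
# counts, and fail loudly if any value is not close to a whole number.
as_count <- function(x, tol = 1e-6) {
  r <- round(x)
  if (any(abs(x - r) > tol)) {
    stop("some values are not within `tol` of a whole number")
  }
  as.integer(r)
}

x <- c(1.999999999, 2.00000001)
rep(1:2, as_count(x))
#> [1] 1 1 2 2
```

A call such as `as_count(2.5)` stops with an error instead of quietly producing a count, which makes genuinely non-integer input visible early.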

edited Dec 17 '21 at 16:44

answered Dec 17 '21 at 16:39


To clarify - `round` should do all the work for you, but may as well be explicit that for a count of repeats you'd be specifically expecting an integer. – Andy Baxter Dec 17 '21 at 16:41
I guess I am just expecting that this would occur internally to the function. Therefore, people won't have to waste hours trying to figure out why the function doesn't work as expected or produces bad results if people don't notice that such a tiny floating point precision has caused a logical error. – user3653085 Dec 17 '21 at 16:56
@user3653085: I understand that this is frustrating, but you should know that dealing with floating point values opens a huge can of worms about what should be done to satisfy user expectations/be explicit. You could say that `seq()` ought to round automatically ... just know that this question and many others have been bashed around over decades by R users and designers ... – Ben Bolker Dec 17 '21 at 17:01
Yes, indeed it seems odd, but I would suggest that the better solution would be for `rep` not to accept floating point values at all. The docs for `rep` specify that the arguments times/each/length.out should be integers but, perhaps for convenience, add: "Non-integer values of times will be truncated towards zero." (https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/rep) – Andy Baxter Dec 17 '21 at 17:37
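
The truncation rule quoted from the documentation can be seen in a small sketch; the inputs here are arbitrary illustrative values.

```r
# times = 2.9 is truncated to 2, not rounded to 3
rep("a", 2.9)
#> [1] "a" "a"

# Same result as truncating explicitly
same_as_trunc <- identical(rep("a", 2.9), rep("a", trunc(2.9)))
same_as_trunc
#> [1] TRUE

# With a vector of times, the result length is the sum of the truncated counts
times <- c(1.9, 2.9, 3)
length(rep(1:3, times)) == sum(trunc(times))
#> [1] TRUE
```

This is why `sum()` of the counts and `length()` of the result can disagree: `sum()` adds the stored doubles, whereas `rep` drops each fractional part first.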
