Indexing in R: problems with some points

Question

I created the functions dyst and dystryb:

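dyst <- function(t, x)
{
  f <- 1   # placeholder: always returns 1
  return(f)
}
dystryb <- function(x)
{
  x <- sort(x)
  s <- numeric(101)            # 101 zeros, one slot per value of t
  u <- seq(0, 1, by = 0.01)
  for (t in u)
  {
    s[t*100 + 1] <- dyst(t, x)   # intended index: 1, 2, ..., 101
  }
  return(s)
}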

Calling dystryb gives this:

> x<-c(1,2,3,4,5,6,7)
> dystryb(x)
  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [51] 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[101] 1

Why does this function fail at positions 30 and 59? Of course, my real goal is not a function that returns a vector of 1s; I simplified it to make clear where the problem is.

I believe it has to do with how 0.3 is stored internally. Since it's a recurring number, multiplying by 100 changes it slightly. Maybe, just a thought. — joel.wilson, Jan 03 '17 at 09:58

score 1 · Answer 1 · edited May 23 '17 at 12:33

The root cause is floating-point precision. See this SO post for an R-related discussion; the links that @Dirk-eddelbuettel includes there give background both on R and on one of the most relevant papers on numerical precision in computing in general. This post gives a more detailed, general answer on SO about the computer science behind the issue.

To show that numerical precision is the root cause, consider the index values your loop computes, t * 100 + 1. First, the default printout:

print(seq(0,1, by = 0.01) * 100 + 1)
  [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19
 [20]  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38
 [39]  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57
 [58]  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76
 [77]  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95
 [96]  96  97  98  99 100 101

Everything looks fine. Now print the same values, telling R to show 16 significant digits:

print(seq(0,1, by = 0.01) * 100 + 1, digits=16)
  [1]   1.000000000000000   2.000000000000000   3.000000000000000
  [4]   4.000000000000000   5.000000000000000   6.000000000000000
                                  ...
 [25]  25.000000000000000  26.000000000000000  27.000000000000000
 [28]  28.000000000000000  29.000000000000004  29.999999999999996
 [31]  31.000000000000000  32.000000000000000  33.000000000000000
 [34]  34.000000000000000  35.000000000000000  36.000000000000000
 [37]  37.000000000000000  38.000000000000000  39.000000000000000
 [40]  40.000000000000000  41.000000000000000  42.000000000000000
 [43]  43.000000000000000  44.000000000000000  45.000000000000000
 [46]  46.000000000000000  47.000000000000000  48.000000000000000
 [49]  49.000000000000000  50.000000000000000  51.000000000000000
 [52]  52.000000000000000  53.000000000000000  54.000000000000000
 [55]  55.000000000000000  56.000000000000007  57.000000000000007
 [58]  58.000000000000007  58.999999999999993  60.000000000000000
                               ...
[100] 100.000000000000000 101.000000000000000

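Element 30 actually stores 29.999999999999996 and element 59 stores 58.999999999999993. This matters because when you index with a non-integer number, R coerces it to integer as as.integer does, truncating toward zero. Casting the sequence to integer therefore shows the indices your loop really uses: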

print(as.integer(seq(0,1, by = 0.01) * 100 + 1))
  [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19
 [20]  20  21  22  23  24  25  26  27  28  29  29  31  32  33  34  35  36  37  38
 [39]  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57
 [58]  58  58  60  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76
 [77]  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95
 [96]  96  97  98  99 100 101

as.integer turned 29.999999999999996 into 29 and 58.999999999999993 into 58, i.e. it truncated them. So in your loop, elements 29 and 58 are assigned twice, while elements 30 and 59 are never assigned and keep the 0 they were initialized with by numeric(101).
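A minimal sketch of that indexing rule at work. The toy vector v is an illustration, not part of the question; the second part reuses the exact value stored in element 30 of the sequence and shows that writing to it lands in slot 29.

```r
# Toy vector (illustration only): a fractional index is truncated toward zero
v <- c(10, 20, 30)
v[2.9]          # 20, same as v[2]

# Reproduce the lost assignment with the value stored in element 30
idx <- (seq(0, 1, by = 0.01) * 100 + 1)[30]   # 29.999999999999996
s <- numeric(30)
s[idx] <- 1     # meant to set s[30]
which(s == 1)   # 29: the write landed one slot early
```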

Because every value here is positive, truncation gives the same result as the floor function:

identical(trunc(seq(0,1, by = 0.01) * 100 + 1), floor(seq(0,1, by = 0.01) * 100 + 1))
[1] TRUE
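A short comparison of why positivity matters: trunc rounds toward zero, while floor rounds toward minus infinity, so the two only disagree for negative non-integers. The value -1.5 is an added illustration, not something that occurs in the question.

```r
vals <- c(29.999999999999996, 58.999999999999993, -1.5)
trunc(vals)       # 29 58 -1: toward zero, like as.integer()
floor(vals)       # 29 58 -2: toward -Inf
as.integer(vals)  # 29 58 -1
```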

One solution to your particular problem is to use round before casting the sequence to integer.

identical(1:101, as.integer(round(seq(0,1, by = 0.01) * 100 + 1)))
[1] TRUE
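Applied to the question's function, the round fix might look like the sketch below. The names dystryb_round and dystryb_int are hypothetical. The second variant, looping over integer positions and deriving t from them, is an alternative technique not taken from this answer; it avoids the problem because no index is ever computed from a floating-point value.

```r
# Placeholder from the question: always returns 1
dyst <- function(t, x) {
  1
}

# Variant A: round t*100 + 1 before using it as an index
dystryb_round <- function(x) {
  x <- sort(x)
  s <- numeric(101)
  for (t in seq(0, 1, by = 0.01)) {
    s[round(t * 100 + 1)] <- dyst(t, x)
  }
  s
}

# Variant B (alternative): iterate over integer indices, derive t from them
dystryb_int <- function(x) {
  x <- sort(x)
  s <- numeric(101)
  for (i in seq_len(101)) {
    t <- (i - 1) / 100
    s[i] <- dyst(t, x)
  }
  s
}

dystryb_round(c(1, 2, 3, 4, 5, 6, 7))
dystryb_int(c(1, 2, 3, 4, 5, 6, 7))
```

Variant B evaluates dyst at t = (i - 1) / 100, which may differ in the last bit from the matching element of seq(0, 1, by = 0.01); that is harmless for the placeholder but could matter if the real dyst compares t exactly.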

This solution was really helpful. I had thought about numerical precision, but I couldn't figure out how to get around it. Thank you very much! — Aga, Jan 04 '17 at 14:11
Sure thing, numerical precision issues can show up in unexpected ways. — lmo, Jan 04 '17 at 14:15

score 1 · Answer 2 · answered Jan 03 '17 at 13:09


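The following shows which computed indices are not exact integers because of floating-point error. Only those that fall slightly below the intended integer (positions 30 and 59) are truncated to the previous index, which is why the zeros appear exactly there.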

which(seq(0,1, by = 0.01)*100+1 != 1:101)
# [1] 15 29 30 56 57 58 59
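A small sketch (the variable names are illustrative) that splits these positions by the direction of the rounding error; only the "below" group loses its slot to truncation.

```r
idx <- seq(0, 1, by = 0.01) * 100 + 1
off <- which(idx != 1:101)     # positions that are not exact integers
below <- off[idx[off] < off]   # truncated down: these slots stay 0
above <- off[idx[off] > off]   # still truncate to the intended slot
below                          # 30 59
above                          # 15 29 56 57 58
```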

Sandipan Dey
