
I have a data frame with a column that contains percentage values in double format which I want to transform to an integer format. For example, 33% is currently represented as 0.33 and I want to change this to 33.

I have built a helper function that looks like this:

  transform <- function(x) {
    x <- x * 100
    as.integer(x)
  }

Now, if I run this, my rows with the value 0.33 get turned into 32 instead of 33. I have separated the multiplication to a line before as.integer(x), which correctly transforms my values to 33 (if I comment out as.integer(x)). But if I run the whole function, the values get turned into 32.

Any ideas why this might happen and how to fix it? To my understanding, this has nothing to do with truncation of digits behind the dot, because the values are already correctly transformed to 33.

  • I can't reproduce this error... When I copy your function exactly as stated and try `transform(0.33)`, it returns 33. That said, when I try `transform(0.3299)`, I get `32`, because `as.integer()` truncates the fractional part instead of rounding to the nearest integer. Is that the actual underlying problem, perhaps? – Aaron Montgomery Jan 14 '22 at 17:45
  • I'm sure this is a duplicate, but the underlying problem is that floating point format can't represent 0.33 exactly. It will display a range of numbers as 0.33, but none of them are exactly equal to it. Yours must be some variation on 0.32999999999. – user2554330 Jan 14 '22 at 17:49
  • Duplicate: https://stackoverflow.com/q/9508518/6574038 – jay.sf Jan 14 '22 at 17:51
  • It also works fine for me if I use it with individual input, but not with my larger data set (~6000000 rows). Values in the data set all are either 0, 0.33, 0.67, or 1; this problem only occurs with the 0.33 values, but with all of them. – Alarith Uhde Jan 14 '22 at 17:53
  • As others have stated: whatever process produced your larger data frame has actually *not* stored values of 0.33, but something like 0.3299999999999999. R displays this as 0.33 at its default print precision, which is why you have difficulty reproducing the error directly with an "individual input." The solution would be to explicitly call `round()`. – Aaron Montgomery Jan 14 '22 at 17:56
  • @user2554330 Thanks, I was not aware of that problem. In my specific case where I only have the four values mentioned above, I could fix this by changing the line `x <- x * 100` to `x <- (x * 100) + 0.1`. – Alarith Uhde Jan 14 '22 at 18:03
  • This fixes your specific problem for this dataset, but I'd still recommend `round()` as a more robust solution; note that your fix would incorrectly transform something like `0.328` to `32` instead of `33`. (Again, I get that this may not be an actual problem on your dataset.) – Aaron Montgomery Jan 14 '22 at 18:09
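The behaviour described in the comments above can be reproduced with a small sketch. The value `0.33 - 1e-12` is a hypothetical stand-in for whatever the data set actually stores (the real stored value is not known from the question); it is chosen only because it prints as 0.33 while sitting slightly below it.

```r
# Hypothetical stand-in for a stored value that prints as 0.33
# but is actually slightly smaller (assumption, not the asker's real data).
x <- 0.33 - 1e-12

print(x)                    # default print precision hides the difference
print(x, digits = 17)       # more digits reveal the stored value
sprintf("%.15f", x * 100)   # the product sits just below 33

shown     <- format(x)                  # string R shows at default precision
truncated <- as.integer(x * 100)        # drops the fractional part
rounded   <- as.integer(round(x * 100)) # rounds to the nearest whole number first
```

Here `truncated` ends up as 32 while `rounded` is 33, even though `shown` reads "0.33".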

1 Answer


Note: as @jay.sf pointed out in the comments, transform() is already a function in base R, and it's not great practice to overwrite it. I recommend renaming your helper function and adjusting all the calls to it.

I suspect the underlying culprit is actually the trailing digits that R hides when it prints the values: if the stored numbers are slightly below 0.33, then x * 100 comes out slightly below 33 and as.integer() truncates it to 32. I recommend explicitly round()ing, i.e.

my_transform <- function(x){
  x <- round(x * 100, 0)
  as.integer(x)
}

so that you're not relying on as.integer() to do something it's not intended to do (rounding, in this case).
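As a rough usage sketch, the function can be checked against a small vector shaped like the four values mentioned in the question. The vector `shares` and the offset `1e-12` are assumptions made up for illustration; they are not taken from the asker's data.

```r
# Same helper as above, repeated so this snippet runs on its own.
my_transform <- function(x) {
  x <- round(x * 100, 0)
  as.integer(x)
}

# Hypothetical sample: the second entry prints as 0.33 but is slightly below it.
shares <- c(0, 0.33 - 1e-12, 0.67, 1)

converted <- my_transform(shares)      # rounds before converting
naive     <- as.integer(shares * 100)  # truncates, so the second entry drops to 32

print(converted)
print(naive)
```

With this sample, `converted` holds 0, 33, 67, 100, whereas the second element of `naive` is 32.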

Aaron Montgomery