
I have this code:

df[, -1] = apply(df[, -1], 2, function(x){x * log(x)})

df looks like:

sample a b  c
a2     2 1  2
a3     3 0 45

The problem I am having is that some of my values in df are 0. You cannot take ln(0): R returns -Inf for log(0), so x * log(x) comes out as NaN wherever x is 0. So I would like to tell my program to spit out a 0 if it tries to take ln(0).
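For reference, here is a minimal illustration of what R does at zero. log(0) does not raise an error; it returns -Inf, and multiplying 0 by -Inf gives NaN. That NaN is where the unwanted values come from.

```r
# log(0) does not throw an error in R; it returns negative infinity
log(0)
#> [1] -Inf

# 0 * -Inf is undefined in IEEE 754 arithmetic, so R yields NaN
0 * log(0)
#> [1] NaN

# Applied to a column containing a zero, only that element is affected:
# the first element is 3 * log(3), the second is NaN
x <- c(3L, 0L)
x * log(x)
```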

alistaire
Jennifer
  • One option is to add an amount less than floating point error to `x` so it won't noticeably change the results but will run fine, e.g.: `df[-1] <- lapply(df[-1], function(x){x * log(x + .Machine$double.xmin)})` – alistaire Jan 08 '18 at 02:22
  • "So I would like tell my program to spit out a 0 if it tries to take ln(0)." So, you want to get a wrong result from an arithmetic operation? That sounds dangerous. – Roland Jan 08 '18 at 06:59
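Some context on the concern about wrong results: ln(0) itself is undefined, but the quantity being computed is x ln x, and its limit as x approaches 0 from the right is 0 (by L'Hôpital's rule). Treating 0 ln 0 as 0 is therefore the continuous extension of x ln x rather than an arbitrary value; it is the same convention used for Shannon entropy.

```latex
\lim_{x \to 0^+} x \ln x
  = \lim_{x \to 0^+} \frac{\ln x}{1/x}
  = \lim_{x \to 0^+} \frac{1/x}{-1/x^2}
  = \lim_{x \to 0^+} (-x)
  = 0
```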

2 Answers


You could use ifelse here:

df[,-1] = apply(df[,-1], 2, function(x){ ifelse(x != 0, x*log(x), 0) })
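A minimal self-contained version of the ifelse approach, run on the sample data from the question. The helper name xlogx is hypothetical, and lapply is swapped in for apply: apply first converts the selected columns to a matrix, while lapply works column by column and returns a list that assigns straight back into the data frame. Note that ifelse computes x * log(x) for every element, including the zeros (NaN there), and then discards those entries; since log(0) returns -Inf without a warning, this is harmless.

```r
# Sample data from the question
df <- data.frame(sample = c("a2", "a3"),
                 a = 2:3,
                 b = c(1L, 0L),
                 c = c(2L, 45L))

# Hypothetical helper: x * log(x), defining the x == 0 case as 0.
# ifelse evaluates x * log(x) for all elements (NaN where x is 0)
# and keeps 0 for those positions instead.
xlogx <- function(x) ifelse(x != 0, x * log(x), 0)

# Transform every column except the first (the character `sample` column)
df[-1] <- lapply(df[-1], xlogx)
df
```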
Tim Biegeleisen

You can take advantage of floating-point precision by adding to x a tiny amount that is smaller than the precision of a double. log(0 + .Machine$double.xmin) is a finite number (about -708.4), so multiplying it by 0 gives 0 rather than NaN, and inputting 0 will work. For any value that isn't itself vanishingly small, the added amount is lost to rounding, so x + .Machine$double.xmin is exactly x and the result does not change at all.
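A quick check of the two facts the trick relies on (a sketch; the test values 2, 45 and 1e-300 are arbitrary): log(.Machine$double.xmin) is finite, so 0 times it is exactly 0, and adding .Machine$double.xmin to an ordinary number leaves it unchanged. The last line shows the limit of the trick: a value that is itself near the bottom of the double range does get shifted.

```r
tiny <- .Machine$double.xmin   # smallest positive normalized double, 2^-1022

# log(tiny) is finite (about -708.4), so the zero case no longer yields NaN
log(tiny)
0 * log(0 + tiny)
#> [1] 0

# For ordinary magnitudes, the addition is lost to rounding entirely
2 + tiny == 2
#> [1] TRUE
45 + tiny == 45
#> [1] TRUE

# Caveat: values near the bottom of the double range are changed
1e-300 + tiny == 1e-300
#> [1] FALSE
```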

Avoiding the iteration (arithmetic and log() work on the whole data frame at once) and using .Machine$double.xmin, the smallest positive normalized double, as the very small number:

df <- data.frame(sample = c("a2", "a3"), 
                 a = 2:3, 
                 b = c(1L, 0L), 
                 c = c(2L, 45L))

df
#>   sample a b  c
#> 1     a2 2 1  2
#> 2     a3 3 0 45

df[-1] <- df[-1] * log(df[-1] + .Machine$double.xmin)

df
#>   sample        a b          c
#> 1     a2 1.386294 0   1.386294
#> 2     a3 3.295837 0 171.299812
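The vectorised line works because R's arithmetic operators and Math-group functions such as log() have data frame methods that apply column by column and return a data frame. This only works when every column is numeric, which is why the character sample column is excluded with df[-1]. A minimal check, using a hypothetical data frame d:

```r
d <- data.frame(id = c("p", "q"), x = c(1, exp(1)))

# log() on the numeric columns returns a data frame of the same shape
res <- log(d[-1])
res
#>   x
#> 1 0
#> 2 1

# log() on the whole data frame fails because `id` is not numeric
# (the data frame Math method stops on non-numeric columns)
try(log(d))
```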

To check the results, let's use another approach: change the 0 values to 1, so that 1 * log(1) returns 0:

df2 <- data.frame(sample = c("a2", "a3"), 
                  a = 2:3, 
                  b = c(1L, 0L), 
                  c = c(2L, 45L))

df2[df2 == 0] <- 1
df2[-1] <- df2[-1] * log(df2[-1])

df2
#>   sample        a b          c
#> 1     a2 1.386294 0   1.386294
#> 2     a3 3.295837 0 171.299812

Because the added amount is below the precision of a double for every nonzero value here, the results are identical according to R:

identical(df, df2)
#> [1] TRUE
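As a further cross-check (a sketch, not part of the original answer), the ifelse approach from the other answer gives the same values as the .Machine$double.xmin trick on the question's data. The object names via_xmin and via_ifelse are hypothetical.

```r
df <- data.frame(sample = c("a2", "a3"),
                 a = 2:3,
                 b = c(1L, 0L),
                 c = c(2L, 45L))

# Approach 1: add .Machine$double.xmin inside log()
via_xmin <- df
via_xmin[-1] <- via_xmin[-1] * log(via_xmin[-1] + .Machine$double.xmin)

# Approach 2: ifelse, applied column by column
via_ifelse <- df
via_ifelse[-1] <- lapply(via_ifelse[-1], function(x) ifelse(x != 0, x * log(x), 0))

all.equal(via_xmin, via_ifelse)
#> [1] TRUE
```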
alistaire