0

My dataframe has few values as 9.969210e+36, I want to replace them with NA. It looks like

# A tibble: 1,308 x 3
       IMD     CRU dts       
     <dbl>   <dbl> <date>    
 1 9.97e36 9.97e36 1901-01-01
 2 9.97e36 9.97e36 1902-01-01
 3 9.97e36 9.97e36 1903-01-01
 4 9.97e36 9.97e36 1904-01-01 

dput(head(df))

structure(list(IMD = c(9.96920996838687e+36, 9.96920996838687e+36, 
9.96920996838687e+36, 9.96920996838687e+36, 9.96920996838687e+36, 
9.96920996838687e+36), CRU = c(9.96920996838687e+36, 9.96920996838687e+36, 
9.96920996838687e+36, 9.96920996838687e+36, 9.96920996838687e+36, 
9.96920996838687e+36), dts = structure(c(-25202, -24837, -24472, 
-24107, -23741, -23376), class = "Date")), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -6L))

I followed R - Replace specific value contents with NA as

df %>% mutate_at(vars(IMD, CRU), na_if, 9.969210e+36)
df %>% na_if(x=as.vector(df$IMD),y=9.97e36)

None of the above shows NA values and returns same old dataframe. Any help is appreciated

Johan
  • 3,667
  • 6
  • 20
  • 25
arijeet
  • 31
  • 1
  • 5
  • 2
    Please provide at least some excerpt of the data. – hannes101 Feb 15 '19 at 06:45
  • In sharing an excerpt of the data, it will be helpful to share the output of `dput(head(df))` so that we can see the exact way the data is encoded, not just how it appears in the tibble representation in your question. – Jon Spring Feb 15 '19 at 06:58
  • 2
    It could be the issue as mentioned [here](https://stackoverflow.com/questions/9508518/why-are-these-numbers-not-equal) – akrun Feb 15 '19 at 07:14

1 Answers1

1

na_if works on vectors, not data.frame, thus your first attempt using mutate would be most correct. Furthermore, it compares exact values to replace with NA. However, your very large values are only displayed with 15 digits; I suspect there are a lot, lot more. Therefore, no values are matched exactly to your conditional (y). This is a common problem when trying to exactly compare to real values.

Also note that you are trying to compare the two values. Which is largest?

9.969210e+36
9.96920996838687e+36

You can do it quickly by:

df %>%> mutate(
  IMD=ifelse(IMD > 9e36, NA, IMD),
  CRU=ifelse(CRU > 9e36, NA, CRU)
)

or create a function as,

na_when_larger <- function(x, y) {
  x[x > y] <- NA
  x
}

df %>% mutate_at(vars(IMD, CRU), na_when_larger, 9.96e+36)

(try typing na_if into the console without parenthesis).

MrGumble
  • 5,631
  • 1
  • 18
  • 33