I have a dataset stored as character values in which NULL values are recorded as "NA". To calculate the mean, should I omit the NA entries or replace them with 0?

Someone encouraged me to omit the NULL values rather than replace them with 0. My question is: why shouldn't we replace them with 0?

OmitNA <- na.omit(oswego$age)  # Drop the NA entries and store the remaining values in OmitNA
AsNum <- as.numeric(OmitNA)    # Convert the remaining character values to numeric and store them in AsNum
print(mean(AsNum))             # Compute the mean of the non-missing values

I need a reason why I shouldn't replace NULL with 0.

NelsonGon
  • 4
    Because it can affect the `mean`. With `v1 <- c(1, 2, 2, 0); v2 <- c(1, 2, 2, NA)`, `mean(v1)` gives `1.25`, while `mean(v2, na.rm = TRUE)` gives `1.666667`. In the second case, the `NA` element is removed from the calculation, i.e. `sum(na.omit(v2))/length(na.omit(v2))` is also `1.666667`. – akrun Jun 04 '19 at 05:00
  • 1
    It's worth mentioning that `NULL` and `NA` are [two different things](https://stackoverflow.com/questions/15496361/what-is-the-difference-between-nan-and-inf-and-null-and-na-in-r). –  Jun 04 '19 at 05:18
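
To make the two comments above concrete, here is a minimal sketch in R. The `age_chr` vector is made up for illustration (it is not taken from the `oswego` data), and the variable names are hypothetical. It assumes the missing entries are real `NA` values rather than the literal string `"NA"`, which is what `read.csv` produces by default.

```r
# Hypothetical ages stored as character, with missing entries as NA
# (illustrative values only, not from the oswego data).
age_chr <- c("11", "52", NA, "13", NA, "40")

age <- as.numeric(age_chr)  # NA stays NA after the conversion

# Option 1: drop the missing values; the mean uses the 4 known ages only.
mean_omit <- mean(age, na.rm = TRUE)

# Option 2: replace NA with 0; the sum is unchanged but n grows from 4 to 6,
# so the mean is pulled toward 0, as if two people were aged 0.
age_zero <- ifelse(is.na(age), 0, age)
mean_zero <- mean(age_zero)

print(mean_omit)  # 29
print(mean_zero)  # 19.33333

# NULL is not a missing value: it is an empty object and disappears inside c().
n_kept <- length(c(1, NA, NULL))
print(n_kept)     # 2
```

Replacing `NA` with 0 asserts that the unknown ages are actually zero, which changes the denominator without adding anything to the sum. Omitting them only says the values are unknown, so the mean is computed over the observations that exist.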

0 Answers