
I'd like to write a loop in R that checks the data frame and replaces 0 values with the column median. Unfortunately I get an error.

This is just one part of my loop:

y <- median(df[1])
Error in median.default(df[1]) : need numeric data

If I use the column name instead, it works:

y <- median(df$name_of_the_column)

This is my loop so far. I haven't finished it yet; it's still in progress.

i = 1
for (x in df) {
  if (df[i][df[i] == 0]) {
    df[i][df[i]] <- median(df[i])
  }
}
  • Try `df[,1]` or `df[[1]]`: `df[1]` is still a data.frame with one column, whereas `[,1]` and `[[1]]` (like `$`) extract the column as a vector, and `median` expects a vector as input. – akrun Dec 18 '19 at 19:28
  • Thanks for the explanation. It works. Now I have to work on my loop, which isn't working at all ;). –  Dec 18 '19 at 19:32
  • I posted a solution below with some explanations and a `for` loop – akrun Dec 18 '19 at 19:38
  • This may be relevant: https://stackoverflow.com/questions/25835643/replace-missing-values-with-column-mean – M-- Dec 18 '19 at 19:40

2 Answers


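This can easily be done with na.aggregate (from the zoo package) after replacing the 0s with NA. na.aggregate works column by column and replaces each NA with an aggregate of the non-missing values in that column. Its default FUN is mean, so FUN = median is passed to use the median instead.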

library(zoo)
na.aggregate(replace(df, df == 0, NA), FUN = median)
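For readers without zoo, here is a minimal base-R sketch of what `na.aggregate(..., FUN = median)` does to each column, assuming every column is numeric. `fill_na_with_median` is a hypothetical helper name (not a zoo function), and the sample data is made up for illustration.

```r
# Hypothetical helper (not part of zoo): fills each column's NAs
# with the median of that column's non-missing values.
fill_na_with_median <- function(df) {
  df[] <- lapply(df, function(col) {
    col[is.na(col)] <- median(col, na.rm = TRUE)
    col
  })
  df
}

# Made-up example data with some 0s
df <- data.frame(a = c(0, 2, 4, 6), b = c(1, 0, 3, 5))

df[df == 0] <- NA                  # same idea as replace(df, df == 0, NA)
result <- fill_na_with_median(df)
print(result)                      # a becomes 4 2 4 6; b becomes 1 3 3 5
```

Because the 0s are turned into NA first and `na.rm = TRUE` drops them, the 0s do not influence the median.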

If we need a loop (here the 0s are excluded when calculating the median):

for (i in seq_along(df)) {
  # replace the 0s in column i with the median of its non-zero values
  df[[i]] <- replace(df[[i]], df[[i]] == 0, median(df[[i]][df[[i]] != 0]))
}
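A quick check of the loop above on a small made-up data frame (the values are illustrative only). One caveat: a column made entirely of 0s would be filled with NA, because the median of an empty vector is NA.

```r
# Made-up example data
df <- data.frame(a = c(0, 1, 3, 5), b = c(2, 0, 0, 8))

for (i in seq_along(df)) {
  # replace the 0s in column i with the median of its non-zero values
  df[[i]] <- replace(df[[i]], df[[i]] == 0, median(df[[i]][df[[i]] != 0]))
}

print(df)   # a becomes 3 1 3 5; b becomes 2 5 5 8

# Edge case: an all-zero column leaves no values to take the median of
median(numeric(0))   # NA
```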

The issue in the OP's post is that median is applied to a data.frame, but median expects a vector as input. According to ?median:

x - an object for which a method has been defined, or a numeric vector containing the values whose median is to be computed.

We can use either df[,1] or df[[1]] to extract the column as a vector and then apply median, which gives the same behavior as $ (assuming that 'df' is a data.frame).
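A small sketch of the difference, using a made-up data frame: `df[1]` keeps the data.frame class, which is why `median()` fails, while `df[[1]]`, `df[, 1]` and `df$x` all return the numeric column.

```r
# Made-up example data
df <- data.frame(x = c(3, 0, 7), y = c(1, 2, 0))

class(df[1])      # "data.frame": still a one-column data frame
class(df[[1]])    # "numeric": the column itself
class(df[, 1])    # "numeric": [, 1] drops a data.frame to a vector

# median() on the one-column data frame raises an error
err <- tryCatch(median(df[1]), error = function(e) e)
print(conditionMessage(err))   # "need numeric data", as in the question

# The three vector forms give the same median
m1 <- median(df[[1]])
m2 <- median(df[, 1])
m3 <- median(df$x)
```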

akrun

The answer depends on whether you want the 0s included in the median calculation. Here are two tidyverse-based solutions: one first converts the 0s to missing (NA) so they are excluded from the median, and the other includes them in the calculation (which appears to be what you have been doing):

library(tidyverse)  # attaches dplyr (mutate_all, %>%) and tibble

df <- tibble(
  a = 0:9,
  b = 0:9,
  c = -2:7
)

Convert the 0s to NA, then replace the NAs with the median:

df %>%
  mutate_all(
    list(~ ifelse(. == 0, NA, .))
  ) %>%
  mutate_all(
    list(~ ifelse(is.na(.), median(., na.rm = TRUE), .))
  )

Replace the 0s directly with the median (0s included in the calculation):

df %>%
  mutate_all(
    list(~ ifelse(. == 0, median(., na.rm = TRUE), .))
  )
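To see how the two choices differ, here is a base-R sketch (no tidyverse needed) applied to the same values as the sample tibble. `exclude_zeros` and `include_zeros` are hypothetical helper names; they should give the same replacement values as the two pipelines above.

```r
# Same values as the sample tibble, as a plain data.frame
df <- data.frame(a = 0:9, b = 0:9, c = -2:7)

# Hypothetical helpers: replace 0s with the median computed
# without the 0s (exclude) or with them (include)
exclude_zeros <- function(col) replace(col, col == 0, median(col[col != 0]))
include_zeros <- function(col) replace(col, col == 0, median(col))

excl <- as.data.frame(lapply(df, exclude_zeros))
incl <- as.data.frame(lapply(df, include_zeros))

excl$a[1]   # 5   (median of 1:9)
incl$a[1]   # 4.5 (median of 0:9)
excl$c[3]   # 3   (median of -2, -1, 1, ..., 7)
incl$c[3]   # 2.5 (median of -2:7)
```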
costebk08