I want to write a for loop in R that replaces the NA values in one column of my data frame with the mean of that column, computed only over rows that meet two conditions: the observations must come from the same year and the same group as the missing value. I wrote the following code, but I am struggling to write the conditions.

missing <- which(is.na(df$price))
for (i in 1:36){
 x <- df[missing,]$group
 y <- df[missing,]$year
 selection <- df[conditions??,]$price
 df[missing,]$price <- mean(selection, na.rm = TRUE)
}
LuizZ
Riki LC
  • Welcome to Stack Overflow. Please provide a reproducible example so other users can help you: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – LuizZ Nov 26 '20 at 19:04

2 Answers

You don't need a for loop. You can replace all the NAs in one step, using mean(price, na.rm = TRUE) to compute the mean of the column while ignoring the NAs. This is the general case, with a single mean over the whole column:

df[is.na(df$price),]$price <- mean(df$price, na.rm = TRUE)

Using tidyverse you can achieve what you want:

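library(tidyverse)
# The mean is computed within each (group, year) combination, not over the whole column
df <- df %>% group_by(group, year) %>%
  mutate(price = ifelse(is.na(price), mean(price, na.rm = TRUE), price)) %>% ungroup()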
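
To make the grouping visible, here is a minimal sketch on a made-up data frame. The data and the name `filled` are assumptions for illustration only; the column names `group`, `year` and `price` are taken from the question.

```r
library(dplyr)

# Hypothetical toy data: two (group, year) combinations, each with one missing price
df <- data.frame(
  group = c("a", "a", "a", "b", "b", "b"),
  year  = c(2019, 2019, 2019, 2020, 2020, 2020),
  price = c(10, NA, 20, 4, 8, NA)
)

filled <- df %>%
  group_by(group, year) %>%
  # mean() sees only the rows of the current (group, year) combination
  mutate(price = ifelse(is.na(price), mean(price, na.rm = TRUE), price)) %>%
  ungroup()

filled$price
# 10 15 20 4 8 6
```

The NA in group "a" becomes 15 (the mean of 10 and 20) and the NA in group "b" becomes 6 (the mean of 4 and 8). If a combination contains only NAs, the mean is NaN, so those rows would end up as NaN rather than being filled.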

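Using data.table:

library(data.table)
dt <- as.data.table(df)
# fifelse() requires both branches to have the same type, so price should be numeric (double)
dt[, price := fifelse(is.na(price), mean(price, na.rm = TRUE), price), by = .(group, year)][]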
Abdessabour Mtk
  • The problem is that I don't need the mean of all values of price. First, I need the mean of the price where group = x and year = x, then the same thing again but with group = y and year = y. That's why I was thinking of a for loop with different conditions. – Riki LC Nov 26 '20 at 18:27
  • @RikiLC Yeah, I added a `tidyverse` solution that groups by group and year, so the mean is calculated within each group instead of over the full dataset. – Abdessabour Mtk Nov 26 '20 at 18:36
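
For comparison, the conditions the question asks about can also be written out explicitly in a loop. This is only a sketch on made-up data, not code from either answer; the names `same_cell` and `filled` are hypothetical, and the grouped solutions give the same result with less code.

```r
# Hypothetical toy data with the column names used in the question
df <- data.frame(
  group = c("a", "a", "a", "b", "b", "b"),
  year  = c(2019, 2019, 2020, 2019, 2019, 2019),
  price = c(10, NA, 30, 4, NA, 8),
  stringsAsFactors = FALSE
)

missing <- which(is.na(df$price))
filled <- df

for (i in missing) {
  # The two conditions: same group AND same year as the row being filled
  same_cell <- df$group == df$group[i] & df$year == df$year[i]
  filled$price[i] <- mean(df$price[same_cell], na.rm = TRUE)
}

filled$price
# 10 10 30 4 6 8
```

The loop runs over the indices of the missing rows rather than over a fixed range such as 1:36, so it does not depend on the number of rows in the data frame.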

A base R solution using by, which splits a data frame by the groups in the list in the second argument, and applies a function defined in the third:

result <- by(df, 
             list(df[["group"]], df[["year"]]), 
             function(x) {
               x[is.na(x$price), "price"] <- mean(x[["price"]], na.rm = TRUE)
               x
             }, 
             simplify = TRUE)

do.call(rbind, result)
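
Note that binding the pieces back together can return the rows in group order rather than in their original order. If the original row order matters, one base R alternative is `ave()`, which applies a function within groups and returns a vector aligned with the input. The sketch below swaps in that different technique; the data and the helper name `fill_with_mean` are made up for illustration.

```r
# Hypothetical toy data with interleaved groups
df <- data.frame(
  group = c("a", "b", "a", "b", "a"),
  year  = c(2019, 2019, 2019, 2019, 2019),
  price = c(10, 4, NA, NA, 20)
)

# Replace the NAs in a vector with the mean of its non-missing values
fill_with_mean <- function(p) {
  p[is.na(p)] <- mean(p, na.rm = TRUE)
  p
}

# ave() applies the function within each (group, year) combination
# and keeps the result in the original row order
df$price <- ave(df$price, df$group, df$year, FUN = fill_with_mean)

df$price
# 10 4 15 4 20
```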
henryn