-1

I am trying to replace null values based on two columns. Basically, I have company codes in one column and its respective values in the second. I need to replace mean of the values for each of the company code rather than mean of the complete column. How do I do it in R? (Look at the image below)

enter image description here

joran
  • 169,992
  • 32
  • 429
  • 468
Varsha
  • 59
  • 4
  • 2
    What did you try? This is not the way a question is communicated. Check on how to ask a question example. https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example?rq=1 – user5249203 Jan 29 '18 at 21:01

3 Answers3

0

Do you need something like this:

df <- data.frame(Symbol = c("NXCDX", "ALX", "ALX", "BESOQ", "BESOQ", "BESOQ"), 
                Values = c(2345, 8654, NA, 6394, 8549, NA))

df %>% dplyr::group_by(Symbol) %>% dplyr::summarise(mean_values = mean(Values, na.rm = TRUE))
Mislav
  • 1,533
  • 16
  • 37
0

using data.table

library(data.table)
setDT(df)[,replace(Values,is.na(Values),mean(Values,na.rm = T)),by=Symbol]
Onyambu
  • 67,392
  • 3
  • 24
  • 53
0

Assuming your data is in a data frame called 'myData' you can go ahead and use the ddply function from the plyr package to generate the mean per company code. The ddply function applies a function to a column(s) grouped by another column(s).

library(plyr)

#Find the entries where the values are NULL, using "" (empty string) as NULL
#Can replace "" with whatever NULL is for you
nullMatches <- myData$Values == ""

#Generate the mean for each company
#This will return a 2 column data frame, first column will be "Symbol".
#Second column will the value of means for each 'Symbol'.
meanPerCompany <- ddply(myData[!nullMatches,], "Symbol", numcolwise(mean))

#Match the company symbol and store the mean
myData$Values[nullMatches] <- meanPerCompany[match(myData$Symbol[nullMatches], meanPerCompany[,1]),2]
LachlanO
  • 1,152
  • 8
  • 14