I am trying to replace null values based on two columns. Basically, I have company codes in one column and its respective values in the second. I need to replace mean of the values for each of the company code rather than mean of the complete column. How do I do it in R? (Look at the image below)
Asked
Active
Viewed 199 times
3 Answers
0
Do you need something like this:
df <- data.frame(Symbol = c("NXCDX", "ALX", "ALX", "BESOQ", "BESOQ", "BESOQ"),
Values = c(2345, 8654, NA, 6394, 8549, NA))
df %>% dplyr::group_by(Symbol) %>% dplyr::summarise(mean_values = mean(Values, na.rm = TRUE))

Mislav
- 1,533
- 16
- 37
0
using data.table
library(data.table)
setDT(df)[,replace(Values,is.na(Values),mean(Values,na.rm = T)),by=Symbol]

Onyambu
- 67,392
- 3
- 24
- 53
0
Assuming your data is in a data frame called 'myData' you can go ahead and use the ddply
function from the plyr
package to generate the mean per company code. The ddply
function applies a function to a column(s) grouped by another column(s).
library(plyr)
#Find the entries where the values are NULL, using "" (empty string) as NULL
#Can replace "" with whatever NULL is for you
nullMatches <- myData$Values == ""
#Generate the mean for each company
#This will return a 2 column data frame, first column will be "Symbol".
#Second column will the value of means for each 'Symbol'.
meanPerCompany <- ddply(myData[!nullMatches,], "Symbol", numcolwise(mean))
#Match the company symbol and store the mean
myData$Values[nullMatches] <- meanPerCompany[match(myData$Symbol[nullMatches], meanPerCompany[,1]),2]

LachlanO
- 1,152
- 8
- 14