To address your error first: when you loop over the data frame with for(i in data), i is a 1D vector (i.e., one column of the data frame), so calling nrow on it doesn't make sense. To see this, try for(i in data) print(nrow(i)), which prints NULL for every column.
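As a small illustration of the point above, here is a sketch using a made-up two-column data frame (the name demo and its contents are assumptions for illustration, not your data). It shows that iterating over a data frame yields its columns, and that nrow of a plain vector is NULL while length gives the count you probably wanted:

```r
# Hypothetical toy data frame, used only for this illustration
demo <- data.frame(a = c("a", "b"), b = c(1, 2))

# Iterating over a data frame visits one column (a vector) at a time
for (i in demo) {
  print(nrow(i))    # a plain vector has no dim attribute, so this is NULL
  print(length(i))  # length() gives the number of elements instead
}

# The same check without a loop: nrow() applied to each column
row_counts <- lapply(demo, nrow)
col_lengths <- vapply(demo, length, integer(1))
```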
You're declaring individual vectors outside a data frame when you use the following syntax:
data<-data.frame(a<-c("a","b","c","d"),
b<-c("1","2",NA,"5"),
c<-c("10",NA,"30",40))
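Just try typing a at the console and you'll see it exists outside the data frame. Because <- assigns a variable instead of naming an argument, the columns also end up with mangled names, so the data frame is defined incorrectly. Check it out: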
  a....c..a....b....c....d.. b....c..1....2...NA...5..
1                          a                         1
2                          b                         2
3                          c                      <NA>
4                          d                         5
  c....c..10...NA...30...40.
1                         10
2                       <NA>
3                         30
4                         40
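To make the difference between <- and = inside a function call concrete, here is a minimal sketch. The names leaked_vec, named_col, bad and good are hypothetical and chosen only for this example:

```r
# Using <- inside the call assigns a variable in the calling environment
# and passes the value as an unnamed argument
bad <- data.frame(leaked_vec <- c(1, 2, 3))

# Using = names the argument, so the column gets that name
# and no extra variable is created by the call
good <- data.frame(named_col = c(1, 2, 3))

print(exists("leaked_vec"))  # the vector now exists outside the data frame
print(names(bad))            # a name derived from the whole expression
print(names(good))           # the intended column name
```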
What you actually need is the following:
data <- data.frame(a = c("a","b","c","d"),
b = c("1","2",NA,"5"),
c = c("10",NA,"30",40))
which gives
  a    b    c
1 a    1   10
2 b    2 <NA>
3 c <NA>   30
4 d    5   40
Also, your braces for the loops don't match up correctly.
If you examine the class of each column in data by running lapply(data, class), you'll see they're all factors (or character vectors in R 4.0.0 and later, where stringsAsFactors defaults to FALSE). Either way, taking the mean, as you try to do in your code, is meaningless. If columns b and c are meant to be numeric, then you don't need the quotation marks in their definition, like this:
data <- data.frame(a = c("a", "b", "c", "d"),
b = c(1, 2, NA, 5),
c = c(10, NA, 30, 40))
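If you can't change how the data are created and the columns have already arrived as strings or factors, a common workaround is to convert them afterwards. This is only a sketch, assuming every non-missing value really is a number written as text; the name str_data is hypothetical:

```r
# Hypothetical data frame whose numeric values were stored as text
str_data <- data.frame(b = c("1", "2", NA, "5"),
                       c = c("10", NA, "30", "40"),
                       stringsAsFactors = TRUE)

# Go through as.character() first: calling as.numeric() directly on a
# factor would return the internal level codes, not the values
str_data$b <- as.numeric(as.character(str_data$b))
str_data$c <- as.numeric(as.character(str_data$c))

print(sapply(str_data, class))
print(mean(str_data$b, na.rm = TRUE))
```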
If column a were also numeric, you could achieve your objective with this:
for(i in 1:ncol(data)){
data[is.na(data[,i]), i] <- mean(data[,i], na.rm = TRUE)
}
from here.
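Since column a is not numeric in your example, one possible variant is to skip non-numeric columns explicitly. The following is a minimal sketch under that assumption, not the only way to do it; the is.numeric check is an addition of mine to the loop above:

```r
data <- data.frame(a = c("a", "b", "c", "d"),
                   b = c(1, 2, NA, 5),
                   c = c(10, NA, 30, 40))

# Replace NAs with the column mean, but only in numeric columns
for (i in seq_len(ncol(data))) {
  if (is.numeric(data[[i]])) {
    data[is.na(data[[i]]), i] <- mean(data[[i]], na.rm = TRUE)
  }
}

print(data)
```

Using seq_len(ncol(data)) instead of 1:ncol(data) also avoids a surprising loop over c(1, 0) if the data frame ever has zero columns.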