As a newbie, I want to get better at loops and if...else statements in R. I am trying to replace NAs using a for loop and if...else instead of ifelse and lapply. However, I couldn't index the data properly in the if...else part.

Example:

data<-data.frame(a<-c("a","b","c","d"),
                 b<-c("1","2",NA,"5"),
                 c<-c("10",NA,"30",40))

for (i in data){
  for (x in 1:nrow(i)){
    if (x==NA) {
      x<-mean(i,na.rm=T)
    }else
      x<-x
}

I get the error "Error in 1:nrow(i) : argument of length 0". Any suggestions?

A_Koala

2 Answers

To address your error first: as you loop through the data frame, i is a 1D vector (i.e., a column of the data frame), so nrow(i) returns NULL and 1:nrow(i) fails with "argument of length 0". To see this, try for (i in data) print(nrow(i)).
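
A minimal sketch of this point, assuming a small numeric data frame (`df` is a hypothetical name, not from the question): `nrow()` returns `NULL` for a plain vector, whereas `seq_along()` gives usable indices for the inner loop.

```r
# Hypothetical example data; only numeric columns for simplicity
df <- data.frame(b = c(1, 2, NA, 5), c = c(10, NA, 30, 40))

for (col in df) {
  # Each `col` is a plain vector, so it has no row dimension
  print(is.null(nrow(col)))
  # seq_along() yields one index per element of the column
  print(seq_along(col))
}
```

Using `seq_along(col)` (or `seq_len(length(col))`) in place of `1:nrow(i)` avoids the "argument of length 0" error.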


You're declaring individual vectors outside a data frame when you use the following syntax:

data<-data.frame(a<-c("a","b","c","d"),
                 b<-c("1","2",NA,"5"),
                 c<-c("10",NA,"30",40))

Type a at the console and you'll see it exists outside the data frame. It also means the columns of the data frame get mangled names derived from the whole assignment expression. Check it out:

  a....c..a....b....c....d.. b....c..1....2...NA...5..
1                          a                         1
2                          b                         2
3                          c                      <NA>
4                          d                         5
  c....c..10...NA...30...40.
1                         10
2                       <NA>
3                         30
4                         40

What you actually need is the following:

data <- data.frame(a = c("a","b","c","d"),
                   b = c("1","2",NA,"5"),
                   c = c("10",NA,"30",40))

which gives

  a    b    c
1 a    1   10
2 b    2 <NA>
3 c <NA>   30
4 d    5   40
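
The difference between the two forms can be checked directly. This is a small sketch with hypothetical names (`col_arrow`, `col_equals`, `with_arrow`, `with_equals`), none of which come from the question:

```r
# `<-` inside data.frame() assigns a variable in the calling environment
with_arrow <- data.frame(col_arrow <- c(1, 2))

# `=` inside data.frame() only names the column
with_equals <- data.frame(col_equals = c(3, 4))

exists("col_arrow")    # the vector leaked out of the call
exists("col_equals")   # no such variable was created
names(with_arrow)      # a mangled name built from the whole expression
names(with_equals)     # just "col_equals"
```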

Also, your braces for the loops don't match up correctly.

If you examine the class of each column in data by running lapply(data, class), you'll see they're all factors (in R 4.0.0 and later, where stringsAsFactors defaults to FALSE, they are character vectors instead). Either way, taking the mean, as you try to do in your code, is meaningless. If columns b and c are meant to be numerics, then you don't need the quotation marks in their definition, like this:

data <- data.frame(a = c("a", "b", "c", "d"),
                   b = c(1, 2, NA, 5),
                   c = c(10, NA, 30, 40))
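
If the data already arrives as text and cannot be redefined, one possible approach is to convert the columns afterwards. The sketch below assumes a hypothetical data frame `raw` whose columns are factors; going through `as.character()` first matters, because `as.numeric()` on a factor returns the internal level codes rather than the values.

```r
# Hypothetical input: numbers stored as factors
raw <- data.frame(b = c("1", "2", NA, "5"),
                  c = c("10", NA, "30", "40"),
                  stringsAsFactors = TRUE)

# Convert factor -> character -> numeric to recover the actual values
raw$b <- as.numeric(as.character(raw$b))
raw$c <- as.numeric(as.character(raw$c))

sapply(raw, class)          # both columns are now numeric
mean(raw$b, na.rm = TRUE)   # the mean is now defined
```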

If column a was also a numeric, you could achieve your objective with this:

for(i in 1:ncol(data)){
  data[is.na(data[,i]), i] <- mean(data[,i], na.rm = TRUE)
}

from here.
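
Since the question asks for explicit loops with if...else, here is one way that might look. It is only a sketch: the `is.numeric()` check that skips column a and the variable names `j`, `r`, and `col_mean` are my own additions, not part of the snippet above.

```r
data <- data.frame(a = c("a", "b", "c", "d"),
                   b = c(1, 2, NA, 5),
                   c = c(10, NA, 30, 40))

for (j in seq_along(data)) {
  # Skip columns where a mean is not meaningful (here, column a)
  if (!is.numeric(data[[j]])) next

  col_mean <- mean(data[[j]], na.rm = TRUE)

  for (r in seq_len(nrow(data))) {
    # Replace a missing cell with the column mean, otherwise keep it
    data[r, j] <- if (is.na(data[r, j])) col_mean else data[r, j]
  }
}

data
```

The vectorised one-liner above does the same job more concisely; the nested version is mainly useful for practising indexing with `data[r, j]`.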

Dan
-1

When checking for NAs you have to use the is.na() function, because any comparison with NA returns NA rather than TRUE or FALSE, much as NULLs behave in relational databases.

As an illustration of how it works, you can run the following lines in your R console and check the outputs:

1 == 1
1 == 2
1 == NA
NA == NA
is.na(NA)
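
The same behaviour carries over to vectors and to `if` conditions. A short sketch, assuming a hypothetical vector `x`:

```r
x <- c(10, NA, 30)

# Comparing against NA gives NA for every element, never TRUE
x == NA

# is.na() gives a proper logical answer per element
is.na(x)

# if (v == NA) would stop with an error, so test with is.na() instead
for (v in x) {
  if (is.na(v)) {
    print("missing")
  } else {
    print(v)
  }
}
```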

That said, if what you want is to replace NA values in your data frame with column means, you can check this previous question.

Narciandi