1

I have an air quality data set. The data frame has 153 rows and 5 columns. I want to find the mean of the first column. That column contains missing values, so I want to exclude them when computing the mean. Finally, I want to do this using control structures (for loops and if-else statements).

I have tried the code shown below. I created 'y' in place of the actual air quality data set so that the example is reproducible.

y <- c(1,2,3,NA,5,6,NA,NA,9,10,11,NA,13,NA,15)
x <- matrix(y,nrow=15)

for(i in 1:15){
   if(is.na(data.frame[i,1]) == FALSE){
   New.Vec <- c(x[i,1])
   }
}
print(mean(New.Vec))

I expected the output to be the mean, but instead I received this error:

Error: object 'New.Vec' not found

  • It's easier to help with a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). For one thing, you don't need to increment `i`, that's handled by the for loop. For another, it's unclear what you're doing with the `<- FALSE` part, since there isn't any condition being tested. Maybe you mean `==`? I have a feeling the line that assigns `New.Vec` isn't actually getting evaluated, but can't say for sure without being able to run your code. – camille Sep 20 '19 at 18:11
  • @camille - Thank you! The pointers help a lot. I have removed the increment i and added the == . However the error New.Vec still exists. I am editing the question to a reproducible example. So you can check it out in a bit and give your inputs :) – Ashreet Sangotra Sep 20 '19 at 18:36
  • Now you should be getting an error because you're trying to subset `data.frame` instead of `x`. It also seems like you're just reassigning `New.Vec` on each iteration... Either way, in R a loop for something like this shouldn't be necessary – camille Sep 20 '19 at 18:51

3 Answers

3

One line of code, no need for a for loop.

mean(data.frame$name_of_the_first_column, na.rm = TRUE)

Setting na.rm = TRUE makes the mean function ignore NAs.
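
As a quick illustration of what `na.rm` changes, here is a small sketch using the vector `y` from the question. The one-column data frame `df` and the column name `Ozone` are only stand-ins for the real data.

```r
# Example vector from the question; five of the 15 values are missing
y <- c(1, 2, 3, NA, 5, 6, NA, NA, 9, 10, 11, NA, 13, NA, 15)
df <- data.frame(Ozone = y)  # hypothetical one-column stand-in for the real data

with_na    <- mean(df$Ozone)                # NA, because a single NA propagates
without_na <- mean(df$Ozone, na.rm = TRUE)  # 7.5, the mean of the 10 non-missing values

print(with_na)
print(without_na)
```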

Ben G
2

Here, we can make use of na.aggregate from zoo

library(zoo)
df1[] <- na.aggregate(df1)

This assumes that 'df1' is a data.frame in which every column is numeric, and that you want to replace each NA with the mean of its column. By default, na.aggregate uses mean as its aggregation function (the FUN argument).
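
For readers without zoo installed, the replacement described above can be sketched in base R. This is not `na.aggregate` itself, just a rough equivalent of its default column-wise behaviour, and the small `df1` below is made up for the example.

```r
# Hypothetical two-column data frame with missing values
df1 <- data.frame(a = c(1, NA, 3), b = c(NA, 4, 6))

# Replace each NA with the mean of the non-missing values in its own column
df1[] <- lapply(df1, function(col) {
  col[is.na(col)] <- mean(col, na.rm = TRUE)
  col
})

print(df1$a)  # 1 2 3
print(df1$b)  # 5 4 6
```

Filling with the column mean leaves each column's mean unchanged, so `mean(df1$a)` afterwards equals the mean of the original non-missing values.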

akrun
  • Thank you for the help! However, when I call mean(df1), it returns 'NA' and produces a message saying "argument is not numeric or logical: returning NA". The edited code that I ran is: `df1 <- read.csv("hw1_data.csv")`, `df1[] <- na.aggregate(df1)`, `mean(df1[1])` – Ashreet Sangotra Sep 20 '19 at 17:51
  • @AshreetSangotra. The error is very specific. As I mentioned in the post, I assume that your columns are numeric. If your columns are not numeric, you may need to check why they are not – akrun Sep 20 '19 at 18:01
  • The columns are numeric, with the exception of column names. It produces result for colMeans(). You can find the data file in this comment if that would provide any assistance. – Ashreet Sangotra Sep 20 '19 at 18:14
  • @AshreetSangotra. Can you use `dput` of few rows to show the example and post it in your question – akrun Sep 20 '19 at 18:16
  • 'dput(df1[1:5,])' Output is: 'structure(list(Ozone = c(41, 36, 12, 18, 42.1293103448276), Solar.R = c(190, 118, 149, 313, 185.931506849315), Wind = c(7.4, 8, 12.6, 11.5, 14.3), Temp = c(67, 72, 74, 62, 56), Month = c(5, 5, 5, 5, 5), Day = c(1, 2, 3, 4, 5)), row.names = c(NA, 5L), class = "data.frame")' – Ashreet Sangotra Sep 20 '19 at 18:57
  • @AshreetSangotra. For me, `na.aggregate(df1)` works fine with your `dput`. All of the columns are numeric and don't have any `NA` – akrun Sep 20 '19 at 18:58
  • @AshreetSangotra. I understand your error. You are calling `mean(df1)`. `mean` expects a vector as input, not a data.frame, as there is no method for that. You may need `mean(unlist(df1))`, assuming all columns are numeric and you want a single mean of all the elements. If you want it column-wise, use `colMeans(df1, na.rm = TRUE)` – akrun Sep 20 '19 at 18:59
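
To make the difference between these calls concrete, here is a small sketch. The data frame reuses the first four `Ozone` and `Wind` values from the `dput` output in the comments and drops the other columns for brevity.

```r
# First four rows of two columns, taken from the dput() output in the comments
df1 <- data.frame(Ozone = c(41, 36, 12, 18), Wind = c(7.4, 8, 12.6, 11.5))

# mean(df1) and mean(df1[1]) both receive a data.frame, so they return NA
# with the "argument is not numeric or logical" warning.

col_mean  <- mean(df1[[1]])               # [[ ]] extracts the column as a numeric vector
all_means <- colMeans(df1, na.rm = TRUE)  # one mean per column
overall   <- mean(unlist(df1))            # a single mean over every element

print(col_mean)   # 26.75
print(all_means)
print(overall)
```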
1

I can't see your data, but it is probably something like this. The vector needs to be initialized before the loop so that each iteration can append to it. It is better to avoid loops in R when you can, though.

myDataFrame <- read.csv("hw1_data.csv")

New.Vec <- c()    
for(i in 1:153){
   if(!is.na(myDataFrame[i,1])){
      New.Vec <- c(New.Vec, myDataFrame[i,1])
   }
}
print(mean(New.Vec))
David Pedack
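
If the exercise requires control structures, one alternative sketch keeps a running total and a count instead of growing a vector. It uses the reproducible `y` and `x` from the question; the names `total`, `count`, and `loop_mean` are made up for the example.

```r
# Reproducible data from the question
y <- c(1, 2, 3, NA, 5, 6, NA, NA, 9, 10, 11, NA, 13, NA, 15)
x <- matrix(y, nrow = 15)

total <- 0  # running sum of the non-missing values
count <- 0  # number of non-missing values seen so far

for (i in seq_len(nrow(x))) {
  if (!is.na(x[i, 1])) {
    total <- total + x[i, 1]
    count <- count + 1
  }
}

# Guard against a column that contains only NAs
loop_mean <- if (count > 0) total / count else NA_real_
print(loop_mean)  # 7.5
```

Using `seq_len(nrow(x))` rather than a hard-coded `1:15` means the same loop works unchanged on the 153-row data frame.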