
Thanks for the previous posts and professional responses. I can almost complete my analysis, except for conditions that involve NA. Here is my data.frame and the code I used. Could you show me how to handle a condition that contains NA values?

 df1 <- data.frame(A = c(1,2,4, 5), B=c(1,3,NA,1), C=c(1,1,3, NA), D=c(1,1,2,2))

Using this code, I get df1 as follows:

  A  B  C D
1 1  1  1 1
2 2  3  1 1
3 4 NA  3 2
4 5  1 NA 2

With help from Andrie, Sacha Epskamp and Chase (R: get average column A based on a range of values in column B), I got the mean of A for rows where D is strictly between 1 and 3 (i.e. D equals 2 in this case) with this code:

mean(df1$A[df1$D>1 & df1$D<3])

I got 4.5, as expected (the average of 4 and 5 in column A).

However, when I use column C, which contains an NA, instead of column D, the answer is always NA. I was expecting the average of 1 and 2, ignoring the 3rd row (C is 3, which is larger than 2) and the 4th row (C is NA).

mean(df1$A[df1$C>0 & df1$C<2])

[1] NA   (I expected the mean to be 1.5)
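To see where the NA comes from, here is a minimal sketch that rebuilds df1 and inspects each step separately (the names cond and selected are just illustrative):

```r
df1 <- data.frame(A = c(1, 2, 4, 5), B = c(1, 3, NA, 1),
                  C = c(1, 1, 3, NA), D = c(1, 1, 2, 2))

# Any comparison with NA gives NA, so the 4th element of the condition is NA.
cond <- df1$C > 0 & df1$C < 2
print(cond)            # TRUE TRUE FALSE NA

# Indexing with an NA logical value returns NA at that position ...
selected <- df1$A[cond]
print(selected)        # 1 2 NA

# ... and mean() of a vector containing NA is NA unless na.rm = TRUE is given.
print(mean(selected))  # NA
```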

I know na.omit can remove every row of df1 that has an NA in any column. However, I prefer not to do that, because I also want to get the mean and counts for every column when one column's entry is NA (e.g. I also want to run mean(df1$A[is.na(df1$C)])).
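For the other analyses mentioned here, a sketch using only base R: is.na() itself never returns NA, so it can be used as an index directly, and sum() over a logical condition gives a count. The variable names are hypothetical, chosen for illustration.

```r
df1 <- data.frame(A = c(1, 2, 4, 5), B = c(1, 3, NA, 1),
                  C = c(1, 1, 3, NA), D = c(1, 1, 2, 2))

# Mean of A over the rows where C is missing (only row 4 here).
mean_A_when_C_missing <- mean(df1$A[is.na(df1$C)])

# Counts: sum() of a logical vector counts the TRUEs; na.rm = TRUE skips
# rows where the condition itself is NA.
n_C_between <- sum(df1$C > 0 & df1$C < 2, na.rm = TRUE)
n_C_missing <- sum(is.na(df1$C))

# Per-column means and non-missing counts, each column skipping only its
# own NAs (unlike na.omit(), which drops whole rows).
col_means  <- colMeans(df1, na.rm = TRUE)
col_counts <- colSums(!is.na(df1))

print(mean_A_when_C_missing)        # 5
print(c(n_C_between, n_C_missing))  # 2 1
print(col_counts)
```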

I also tried putting na.rm=T in the condition part, but R did not accept it, presumably because the NA is now in the condition rather than in the values being averaged. For instance:

mean(df1$A[df1$C>0 & df1$C<2, na.rm=T])

Error in df1$A[df1$C > 0 & df1$C < 2, na.rm = T] :
  incorrect number of dimensions
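The error can be reproduced in isolation. This sketch (assuming the same df1) shows that df1$A has no dimensions, so an extra argument inside [ ] is read as a second subscript, which a plain vector cannot accept:

```r
df1 <- data.frame(A = c(1, 2, 4, 5), B = c(1, 3, NA, 1),
                  C = c(1, 1, 3, NA), D = c(1, 1, 2, 2))
cond <- df1$C > 0 & df1$C < 2

# df1$A is a plain vector: it has no dim attribute.
print(dim(df1$A))  # NULL

# [ ] has no na.rm argument, so the extra argument is treated as a second
# subscript (like a matrix column index) and indexing fails.
msg <- tryCatch(df1$A[cond, na.rm = TRUE],
                error = function(e) conditionMessage(e))
print(msg)
```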

I believe there is a smarter way of doing this. Please kindly advise.

Edited by Community; asked by a83
  • possible duplicate of [R script - removing NA values from a vector](http://stackoverflow.com/questions/7706876/r-script-removing-na-values-from-a-vector) – Waldir Leoncio Dec 17 '13 at 19:12

1 Answer


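The reason you were getting the error "incorrect number of dimensions" is that na.rm=TRUE was inside the square brackets. R does not pass it to mean(); instead it interprets it as a second subscript, i.e. the column index you would use for a matrix or data frame. Since df1$A is a plain vector with no dimensions, the extra subscript triggers the error. The NA in C makes the condition NA for row 4, which puts an NA into the subset; placing na.rm=TRUE outside the brackets, as an argument to mean(), tells mean() to ignore it: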

mean(df1$A[df1$C>0 & df1$C<2],na.rm=TRUE)
[1] 1.5
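For comparison, a sketch of a few base R ways to get the same 1.5. Only the first form comes from this answer; which(), an explicit !is.na() test, and subset() are standard alternatives added here for illustration:

```r
df1 <- data.frame(A = c(1, 2, 4, 5), B = c(1, 3, NA, 1),
                  C = c(1, 1, 3, NA), D = c(1, 1, 2, 2))

# 1. Subset, then let mean() drop the NA produced by the NA in C.
m1 <- mean(df1$A[df1$C > 0 & df1$C < 2], na.rm = TRUE)

# 2. which() returns only the positions where the condition is TRUE,
#    so rows where the condition is NA are never selected.
m2 <- mean(df1$A[which(df1$C > 0 & df1$C < 2)])

# 3. Exclude the NA rows of C explicitly (FALSE & NA is FALSE).
m3 <- mean(df1$A[!is.na(df1$C) & df1$C > 0 & df1$C < 2])

# 4. subset() drops rows where the condition evaluates to NA.
m4 <- mean(subset(df1, C > 0 & C < 2)$A)

print(c(m1, m2, m3, m4))  # 1.5 1.5 1.5 1.5
```

The difference matters if column A itself can contain NA: the na.rm = TRUE form silently drops those values as well, while the other three return NA when a selected row has a missing A, which keeps that problem visible.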
Answered by 53RT
  • Hi, I want to do the same thing, but I need the mean of every 10 rows of one column of my data (which has 1000 rows and some NA values). How should I do it? Can you please guide me? Thanks :) – Shalen May 07 '20 at 17:49
  • Hey Shalen, I recommend you open a new question for this. If I understand correctly, your problem might be solved by creating a second column that groups the first column into blocks of ten rows (in the second column, the first ten rows all get the value "1", rows 11-20 get the value "2", and so on). Then use dplyr's group_by(second_column) and summarise(mean_of_blocks = mean(first_column, na.rm = TRUE)) – Torakoro Sep 08 '20 at 13:43
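The block-of-ten idea from the last comment can also be sketched in base R, using tapply() instead of dplyr's group_by() and summarise(). The vector x below is a small hypothetical stand-in (30 values with two NAs) for the commenter's 1000-row column:

```r
# Hypothetical data: 30 values with NAs at positions 3 and 15.
x <- as.numeric(1:30)
x[c(3, 15)] <- NA

# Block id: rows 1-10 -> 1, rows 11-20 -> 2, rows 21-30 -> 3, ...
block <- (seq_along(x) - 1) %/% 10 + 1

# Mean of each block, ignoring the NAs inside that block.
block_means <- tapply(x, block, mean, na.rm = TRUE)
print(block_means)
```

With 1000 rows this gives 100 block means. A block made up entirely of NA values would give NaN, since mean() of an empty vector is NaN.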