I tried to do:

apply(test,2,mean)

and I get NA for every column, with these warnings:

     CS.32   No..of.Takes         CS.130 No..of.Takes.1         CS.131 No..of.Takes.2         CS.133 No..of.Takes.3         CS.135 No..of.Takes.4 
        NA             NA             NA             NA             NA             NA             NA             NA             NA             NA 
Warning messages:
1: In mean.default(newX[, i], ...) :
  argument is not numeric or logical: returning NA
...
10: In mean.default(newX[, i], ...) :
  argument is not numeric or logical: returning NA

I want to filter the data set so that the means are computed while skipping non-numeric values such as NA, INC, DRP, etc.

Neeku
user3070751

3 Answers

Change your code to

colMeans(test[,sapply(test, is.numeric)], na.rm=TRUE)

I think it'll work.

Note that colMeans(x), where x is a data.frame or a numeric matrix, gives the same result as apply(x, 2, mean) but is faster.
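
As a quick check of that equivalence, here is a small R sketch on a made-up numeric matrix `m` (hypothetical data, not from the question):

```r
# Hypothetical 3 x 2 numeric matrix, for illustration only
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3)

# Column means computed two ways
via_apply    <- apply(m, 2, mean)
via_colmeans <- colMeans(m)

print(via_apply)     # 2 5
print(via_colmeans)  # 2 5
all.equal(via_apply, via_colmeans)  # TRUE
```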

In my code, sapply(test, is.numeric) checks each column and returns a logical (TRUE/FALSE) vector indicating which columns are numeric. Subsetting with test[,sapply(test, is.numeric)] keeps only those columns, so colMeans computes the means of the numeric columns and skips the others, while na.rm=TRUE drops the NAs inside them. That logical vector is the "filter" you're looking for; you can use it to subset any data.frame.
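
A minimal sketch of that filter on a small hypothetical data frame (the column names and values are made up for illustration). The `drop = FALSE` argument is an addition not in the original code: without it, a filter that leaves only one numeric column returns a plain vector, and colMeans would then fail because it needs at least two dimensions.

```r
# Hypothetical data frame mixing numeric and character columns
test <- data.frame(
  score1 = c(1, 2, NA),
  grade  = c("x", "y", "z"),
  score2 = c(4, 5, 6),
  stringsAsFactors = FALSE
)

# Logical vector: TRUE for numeric columns, FALSE otherwise
is_num <- sapply(test, is.numeric)
print(is_num)  # score1 TRUE, grade FALSE, score2 TRUE

# Keep only numeric columns; drop = FALSE keeps a data frame
# even if only one numeric column survives the filter
means <- colMeans(test[, is_num, drop = FALSE], na.rm = TRUE)
print(means)   # score1 1.5, score2 5
```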

See this example using the iris dataset:

> data(iris)
> apply(iris, 2, mean)  # NA's produced as in your case
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
          NA           NA           NA           NA           NA 
Warning messages:
1: In mean.default(newX[, i], ...) :
  argument is not numeric or logical: returning NA
...

> apply(iris[, sapply(iris, is.numeric)], 2, mean)  # output is OK
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
    5.843333     3.057333     3.758000     1.199333 
> colMeans(iris[, sapply(iris, is.numeric)])        # same output
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
    5.843333     3.057333     3.758000     1.199333 
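
The first iris call returns NA even for the four numeric columns. The reason is that apply() coerces a data frame with as.matrix(), and a single character or factor column (Species here) turns the whole matrix into character, so mean() rejects every column. A small sketch with a hypothetical two-column data frame:

```r
# Small hypothetical data frame: one numeric and one character column
df <- data.frame(num = c(1, 2, 3), chr = c("a", "b", "c"),
                 stringsAsFactors = FALSE)

# apply() first converts the data frame with as.matrix(); because one
# column is character, the whole matrix becomes character
m <- as.matrix(df)
print(typeof(m))  # "character"

# So mean() sees character input in every column and returns NA
# (with a warning each time), even for the numeric column
res <- suppressWarnings(apply(df, 2, mean))
print(res)        # num NA, chr NA
```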
Jilber Urbina
  • It gave the means of some of the columns. Some of the columns contain NA, DRP, INC, which were skipped during the computation, but columns with was ignored: the whole column where is present is ignored for computation. How do I deal with that? – user3070751 Jan 11 '14 at 17:46
  • `factor` and `character` columns are skipped because the `mean` function only accepts `numeric` (or logical) values. `NA`s are also left out because of the `na.rm=TRUE` argument; otherwise `mean` would return `NA`. Take a look at the documentation and make your problem reproducible. By the way, what value for the mean do you expect from a column full of `NA`? – Jilber Urbina Jan 11 '14 at 17:50
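
If codes such as INC and DRP are stored as text in otherwise numeric columns, one possible approach (an assumption about the data, not the answerer's method) is to convert each column to numeric so the codes become NA, and then let na.rm=TRUE drop them. The values below are hypothetical; only the column names are borrowed from the question's output, and the helper `to_num` is a made-up name.

```r
# Hypothetical grades where some entries are text codes
test <- data.frame(
  CS.32  = c("1.5", "INC", "2.0"),
  CS.130 = c("3.0", "2.5", "DRP"),
  stringsAsFactors = FALSE
)

# Convert every column to numeric; non-numeric codes such as "INC" or
# "DRP" become NA (as.numeric warns about this, so the warning is muted).
# as.character() first, so factor columns give their labels, not codes.
to_num <- function(x) suppressWarnings(as.numeric(as.character(x)))
test_num <- as.data.frame(lapply(test, to_num))

# Column means, ignoring the NAs that replaced the codes
means <- colMeans(test_num, na.rm = TRUE)
print(means)  # CS.32 1.75, CS.130 2.75
```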

Add the na.rm=TRUE argument to ignore the NAs, and make sure all your columns are numeric. You can check that using str(test).

 apply(test,2,mean,na.rm=TRUE)
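
A small sketch of this advice on a hypothetical all-numeric data frame. Note that na.rm=TRUE only helps once every column is numeric: a column containing text such as INC is character, and na.rm cannot fix that.

```r
# Hypothetical all-numeric data frame with a missing value
test <- data.frame(x = c(1, NA, 3), y = c(10, 20, 30))

# Inspect column types first; every column should be num/int
str(test)
stopifnot(all(sapply(test, is.numeric)))

# With only numeric columns, na.rm = TRUE is passed on to mean()
res <- apply(test, 2, mean, na.rm = TRUE)
print(res)  # x 2, y 20
```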
crogg01

An alternative method, step by step:

  • b<-apply(test,2,as.numeric)
  • good=complete.cases(b)
  • c=b[good,]
  • apply(c,2,mean)
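
The same steps on a small hypothetical data set, side by side with a per-column variant. Keep in mind that complete.cases() discards a whole row as soon as any column is NA, so values from the other columns in that row are lost too; colMeans(b, na.rm = TRUE) is shown as an alternative that drops NAs column by column.

```r
# Hypothetical data with text codes in both columns
test <- data.frame(a = c("1", "2", "INC"), b = c("4", "DRP", "6"),
                   stringsAsFactors = FALSE)

# Step 1: as.numeric() on each column; codes become NA
b <- suppressWarnings(apply(test, 2, as.numeric))

# Steps 2-4: keep complete rows only, then average
good <- complete.cases(b)
means_complete <- colMeans(b[good, , drop = FALSE])

# Per-column alternative: drop NAs column by column instead of row-wise
means_per_col <- colMeans(b, na.rm = TRUE)

print(means_complete)  # a 1, b 4 (only row 1 is complete)
print(means_per_col)   # a 1.5, b 5
```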
Kizuna