
I am trying to find the skewness of the numeric columns in a data frame. The apply() call in the code below returns NULL. However, when I call the function directly on any single column, it returns a value.

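library(mlbench)
library(e1071)  # assumption: skewness() comes from e1071; the question does not name the package
data(Glass)

# Return the skewness of x if it is numeric; otherwise return NULL implicitly.
funNum <- function(x) {
  if (is.numeric(x)) {
    return(skewness(x))
  }
}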

funNum(Glass$Na)
# [1] 0.4478343

apply(Glass,2,funNum)
# NULL

Please suggest what is wrong in the above code. Thanks in advance!

Zheyuan Li
Apoorv
  • What does `apply(a,2,is.numeric)` give you? – Carl Jul 06 '16 at 17:27
  • `apply(a,2,is.numeric) RI Na Mg Al Si K Ca Ba Fe Type FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ` – Apoorv Jul 06 '16 at 17:28
  • Please provide a [minimal](http://stackoverflow.com/help/mcve) and [reproducible](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) example. After skimming at least one of those links, please provide sample data and code used (such as `skewness`). Please edit your question for this instead of adding comments, thanks! – r2evans Jul 06 '16 at 17:30
  • @ZheyuanLi - I just tried sapply, and it gives the result. However, I would also like to understand why apply does not work here. Is there a probable explanation? – Apoorv Jul 06 '16 at 17:30
  • I have edited the code in the question to make it reproducible. – Apoorv Jul 06 '16 at 17:34

2 Answers

Yeah, my guess in the comment is right: you have factors!

sapply(Glass, class)
#       RI        Na        Mg        Al        Si         K        Ca        Ba 
# "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" 
#       Fe      Type 
# "numeric"  "factor" 

When you use apply(), it first coerces Glass into a matrix. A matrix, like a vector, can hold only one type of data. Because your data frame mixes numeric columns with a factor, the resulting matrix is character. None of its columns passes is.numeric(), so funNum() never calls skewness() and returns NULL for every column, and apply() collapses those empty results into a single NULL.
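The coercion can be checked without any packages. This is a minimal sketch on a toy data frame; `df` and its columns are made up for illustration and merely stand in for Glass.

```r
# Toy data frame standing in for Glass: two numeric columns and one factor.
df <- data.frame(
  x = c(1.5, 2.0, 3.5),
  y = c(10, 20, 40),
  g = factor(c("a", "b", "a"))
)

# apply() converts a data frame with as.matrix() before looping.
m <- as.matrix(df)
is.character(m)  # the factor forces every cell to character

# Each column handed to the function is therefore a character vector.
mixed_check <- apply(df, 2, is.numeric)

# With the factor column left out, the matrix stays numeric.
numeric_check <- apply(df[, c("x", "y")], 2, is.numeric)

print(mixed_check)
print(numeric_check)
```

This also shows why dropping the factor column before calling apply() makes the original code work.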

If you use sapply() or lapply(), things are different. These functions are designed to work with lists and data frames, so each column is passed to the function with its original type. You will get a valid result for every numeric column.

Whether to use sapply() or lapply() depends on what you want. sapply() returns a vector / matrix whenever it can, while lapply() always returns a list. I reckoned that skewness() returns only a scalar, so I recommended sapply(), which gives you a vector. If you want a data frame, use as.data.frame(lapply(Glass, skewness)) instead.
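To make the difference in return types concrete, here is a small sketch that selects the numeric columns first and then applies a function to each. `skew()` is a hypothetical base-R stand-in (third standardized moment) used only so the example runs without extra packages; it is not claimed to match the estimator of any package's skewness(). The toy `df` is likewise made up.

```r
# Hypothetical stand-in for skewness(): third standardized moment.
skew <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Toy data frame: two numeric columns and one factor.
df <- data.frame(
  x = c(1, 2, 3, 10),
  y = c(4, 5, 6, 7),
  g = factor(c("a", "b", "a", "b"))
)

num_cols <- sapply(df, is.numeric)       # logical flag per column
res_vec  <- sapply(df[num_cols], skew)   # named numeric vector
res_list <- lapply(df[num_cols], skew)   # named list
res_df   <- as.data.frame(res_list)      # one-row data frame

print(res_vec)
print(res_df)
```

Filtering with is.numeric() up front keeps the factor column away from the function entirely, so no per-column type check is needed inside it.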

Zheyuan Li
  • He could also do `apply(a[,1:9], 2, funNum)`, which would side-step the down-conversion by omitting the `factor` column. – r2evans Jul 06 '16 at 17:43
  • Agreed. Unfortunately, both return a `matrix` and not a data.frame. `lapply` may be a better fit since it at least is closer to a `data.frame`, or a `data.frame`-specific function such as those provided by `plyr` or `dplyr`, but those weren't requested. – r2evans Jul 06 '16 at 17:47

What happens is that apply() coerces the data frame to a matrix, which converts the numeric columns to character, so your function returns NULL.

Try

sapply(a,funNum)

This will loop over the columns of a without coercing it to a matrix.
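One detail worth knowing: when a function returns NULL for some columns, sapply() cannot simplify and hands back a list. The sketch below illustrates this on a toy data frame; `skew()` is a hypothetical base-R stand-in for skewness(), and `fun_num()` mirrors the question's funNum.

```r
# Hypothetical stand-in for skewness(): third standardized moment.
skew <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Same shape as funNum in the question: NULL for non-numeric input.
fun_num <- function(x) {
  if (is.numeric(x)) skew(x) else NULL
}

# Toy data frame standing in for `a`.
a <- data.frame(
  x = c(1, 2, 3, 10),
  y = c(4, 5, 6, 7),
  g = factor(c("a", "b", "a", "b"))
)

res <- sapply(a, fun_num)  # a list, because the results differ in length
flat <- unlist(res)        # drops the NULL entry, leaving a named vector

print(res)
print(flat)
```

Calling unlist() on the result, or filtering the numeric columns beforehand, gives a plain named vector.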

Carl
  • It does not seem to coerce when all my columns are numeric; in that case it returns the correct result. – Apoorv Jul 06 '16 at 17:42
  • That's because it coerces to a numeric matrix when your data frame contains only numeric columns. – Carl Jul 06 '16 at 18:06