
I have a simple question that is puzzling me this morning. How can I get a vector showing the classes of the columns of a data frame?

I thought the call below should work, but it doesn't. I was even more surprised by the result, which I don't understand.

Here is my example:

Example <- data.frame(
             Col1 = c(2, 5, 10),
             Col2 = c("Hello", "I am a", "Factor"),
             Col3 = c(TRUE, FALSE, TRUE),
             stringsAsFactors = TRUE)  # needed in R >= 4.0 to get the factor column shown below
str(Example)
# 'data.frame': 3 obs. of  3 variables:
# $ Col1: num  2 5 10
# $ Col2: Factor w/ 3 levels "Factor","Hello",..: 2 3 1
# $ Col3: logi  TRUE FALSE TRUE

So I have a data frame with one numeric column, one factor column and one logical column, yet the result of calling class() through apply is "character" for every column. Can anybody explain why, and how I can get a vector of the classes?

apply(Example, 2, class)
#       Col1        Col2        Col3 
# "character" "character" "character" 
T. Beige
  • Okay, thanks, sapply works. I'm just still wondering where the "character" result comes from in this apply call. But for the moment, it works. – T. Beige Oct 25 '18 at 07:15
  • I turned my comment into an answer, as I might have a clue why this is the case - confirmed by a new answer. – alex_555 Oct 25 '18 at 07:16

3 Answers


apply doesn't work for you because, as the docs say:

 If ‘X’ is not an array but an object of a class with a non-null
 ‘dim’ value (such as a data frame), ‘apply’ attempts to coerce it
 to an array via ‘as.matrix’ if it is two-dimensional (e.g., a data
 frame) or via ‘as.array’.

so your data frame becomes a matrix, and a matrix can hold only one type. Every column is coerced to the most general type needed to represent all of them - in this case a character matrix:

> as.matrix(Example)
     Col1 Col2     Col3   
[1,] " 2" "Hello"  " TRUE"
[2,] " 5" "I am a" "FALSE"
[3,] "10" "Factor" " TRUE"
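
To see what class() is actually handed inside apply, the coercion can be inspected directly. This is only a sketch; the name m is introduced here for illustration, and stringsAsFactors = TRUE is an assumption added so the factor column from the question is reproduced on newer R versions:

```r
# Same data as in the question; stringsAsFactors = TRUE is an assumption
# made explicit so that Col2 is a factor regardless of the R version.
Example <- data.frame(
  Col1 = c(2, 5, 10),
  Col2 = c("Hello", "I am a", "Factor"),
  Col3 = c(TRUE, FALSE, TRUE),
  stringsAsFactors = TRUE
)

# 'm' is a hypothetical name: the matrix apply() builds via as.matrix()
m <- as.matrix(Example)

typeof(m)            # storage type shared by every cell of the matrix
class(m[, "Col1"])   # class of a single column, which is what class() sees in apply()
```

Both calls should report "character", which matches the apply result in the question.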

Use sapply instead:

> sapply(Example,class)
     Col1      Col2      Col3 
"numeric"  "factor" "logical" 
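
One caveat, offered as a hedged aside: sapply only simplifies to a character vector when every column has exactly one class. The data frame df below is hypothetical (it is not from the question) and contains a date-time column with two classes; taking the first class via vapply is one possible workaround, not the only one:

```r
# Hypothetical data frame: 'when' is POSIXct, whose class attribute has two entries
df <- data.frame(
  x = 1:2,
  when = as.POSIXct(c("2018-10-25 07:15:00", "2018-10-25 07:16:00"), tz = "UTC")
)

# Lengths of the per-column results differ, so sapply falls back to a list
classes_all <- sapply(df, class)

# vapply with character(1) guarantees one string per column (the first class only)
classes_first <- vapply(df, function(col) class(col)[1], character(1))
classes_first
```

For the Example data frame in the question every column has a single class, so plain sapply(Example, class) is enough.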
Spacedman

I cannot actually explain it in detail, but you might try sapply(Example, class) to get the vector you're looking for. sapply iterates over the elements of a list, and a data frame is a list of columns, which is why it works. You can also use lapply(Example, class), but you'll have to convert the list you get into a vector. This works for the same reason: at its most basic, a data frame is just a list of equal-length column vectors.

apply does not work because it's meant to be used on matrices. All elements of a matrix must be of the same type, so the data frame is converted to a matrix before class is called, and apply has to give you "character" as an answer. This is because once you have a single character value in a matrix (or a vector), every number is also converted to character.
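
A minimal sketch of the lapply route, assuming unlist is an acceptable way to flatten the list (class_list and class_vec are names introduced here for illustration, and stringsAsFactors = TRUE is added so Col2 is a factor on any R version):

```r
# Same data as in the question, with stringsAsFactors made explicit (an assumption)
Example <- data.frame(
  Col1 = c(2, 5, 10),
  Col2 = c("Hello", "I am a", "Factor"),
  Col3 = c(TRUE, FALSE, TRUE),
  stringsAsFactors = TRUE
)

is.list(Example)                      # a data frame is a list of columns

class_list <- lapply(Example, class)  # named list, one element per column
class_vec  <- unlist(class_list)      # flatten to a named character vector
class_vec
```

When every column has exactly one class, this should give the same named vector as sapply(Example, class).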
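
The same coercion rule can be seen on plain vectors, without any data frame involved. A small illustration (v1 and v2 are hypothetical names):

```r
# Mixing types in c() forces everything to the most general type present
v1 <- c(TRUE, 2)       # logical + numeric   -> numeric
v2 <- c(2, "Hello")    # numeric + character -> character

class(v1)
class(v2)
v2                     # the number 2 has become the string "2"
```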

alex_555

You can loop over each column of your data frame:

class.vec <- c()
for (col in colnames(Example)) {   # 'col' avoids reusing the name of the c() function
  # append the class of the current column
  class.vec <- c(class.vec, class(Example[[col]]))
}
class.vec

This will return:

> class.vec
[1] "numeric" "factor"  "logical"

A more "elegant" way is by using sapply:

class.vec <- sapply(Example, class)
class.vec
#      Col1      Col2      Col3 
# "numeric"  "factor" "logical" 
gc_