
I have a dataset named diamonds. It has ten variables: carat, cut, color, clarity... How can I write code to find out which variables are categorical? I am currently using the class() function to find the type of each variable, but how can I make my program print this automatically?

classVariables = sapply(diamonds, function(x) class(x))
neilfws
jiawei li
  • Hang on, you've already solved your problem. Your code is perfectly fine to find out the class of each column. So what's the question? – thelatemail Sep 14 '17 at 03:36
  • To find out categorical variables in the dataset, maybe, `names(which(sapply(diamonds, class) == "factor"))` – Ronak Shah Sep 14 '17 at 03:38
  • Yup, maybe this one is more accurate [R sapply is.factor](https://stackoverflow.com/questions/19169051/r-sapply-is-factor). Also [Selecting only numeric columns from a data frame](https://stackoverflow.com/questions/5863097/selecting-only-numeric-columns-from-a-data-frame) but for factors. – Ronak Shah Sep 14 '17 at 03:50
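The `is.factor` approach suggested in the comments can be sketched as below. Because `class()` returns two strings (`"ordered" "factor"`) for ordered factors such as `cut`, testing each column with `is.factor()` is likely more robust than comparing class names. To keep the sketch runnable without ggplot2, `df` is a small hypothetical stand-in for `diamonds`, and the names `is_categorical` and `categorical_vars` are also just illustrative.

```r
# Hypothetical stand-in for diamonds (the real dataset ships with ggplot2).
df <- data.frame(
  carat = c(0.23, 0.21, 0.23),
  cut   = factor(c("Ideal", "Premium", "Good"),
                 levels = c("Fair", "Good", "Very Good", "Premium", "Ideal"),
                 ordered = TRUE),
  color = factor(c("E", "E", "J"),
                 levels = c("D", "E", "F", "G", "H", "I", "J"),
                 ordered = TRUE),
  price = c(326L, 326L, 327L)
)

# is.factor() is TRUE for both plain and ordered factors.
is_categorical <- vapply(df, is.factor, logical(1))

# Keep only the names of the factor columns and print them.
categorical_vars <- names(df)[is_categorical]
print(categorical_vars)
```

With the real data, replacing `df` by `diamonds` should list `cut`, `color`, and `clarity`. If categorical data is stored as character columns instead, the test would need to be widened, for example to `is.factor(x) || is.character(x)`.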

2 Answers

> str(diamonds)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   53940 obs. of  10 variables:
 $ carat  : num  0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
 $ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
 $ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
 $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
 $ depth  : num  61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
 $ table  : num  55 61 65 58 58 57 57 55 61 61 ...
 $ price  : int  326 326 327 334 335 336 336 337 337 338 ...
 $ x      : num  3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
 $ y      : num  3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
 $ z      : num  2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
> ?str
JustCurious
sapply(colnames(diamonds), function(x) class(diamonds[[x]]))

$carat
[1] "numeric"

$cut
[1] "ordered" "factor" 

$color
[1] "ordered" "factor" 

$clarity
[1] "ordered" "factor" 

$depth
[1] "numeric"

$table
[1] "numeric"

$price
[1] "integer"

$x
[1] "numeric"

$y
[1] "numeric"

$z
[1] "numeric"
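
The output above is a list because the ordered factors have two classes each. If a compact one-line-per-variable summary is preferred, one option is to collapse each class vector into a single string. This is only a sketch: `df` is a hypothetical stand-in for `diamonds`, and `class_summary` is an illustrative name.

```r
# Hypothetical stand-in for diamonds (the real dataset ships with ggplot2).
df <- data.frame(
  carat = c(0.23, 0.21, 0.23),
  cut   = factor(c("Ideal", "Premium", "Good"),
                 levels = c("Fair", "Good", "Very Good", "Premium", "Ideal"),
                 ordered = TRUE),
  price = c(326L, 326L, 327L)
)

# Join multiple classes with "/" so every column yields exactly one string,
# which lets vapply() return a named character vector instead of a list.
class_summary <- vapply(df,
                        function(x) paste(class(x), collapse = "/"),
                        character(1))
print(class_summary)
```

Here `cut` is reported as `"ordered/factor"`, while `carat` and `price` are reported as `"numeric"` and `"integer"`.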
neilfws