239

Suppose you have a data.frame like this:

x <- data.frame(v1=1:20,v2=1:20,v3=1:20,v4=letters[1:20])

How would you select only those columns in x that are numeric?

Brandon Bertelsen

12 Answers

374

EDIT: updated to avoid use of ill-advised sapply.

Since a data frame is a list, we can use the list-apply functions:

nums <- unlist(lapply(x, is.numeric), use.names = FALSE)

Then use standard subsetting:

x[ , nums]

## don't use sapply, even though it's less code
## nums <- sapply(x, is.numeric)
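One thing worth checking with `x[ , nums]` is what happens when only a single column qualifies. The following is a minimal sketch, not part of the original answer: `one_num` is a hypothetical data frame, and `vapply()` is swapped in as a type-checked alternative to `unlist(lapply(...))`.

```r
# Hypothetical frame with exactly one numeric column
one_num <- data.frame(n = 1:3, s = letters[1:3])

# vapply() insists that every result is a single logical value
nums <- vapply(one_num, is.numeric, logical(1))

# Matrix-style subsetting drops a single column to a plain vector
as_vector <- one_num[, nums]

# List-style subsetting, or drop = FALSE, keeps a data frame
as_frame   <- one_num[nums]
also_frame <- one_num[, nums, drop = FALSE]

class(as_vector)
class(as_frame)
```

If downstream code expects a data frame, the list-style form `x[nums]` is probably the safer default.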

For more idiomatic modern R, I'd now recommend:

x[ , purrr::map_lgl(x, is.numeric)]

Less code, less dependent on R's particular quirks, more straightforward, and robust to use on database-backed tibbles:

dplyr::select_if(x, is.numeric)

Newer versions of dplyr also support the following syntax:

x %>% dplyr::select(where(is.numeric))
mdsumner
  • 12
    `x[nums]` or `x[sapply(x,is.numeric)]` works as well, and both always return a `data.frame`. Compare `x[1]` vs `x[,1]`: the first is a `data.frame`, the second is a vector. To prevent the conversion, use `x[, 1, drop=FALSE]`. – Marek May 03 '11 at 11:46
  • Any way to select continuous data only? This method returns continuous as well as integer. – derelict Aug 04 '16 at 19:28
  • When there is no numeric column, the following error arises: `undefined columns selected`. How do you avoid it? – Yohan Obadia Aug 13 '16 at 14:15
  • @SoilSciGuy continuous data should be as.numeric. Perhaps you have factor data that's in numeric form? You should open a new question. – Brandon Bertelsen Sep 02 '16 at 23:38
  • 1
    @YohanObadia You can use a `tryCatch()` to deal with this. Please consider opening a new question. – Brandon Bertelsen Sep 02 '16 at 23:39
  • @SoilSciGuy `x[sapply(x, is.numeric) & ! sapply(x, is.integer)]` – Gregor Thomas Mar 23 '20 at 18:34
  • Why does it not work if I do that with apply(data,2,is.numeric), even if the output is identical to unlist(lapply(x, is.numeric))? @mdsummer – Jenny Jun 16 '20 at 17:47
  • apply is for arrays, not for data frames, which can have columns of different types. apply() first converts the data frame to a matrix, so all columns end up with the same type: if some are character, then all become character – mdsumner Jun 17 '20 at 00:07
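The two points raised in these comments (why `apply()` misleads, and how to keep only non-integer numeric columns) can be sketched together. This is an illustration under assumptions: the data frame below is a hypothetical variant of the question's `x` with one double column added.

```r
x <- data.frame(v1 = 1:20, v2 = seq(0.5, 10, by = 0.5), v4 = letters[1:20])

# apply() converts the data frame with as.matrix(); the character column
# forces the whole matrix to character, so every column tests FALSE
via_apply <- apply(x, 2, is.numeric)

# Column-wise tests see each column's real type
is_num <- vapply(x, is.numeric, logical(1))
is_int <- vapply(x, is.integer, logical(1))

# Numeric but not integer, following Gregor Thomas's comment
non_integer <- x[is_num & !is_int]
names(non_integer)
```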
95

The dplyr package's select_if() function is an elegant solution:

library("dplyr")
select_if(x, is.numeric)
Sharon
57

Filter() from the base package is the perfect function for that use case. You simply have to write:

Filter(is.numeric, x)

It is also much faster than select_if():

library(microbenchmark)
microbenchmark(
    dplyr::select_if(mtcars, is.numeric),
    Filter(is.numeric, mtcars)
)

This returns (on my computer) a median of 60 microseconds for Filter and 21,000 microseconds for select_if, so Filter is about 350x faster.
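To illustrate how base `Filter()` treats a data frame as a list of columns, here is a small sketch. `only_char` is a hypothetical data frame added to show the case where nothing matches; it is not from the original answer.

```r
x <- data.frame(v1 = 1:20, v2 = 1:20, v3 = 1:20, v4 = letters[1:20])

# Filter() applies the predicate to each column and keeps those returning TRUE
numeric_part <- Filter(is.numeric, x)
names(numeric_part)

# Hypothetical frame with no numeric columns: the result is a data frame
# with zero columns rather than an error
only_char <- data.frame(a = letters[1:3], b = LETTERS[1:3])
empty_part <- Filter(is.numeric, only_char)
ncol(empty_part)
```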

Kevin Zarca
  • This solution doesn't fail when no numeric columns are present. Are there any drawbacks to using it? – bli Nov 22 '16 at 10:10
  • Filter only applies to rows of a dataframe rather than columns. As such, this solution wouldn't give the correct result. – Michael Jan 18 '17 at 11:45
  • 4
    @Michael don't confuse Filter from the base package and filter from dplyr package! – Kevin Zarca Feb 01 '17 at 14:45
  • 1
    @bli I can't see any drawback to using Filter. Its input is a data.frame object and it returns a data.frame – Kevin Zarca Feb 01 '17 at 14:47
  • Just chiming in here for reference: what `Filter()` doesn't work for here is replacing, e.g. `Filter(is.numeric,iris) <- 0.5*Filter(is.numeric,iris)` won't work. – Mobeus Zoom Jun 06 '20 at 07:05
  • This one also works with data.table, thx! – Akantor Nov 18 '22 at 12:05
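For the replacement case mentioned in the comments, one possible workaround is to index with a logical vector instead of calling `Filter()` on the left-hand side. This is a sketch only; the names `scaled` and `num` are made up for illustration.

```r
# Work on a copy so the built-in iris data set is left untouched
scaled <- iris

# Logical index of the numeric columns
num <- vapply(scaled, is.numeric, logical(1))

# Indexing on the left-hand side allows replacement in place
scaled[num] <- 0.5 * scaled[num]

head(scaled$Sepal.Length)
```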
9

If you are interested only in the column names, use this:

names(dplyr::select_if(train,is.numeric))
user3065757
9
iris %>% dplyr::select(where(is.numeric)) #as per most recent updates

Another option with purrr is to use the discard function with a negated predicate:

iris %>% purrr::discard(~!is.numeric(.))

If you want the names of the numeric columns, you can add names or colnames:

iris %>% purrr::discard(~!is.numeric(.)) %>% names
AlexB
8

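This is an alternative to the other answers. Note that it keeps only columns whose class is exactly "numeric" (doubles); integer columns, such as those in the question's x, are not selected: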

x[, sapply(x, class) == "numeric"]

With a data.table:

x[, lapply(x, is.numeric) == TRUE, with = FALSE]
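The difference between comparing `class()` to "numeric" and calling `is.numeric()` may be easier to see on a small example. This sketch uses a hypothetical variant of the question's data frame with one integer and one double column.

```r
x <- data.frame(v1 = 1:20, v2 = seq(0.5, 10, by = 0.5), v4 = letters[1:20])

# class() distinguishes "integer" from "numeric" (double)
col_classes <- vapply(x, function(col) class(col)[1], character(1))
col_classes

# Comparing against "numeric" misses the integer column v1
by_class <- names(x)[col_classes == "numeric"]

# is.numeric() is TRUE for both integer and double columns
by_test <- names(x)[vapply(x, is.numeric, logical(1))]
```

Which behaviour is wanted depends on whether integer columns should count as numeric for the task at hand.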
Enrique Pérez Herrero
6
library(purrr)
x <- x %>% keep(is.numeric)
Brandon Bertelsen
Yash Khokale
3

The PCAmixdata package has a function, splitmix, that splits a given data frame "YourDataframe" into its quantitative (numerical) and qualitative (categorical) parts, as shown below:

install.packages("PCAmixdata")
library(PCAmixdata)
split <- splitmix(YourDataframe)
X1 <- split$X.quanti  # numerical columns in the dataset
X2 <- split$X.quali   # categorical columns in the dataset
user1
1

If you have many factor variables, you can use the select_if function. Install the dplyr package first. It has many functions that select data satisfying a condition, and you can set the conditions yourself.

Use it like this:

categorical<-select_if(df,is.factor)
str(categorical)
서영재
0

Another way could be as follows:

# extracting numeric columns from the iris dataset
(iris[sapply(iris, is.numeric)])
greg-449
Ayushi
  • 1
    Hi Ayushi, this probably was downvoted because it's a repeat of the first answer, but this method has some issues that were identified. Take a look at the comments in the first answer, you'll see what I mean. – Brandon Bertelsen Oct 09 '18 at 18:25
0
Numerical_variables <- which(sapply(df, is.numeric))
# then extract column names 
Names <- names(Numerical_variables)
Brandon Bertelsen
-1

This doesn't directly answer the question but can be very useful, especially if you want something like all the numeric columns except for your id column and dependent variable.

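library(magrittr)  # provides %>% and the assignment pipe %<>%

# names of the numeric columns, excluding the id and the dependent variable
numeric_cols <- sapply(dataframe, is.numeric) %>% which %>% 
                   names %>% setdiff(., c("id_variable", "dep_var"))

dataframe %<>% dplyr::mutate_at(numeric_cols, function(x) your_function(x))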
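The same idea can be sketched in base R without any packages. Everything concrete here is an assumption for illustration: the example data, the column names `height` and `label`, and the centring function standing in for `your_function`.

```r
# Hypothetical data: an id, a dependent variable, one numeric predictor, a label
dataframe <- data.frame(
  id_variable = 1:5,
  dep_var     = c(2.1, 3.4, 1.8, 5.0, 4.2),
  height      = c(150, 160, 170, 180, 190),
  label       = letters[1:5]
)

# Stand-in for your_function(): centre a column on its mean
your_function <- function(col) col - mean(col)

# Numeric column names, minus the ones to leave alone
is_num <- vapply(dataframe, is.numeric, logical(1))
numeric_cols <- setdiff(names(dataframe)[is_num], c("id_variable", "dep_var"))

# Apply the function to the selected columns only
dataframe[numeric_cols] <- lapply(dataframe[numeric_cols], your_function)
```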
RJMCMC