Say I have a data.frame:

df <- data.frame(A=c(10,20,30),B=c(11,22,33), C=c(111,222,333))
   A  B   C
1 10 11 111
2 20 22 222
3 30 33 333

If I select two (or more) columns I get a data.frame:

x <- df[,1:2]
   A  B
1 10 11
2 20 22
3 30 33

This is what I want. However, if I select only one column I get a numeric vector:

x <- df[,1]
[1] 10 20 30

I have tried as.data.frame(), which does not change the result for two or more columns. It does return a data.frame in the case of one column, but does not retain the column name:

x <- as.data.frame(df[,1])
  df[, 1]
1      10
2      20
3      30

I don't understand why it behaves like this. In my mind it should not make a difference whether I extract one, two, or ten columns. It should either always return a vector (or matrix) or always return a data.frame (with the correct names). What am I missing? Thanks!

Note: This is not a duplicate of the question about matrices, as matrix and data.frame are fundamentally different data types in R, and can work differently with dplyr. There are several answers that work with data.frame but not matrix.

zx8754
point618
  • This is not a duplicate, as matrix and data.frame can work differently with dplyr. – qwr Jan 23 '19 at 00:49
  • For data.frame, the tidy way with dplyr::select: `mtcars %>% dplyr::select("wt")` – qwr Feb 06 '19 at 19:50

3 Answers

Use drop=FALSE

> x <- df[,1, drop=FALSE]
> x
   A
1 10
2 20
3 30

From the documentation (see ?"[") you can find:

If drop=TRUE the result is coerced to the lowest possible dimension.
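
To make the effect of drop concrete, here is a minimal sketch that reuses the df from the question. The variable names v, x1 and x2 are only illustrative and are not part of the original answer.

```r
df <- data.frame(A = c(10, 20, 30), B = c(11, 22, 33), C = c(111, 222, 333))

# Default behaviour: a single selected column collapses to a plain vector
v <- df[, 1]
is.data.frame(v)   # FALSE
is.numeric(v)      # TRUE

# drop = FALSE keeps the data.frame structure and the column name
x1 <- df[, 1, drop = FALSE]
is.data.frame(x1)  # TRUE
names(x1)          # "A"

# The same call shape also works for several columns
x2 <- df[, 1:2, drop = FALSE]
names(x2)          # "A" "B"
```

Writing drop = FALSE every time can be useful when the number of selected columns is only known at run time, since the result is then a data.frame regardless of how many columns are picked.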

Jilber Urbina

Omit the ,:

x <- df[1]

   A
1 10
2 20
3 30

From the help page of ?"[":

Indexing by [ is similar to atomic vectors and selects a list of the specified element(s).

A data frame is a list. The columns are its elements.
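
The list view can be checked directly. The following is a small sketch using the df from the question; the names one_col, by_name and vec are just illustrative.

```r
df <- data.frame(A = c(10, 20, 30), B = c(11, 22, 33), C = c(111, 222, 333))

is.list(df)             # TRUE: a data.frame is a list of columns

# Single-bracket indexing without a comma selects a sub-list,
# which for a data.frame is again a data.frame
one_col <- df[1]
by_name <- df["A"]
is.data.frame(one_col)  # TRUE
names(one_col)          # "A"

# Double brackets extract the element itself, i.e. the column as a vector
vec <- df[[1]]
is.data.frame(vec)      # FALSE
```

So df[1] and df["A"] keep the column name, while df[[1]] (like df[, 1]) returns the bare vector.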

Sven Hohenstein

You can also use subset:

subset(df, select = 1) # by index
subset(df, select = A) # by name
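
As a quick check that subset keeps the data.frame class and the column name, here is a short sketch with the df from the question. The names s_idx, s_name and s_two are only illustrative.

```r
df <- data.frame(A = c(10, 20, 30), B = c(11, 22, 33), C = c(111, 222, 333))

s_idx  <- subset(df, select = 1)        # one column, by index
s_name <- subset(df, select = A)        # one column, by unquoted name
s_two  <- subset(df, select = c(A, C))  # several columns

is.data.frame(s_idx)   # TRUE
names(s_name)          # "A"
names(s_two)           # "A" "C"
```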

As mentioned in the comments you can also use dplyr::select, but you do not need to quote the variable name:

library(dplyr)

# by name
df %>% 
  select(A)

# by index
df %>% 
  select(1)
LMc