Select multiple columns with dplyr::select() with numbers as names

Question

Let's say I have the following data frame:

a <- runif(10)
dd <- as.data.frame(t(a))
names(dd) <- c("ID", "a", "a2", "b", "b2", "f", "XXX", "1", "4", "8")

In dplyr, there is a nice way to select a number of columns. For example, to select the columns between column a and column f, I can use

dd %>% dplyr::select(a:f)

In my problem, the columns in the last part of the data frame may vary, but each one is always named with a number between 1 and 99. However, I cannot seem to use the same trick as above:

> dd %>% select(1:99)
Error: Position must be between 0 and n
> dd %>% select("1":"99")
Error: Position must be between 0 and n

This happens because select() interprets these values as column positions rather than column names.

I would like to be able to obtain a data frame with all columns between a and f, and those with labels that are numbers between 1 and 99. Is that possible to do in one go with select()?

score 17 · Accepted Answer · edited Jan 12 '22 at 12:51

Column names starting with a number, such as "1" and "8" in your data, are not syntactically valid names (see ?make.names). Then see the 'Names and Identifiers' section in ?Quotes: "other [syntactically invalid] names can be used provided they are quoted. The preferred quote is the backtick".

Thus, wrap the invalid column names in backticks (`):

dd %>% dplyr::select(a:f, `1`:`8`)

#           a        a2         b        b2          f         1         4         8
# 1 0.2510023 0.4109819 0.6787226 0.4974859 0.01828614 0.7449878 0.1648462 0.5875638

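Another option is the standard-evaluation (SE) version of select, select_, which takes the column names as strings (note that select_ is deprecated as of dplyr 0.7.0):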

dd %>% dplyr::select_(.dots = c("a", "a2", ..., "1", "4", "8"))
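
For readers on a current dplyr, here is a minimal sketch of the same idea without select_. It assumes dplyr 1.0.0 or later, where the tidyselect helper any_of() is available; the set.seed() call and the name res are only there for illustration and are not part of the original answer.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
dd <- as.data.frame(t(runif(10)))
names(dd) <- c("ID", "a", "a2", "b", "b2", "f", "XXX", "1", "4", "8")

# any_of() takes a character vector and skips names that are not present,
# so asking for "1" to "99" does not fail when most of them are missing.
res <- dd %>% select(a:f, any_of(as.character(1:99)))

names(res)
# "a" "a2" "b" "b2" "f" "1" "4" "8"
```

Because any_of() ignores absent names, the same call should keep working when the set of numbered columns changes from one data set to the next.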

edited Jan 12 '22 at 12:51

micstr

answered Jun 29 '16 at 08:00

AlexR

is there a way of having something like `1`:`99`, even if column 99 is not in this particular data set? – Theodor Jun 29 '16 at 08:30
@Theodor Not directly, but using the function `select_` you can pass it an array of column names, so you can do something like `select_(.dots = colnames(dd)[colnames(dd) %in% as.character(1:99)])` as a workaround – AlexR Jun 29 '16 at 08:32

zx8754 · Answer 2 · score 6 · 2016-06-29T08:23:55.060

We can select columns a:f and add the positions of the columns whose names are numbers, found by converting the column names to numeric and keeping those that are not NA:

dd %>% 
  select(a:f, which(!is.na(as.numeric(colnames(dd)))))
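
Note that as.numeric() on names such as "ID" returns NA with an "NAs introduced by coercion" warning. A possible variant that avoids the conversion is to match the names by pattern instead. This is only a sketch: the regular expression "^[1-9][0-9]?$" (names 1 to 99) and the name res2 are assumptions chosen to fit the question, not part of the original answer.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
dd <- as.data.frame(t(runif(10)))
names(dd) <- c("ID", "a", "a2", "b", "b2", "f", "XXX", "1", "4", "8")

# matches() selects columns whose names match a regular expression.
# "^[1-9][0-9]?$" accepts one or two digits with no leading zero, i.e. 1 to 99.
res2 <- dd %>% select(a:f, matches("^[1-9][0-9]?$"))

names(res2)
# "a" "a2" "b" "b2" "f" "1" "4" "8"
```

Unlike the index-based version, this does not need to refer to dd again inside select(), so it can sit in the middle of a longer pipe.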

edited Jun 29 '16 at 08:23

answered Jun 29 '16 at 08:18

zx8754
