How to subset out some alphabetically arranged columns from a dataframe?

Question

I have a dataframe where the columns are alphabetically ordered (COL_A, COL_B, COL_C, etc.). How can I subset out columns H to M, without writing out the column names explicitly, or without counting how many columns there are?

Edit for clarification: I don't mean to say that the columns are exactly COL_A, COL_B and so on, merely that they are alphabetically arranged with any common structure in the name-strings. They can be, for example, Alabama, Arkansas, Texas, Wyoming, and Zambia. In an nutshell, I am trying to find an alternative to df_subset = df[,n1:n2] where I can directly put in the column names instead of having to count out the column numbers n1 and n2.

score 3 · Answer 1 · answered Mar 05 '12 at 10:06

3

d = as.data.frame(matrix(1:26,nrow=1))
names(d) = paste("COL_", LETTERS, sep="")
grep("[H-M]$", names(d))
d[, grepl("[H-M]$", names(d))]

answered Mar 05 '12 at 10:06

baptiste

75,767
19
198
294

1

Wouldn't it be safer to use `patt="^COL_[H-M]$"`? – IRTFM Mar 05 '12 at 13:27
@DWin probably, but the question is a little vague on the pattern to match – baptiste Mar 05 '12 at 18:41

flodel · Accepted Answer · 2012-03-05T12:36:13.133

A more general approach is to use the >= and <= operators applied to the column names. Here is an example using a data frame where the column names are US states:

> df <- data.frame(as.list(state.abb))
> colnames(df) <- state.name
> df[, 1:3]
  Alabama Alaska Arizona
1      AL     AK      AZ
> df[colnames(df) >= "Florida" & colnames(df) <= "Illinois"]
  Florida Georgia Hawaii Idaho Illinois
1      FL      GA     HI    ID       IL

Another method would be to find the indices of the two boundaries using match and build a sequence between those two:

> df[seq(from = match("Florida", colnames(df)),
+        to   = match("Illinois", colnames(df)))]
  Florida Georgia Hawaii Idaho Illinois
1      FL      GA     HI    ID       IL

I was actually trying to find a alphabet/string version of df[,n1:n2], i.e. I wanted to avoid counting the actual column numbers. Your second option is perfect. Many thanks, flodel. — user702432, Mar 07 '12 at 03:29

score 1 · Answer 3 · edited May 23 '17 at 12:29

1

You can try something like this

dfrm <- data.frame(replicate(26, rnorm(10)))
colnames(dfrm) <- paste("COL", LETTERS, sep="_")
which(substr(colnames(dfrm), 5, 6) %in% LETTERS[3:6])

The last expression returns column number that match the letters C to F. See also match, and this related thread: Get column index from label in a data frame.

edited May 23 '17 at 12:29

Community

1
1

answered Mar 05 '12 at 10:09

chl

27,771
5
51
71

How to subset out some alphabetically arranged columns from a dataframe?

3 Answers3