R Subset using first and last column names of interest

Question

> df
  a b c  d  e
1 1 4 7 10 13
2 2 5 8 11 14
3 3 6 9 12 15

To subset the columns b, c, d we can use df[,2:4] or df[,c("b", "c", "d")]. However, I am looking for a solution that fetches columns b, c, d using something like df[,b:d]. In other words, I want to subset the data using only the first and last column names of interest. I have searched for a solution without success; all the examples I have seen to date name every column explicitly when subsetting.

Yes. I think the question is very similar. However, in my case, I do not have the names of the intermediate columns. In the original post, the person seems to have the intermediate column names as well (based on his choice of answer). Having said that, this original post did not turn up in my research. I see that the second solution provided in the original post is very similar to the solution given here. Given this context, I would leave the decision to mark this post as a duplicate to your wise judgement. — dataanalyst, Jul 18 '16 at 22:40
It's debatable, but the overlap seems close enough to me. As Stack Overflow expands rapidly, a lot of questions unintentionally cover old ground. Marking it as a duplicate doesn't bury the question, but I think there is value in linking the two explicitly. — thelatemail, Jul 18 '16 at 22:43
As I already mentioned, I agree with your judgement since you might have already seen several such instances. I would do my best to avoid such duplicates in future. — dataanalyst, Jul 18 '16 at 22:45

score 3 · Accepted Answer · answered Jul 18 '16 at 22:17

It's also simple in base R, e.g.:

subset(df, select=b:d)

Or roll your own:

df[do.call(seq, as.list(match(c("b","d"), names(df))) )]
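The roll-your-own one-liner can also be wrapped in a small helper. This is only a minimal sketch: `cols_between` is a hypothetical name (not a base R function), and it assumes both column names are present in the data frame.

```r
# Example data from the question
df <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12, e = 13:15)

# Hypothetical helper: select every column from `first` to `last` by name
cols_between <- function(data, first, last) {
  idx <- match(c(first, last), names(data))  # positions of the two names
  if (anyNA(idx)) stop("column name not found")
  # Single-bracket indexing without a comma keeps the result a data frame;
  # if `last` comes before `first`, the columns come back in reverse order
  data[seq(idx[1], idx[2])]
}

res <- cols_between(df, "b", "d")
names(res)
```

Stopping on a missing name is a design choice here; without that check, `seq()` would fail with a less helpful message when `match()` returns `NA`.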

thelatemail

Can you please let me know how this syntax should be modified if I am dealing with a data.table instead of a data frame? – dataanalyst Jul 19 '16 at 01:26
@Gandalf - `subset` works on a `data.table` too. Also, you can just do `df[, b:d, with=FALSE]` or `df[, .SD,.SDcols=b:d]` – thelatemail Jul 19 '16 at 01:33
@Gandalf - I don't follow. It is literally impossible for each row in a data.table or data.frame to have a different number of columns. They are always rectangular. – thelatemail Jul 19 '16 at 02:20
Thanks for that solution. It works perfectly on a small example data table. However, it still does not solve my issue. One caveat in my case is that each row in the subset data could have a different number of columns. The `b` and `d` are not fixed across all the rows, resulting in an uneven number of columns. My subset function looks something like this: `df[, paste("b", df$var1, sep=""):paste("d", df$var2, sep=""), with=FALSE]`. The var1 and var2 differ across the rows, which results in the uneven number of columns. Could this be the culprit in my case? – dataanalyst Jul 19 '16 at 02:21
I tried to edit the question. Does it make sense now? What I am trying to say is, the subsetted data will have uneven number of columns (assuming everything works fine); not the original data table. – dataanalyst Jul 19 '16 at 02:23
@Gandalf - I don't think you can use vectors on each side of the `:` to select columns like that. `df[, c("b","d"):c("c","e"), with=FALSE]` for instance throws an error. – thelatemail Jul 19 '16 at 02:28
If that is the case, I am stuck again. I need to think of tackling this issue in a different way. arghhh... Thanks for your help, though. – dataanalyst Jul 19 '16 at 02:39
I will accept this answer as I am inherently biased towards answers which use base R as opposed to packages. – dataanalyst Jul 22 '16 at 15:42
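For the per-row case raised in the comments above, one possible approach is to resolve the start and end positions for each row separately and collect the pieces in a list, since rows of different lengths cannot form a rectangular data frame. This is only a sketch under assumptions: the vectors `first` and `last` are hypothetical stand-ins for the per-row column names and do not come from the original post.

```r
# Example data from the question
df <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12, e = 13:15)

# Hypothetical per-row start and end column names (one pair per row)
first <- c("a", "b", "c")
last  <- c("b", "d", "e")

# Translate the names into column positions
lo <- match(first, names(df))
hi <- match(last,  names(df))

# One list element per row; elements may differ in length,
# so the result is a list rather than a data frame
ragged <- Map(
  function(i, from, to) unlist(df[i, from:to]),
  seq_len(nrow(df)), lo, hi
)

lengths(ragged)
```

Each element is a named vector holding the values of that row between its own start and end columns.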

score 1 · Answer 2 · answered Jul 18 '16 at 22:15

If you are open to using dplyr:

dplyr::select(df, b:d)

  b c  d
1 4 7 10
2 5 8 11
3 6 9 12

Sumedh
