How to select columns whose names share a specific prefix in a data frame

Question

I am sorry to ask a question that may be very simple. I have a data frame; its `dput()` output is below.

mydf <- structure(list(br.Id = c(1992.0001, 1992.0002, 1992.0003, 1992.0004, 
1992.0005, 1992.0006, 1992.0007, 1992.0008, 1992.0009, 1992.001
), si.month = c(4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), br.day = c(23L, 
23L, 23L, 23L, 23L, 23L, 23L, 23L, 23L, 23L), br.year = c(1992L, 
1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L
), branch = 1:10, br.location = c(160170L, 160170L, 160170L, 160170L, 
160170L, 160170L, 160170L, 160170L, 160170L, 160170L), si.length = c(90L, 
128L, 112L, 68L, 56L, 58L, 111L, 111L, 115L, 65L), si.weight = c(9.3, 
32.5, 19, 4.4, 2.1, 2.8, 16.1, 17.9, 22.7, 3.4), si.sex = structure(c(2L, 
1L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 1L), .Label = c("female", "male", 
"unknown"), class = "factor"), maturity = structure(c(7L, 7L, 
7L, 7L, 10L, 7L, 7L, 7L, 7L, 2L), .Label = c("developing", "immature", 
"mature", "nearly.ripe", "nearly.spent", "recovering", "ripe", 
"running", "spent", "unknown", "yoy"), class = "factor"), age = c(NA, 
NA, NA, NA, NA, NA, NA, NA, NA, 1L)), .Names = c("br.Id", "si.month", 
"br.day", "br.year", "branch", "br.location", "si.length", "si.weight", 
"si.sex", "maturity", "age"), row.names = c(NA, 10L), class = "data.frame")

What I am trying to do is select the columns whose names start with `br`, so the output should look like this:

      br.Id br.day br.year branch br.location
1  1992.000     23    1992      1      160170
2  1992.000     23    1992      2      160170
3  1992.000     23    1992      3      160170
4  1992.000     23    1992      4      160170
5  1992.001     23    1992      5      160170
6  1992.001     23    1992      6      160170
7  1992.001     23    1992      7      160170
8  1992.001     23    1992      8      160170
9  1992.001     23    1992      9      160170
10 1992.001     23    1992     10      160170

I thought `grep` might be usable to get these columns, but I could not figure out how to use it. Thanks for any help.

score 5 · Answer 1 · answered Jul 17 '16 at 17:30

One way is to use the dplyr package. After loading it with `library(dplyr)`, you can use its `select()` function to pick variables, together with the `contains()` helper to keep only the columns whose names contain a given string:

library(dplyr)
mydf %>% select(contains("br"))

#      br.Id br.day br.year branch br.location
#1  1992.000     23    1992      1      160170
#2  1992.000     23    1992      2      160170
#3  1992.000     23    1992      3      160170
#4  1992.000     23    1992      4      160170
#5  1992.001     23    1992      5      160170
#6  1992.001     23    1992      6      160170
#7  1992.001     23    1992      7      160170
#8  1992.001     23    1992      8      160170
#9  1992.001     23    1992      9      160170
#10 1992.001     23    1992     10      160170
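If only columns whose names *begin* with "br" are wanted, dplyr's `starts_with()` helper is a slightly stricter alternative to `contains()`, which would also match "br" in the middle of a name. A minimal sketch, assuming dplyr is installed; `df` and `nms` below are hypothetical stand-ins with the same column names as `mydf` and placeholder values:

```r
library(dplyr)

# Hypothetical stand-in for mydf: same column names, placeholder values
nms <- c("br.Id", "si.month", "br.day", "br.year", "branch", "br.location",
         "si.length", "si.weight", "si.sex", "maturity", "age")
df <- as.data.frame(matrix(0, nrow = 3, ncol = length(nms),
                           dimnames = list(NULL, nms)))

# starts_with() anchors the match at the start of each column name;
# note that it ignores case by default (ignore.case = TRUE)
br_cols <- select(df, starts_with("br"))

# The five br* columns, in their original order
names(br_cols)
```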

Thanks! I can't upvote your answer yet because I'm new, but thank you for showing me something I could not even have imagined! — user 127354, Jul 17 '16 at 17:56

score 4 · Accepted Answer · answered Jul 17 '16 at 17:30

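Use `grep()` to find the positions of the matching column names, then index with them:

mydf[, grep("br.", colnames(mydf), fixed = TRUE)]

Note that `fixed = TRUE` treats `br.` as a literal string, so `branch` is not selected; to get exactly the output shown in the question, drop `fixed = TRUE` or use the pattern `"^br"`.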
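To see how `fixed = TRUE` changes the result, it may help to compare which names each pattern picks up. A small sketch using a hypothetical vector `nms` that holds the same names as `colnames(mydf)`:

```r
# Hypothetical copy of colnames(mydf)
nms <- c("br.Id", "si.month", "br.day", "br.year", "branch", "br.location",
         "si.length", "si.weight", "si.sex", "maturity", "age")

# fixed = TRUE: "br." is matched literally, so "branch" (no dot after "br") is skipped
literal <- grep("br.", nms, fixed = TRUE, value = TRUE)

# Default regular expression: "." matches any character, so "branch" is kept
regex <- grep("br.", nms, value = TRUE)

# Anchored prefix match: only names that start with "br"
prefix <- grep("^br", nms, value = TRUE)

literal
regex
prefix
```

With `value = TRUE`, `grep()` returns the matching names themselves instead of their positions, which is a convenient way to check a pattern before using it to index `mydf`.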

answered Jul 17 '16 at 17:30

user2100721

Thanks, it helped me a lot to learn how to use `grep` on my data – user 127354 Jul 17 '16 at 17:55

score 4 · Answer 3 · answered Jul 17 '16 at 17:39

Or using grepl:

mydf[,grepl("br.",colnames(mydf))]
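`grep()` and `grepl()` differ only in what they return: `grep()` gives the positions of the matching names, while `grepl()` gives one `TRUE`/`FALSE` per name. Both work as column indices. A quick check on a hypothetical copy of the names, `nms`:

```r
# Hypothetical copy of colnames(mydf)
nms <- c("br.Id", "si.month", "br.day", "br.year", "branch", "br.location",
         "si.length", "si.weight", "si.sex", "maturity", "age")

pos  <- grep("br.", nms)    # integer positions of the matching names
hits <- grepl("br.", nms)   # logical vector, same length as nms

# The TRUE positions of grepl() are exactly what grep() returns
identical(pos, which(hits))   # TRUE
```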

Or using regexpr:

mydf[,regexpr("br.",colnames(mydf))>0]

Or using str_detect from stringr:

library(stringr)
mydf[,str_detect(colnames(mydf),"br.")]
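For a plain prefix with no regular-expression features, base R's `startsWith()` (available since R 3.3.0) is one more option, and it sidesteps the question of whether `.` is a wildcard. A sketch on a hypothetical data frame `df` with the same column names as `mydf`:

```r
# Hypothetical stand-in for mydf: same column names, placeholder values
nms <- c("br.Id", "si.month", "br.day", "br.year", "branch", "br.location",
         "si.length", "si.weight", "si.sex", "maturity", "age")
df <- as.data.frame(matrix(0, nrow = 3, ncol = length(nms),
                           dimnames = list(NULL, nms)))

keep  <- startsWith(names(df), "br")   # literal prefix test, no regex involved
br_df <- df[, keep]

# Same selection as the anchored regular expression "^br"
identical(keep, grepl("^br", names(df)))   # TRUE
names(br_df)
```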

edited Jul 17 '16 at 17:53

answered Jul 17 '16 at 17:39

989

I like the `regexpr` one; nice solution! – Jul 17 '16 at 17:43
@m0h3n Thanks! I can't upvote your answer yet because I'm new, but thank you for showing me other ways. – user 127354 Jul 17 '16 at 17:55

score 2 · Answer 4 · answered Jul 17 '16 at 18:58

mydf[, grep("^br.", names(mydf))]
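One caveat that applies to all of the `[ , ]` answers here: if the pattern happens to match a single column, `df[, idx]` drops to a plain vector. Adding `drop = FALSE`, or using list-style `df[idx]`, keeps a data frame. A sketch with a hypothetical `df` and a pattern that matches only `age`:

```r
# Hypothetical stand-in for mydf: same column names, placeholder values
nms <- c("br.Id", "si.month", "br.day", "br.year", "branch", "br.location",
         "si.length", "si.weight", "si.sex", "maturity", "age")
df <- as.data.frame(matrix(0, nrow = 3, ncol = length(nms),
                           dimnames = list(NULL, nms)))

idx <- grep("^age", names(df))        # matches exactly one column

as_vec <- df[, idx]                   # default drop = TRUE: a numeric vector
as_df  <- df[, idx, drop = FALSE]     # stays a one-column data frame
as_df2 <- df[idx]                     # list-style indexing never drops

class(as_vec)
class(as_df)
```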

answered Jul 17 '16 at 18:58

s_baldur

Note that names is equivalent to colnames: http://stackoverflow.com/questions/24799153/what-is-the-difference-between-names-and-colnames – s_baldur Jul 17 '16 at 18:58

`names()` is equivalent to `colnames()` for **data frames** – Rich Scriven Jul 17 '16 at 19:24
@RichardScriven Thanks. – s_baldur Jul 17 '16 at 20:17
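Rich Scriven's caveat matters if the data ever arrives as a matrix instead of a data frame: a matrix has no `names` attribute, so a `grep()` over `names()` silently finds nothing, while `colnames()` still works. A small sketch with a hypothetical two-column data frame:

```r
# Hypothetical two-column data frame and its matrix version
df <- data.frame(br.Id = c(1992.0001, 1992.0002), branch = 1:2)
m  <- as.matrix(df)

identical(names(df), colnames(df))   # TRUE for a data frame
names(m)                             # NULL: a matrix has no names attribute
colnames(m)                          # the column names live in dimnames

grep("^br", names(m))                # integer(0): nothing matched
m[, grep("^br", colnames(m))]        # works on the matrix
```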