
I have column names that are something like this:

colnames <- c("a81", "b82", "a181", "b182")

I want to select only the columns whose names end in 81 or 82, with a letter immediately before the number. I want to use a regex in grepl, and the expected output here is

TRUE TRUE FALSE FALSE

What I tried so far is

grepl("[:alpha:][81-82]$", colnames)

Unfortunately, the code returns FALSE for all entries. What am I doing wrong?

2 Answers


I'd go with

grepl("[a-z](81|82)$", colnames)

[a-z] matches any single lowercase letter, and the parentheses group the alternation: either 81 OR (|) 82. The $ anchors the match at the end of the name.
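To make the behaviour of this pattern concrete, here is a minimal sketch. The `extra` names are hypothetical and only added to illustrate two edge cases; they are not part of the question.

```r
colnames <- c("a81", "b82", "a181", "b182")

# A lowercase letter, then 81 or 82, then the end of the string.
res <- grepl("[a-z](81|82)$", colnames)
res
# [1]  TRUE  TRUE FALSE FALSE

# Hypothetical extra names (not from the question).
extra <- c("xa81", "A81")

# There is no "^" anchor, so anything may precede the letter ("xa81" matches).
# Writing [A-Za-z] makes the uppercase case explicit.
res_extra <- grepl("[A-Za-z](81|82)$", extra)
res_extra
# [1] TRUE TRUE
```

If the whole name must be exactly one letter plus the number, adding `^` at the start of the pattern would enforce that.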

J_Alaniz
  • Thank you. That works for the example in the question, but for my use case the answer from akrun is handier because I have a wide number range. –  Apr 27 '20 at 20:54

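The [:alpha:] needs to be changed to [[:alpha:]], because a POSIX class name is only recognised inside a bracket expression. And [81-82] does not mean "81 to 82": a bracket expression matches a single character, so instead use 8 followed by [1-2]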

grepl("^[[:alpha:]]8[1-2]$", colnames)
#[1]  TRUE  TRUE FALSE FALSE
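As a small check of the point about `[81-82]`, the sketch below tests that bracket expression on its own. The test strings are made up for illustration.

```r
# "[81-82]" is ONE bracket expression: the character 8, the range 1-8, and the
# character 2. Together that is simply "one digit from 1 to 8".
single_digit <- grepl("^[81-82]$", c("1", "5", "8", "9", "81"))
single_digit
# [1]  TRUE  TRUE  TRUE FALSE FALSE

# The corrected pattern spells the number out digit by digit instead.
colnames <- c("a81", "b82", "a181", "b182")
fixed <- grepl("^[[:alpha:]]8[1-2]$", colnames)
fixed
# [1]  TRUE  TRUE FALSE FALSE
```

Note that "81" fails the first test: the bracket expression can only ever consume one character.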

Or we can spell out the letter range explicitly:

grepl("^[A-Za-z]8[1-2]$", colnames)
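For a wider number range the same idea extends in two ways, sketched below. The names in `cols` and the bounds 81 and 95 are assumptions chosen purely for illustration; the thread does not say what the real range is.

```r
# Hypothetical column names and bounds (not from the question).
cols  <- c("a81", "b95", "c96", "a181", "d07")
lower <- 81
upper <- 95

# Option 1: describe the range one digit position at a time.
by_regex <- grepl("^[[:alpha:]](8[1-9]|9[0-5])$", cols)
by_regex
# [1]  TRUE  TRUE FALSE FALSE FALSE

# Option 2: use the regex only for the shape (letter + digits),
# then compare the numeric part as a number.
shape_ok <- grepl("^[[:alpha:]][0-9]+$", cols)
num <- rep(NA_integer_, length(cols))
num[shape_ok] <- as.integer(sub("^[[:alpha:]]", "", cols[shape_ok]))
by_number <- shape_ok & num >= lower & num <= upper
by_number
# [1]  TRUE  TRUE FALSE FALSE FALSE
```

Option 2 tends to be easier to maintain when the bounds change, since the digit-by-digit pattern has to be rewritten for every new range.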
akrun
  • That works! [This](https://rstudio-pubs-static.s3.amazonaws.com/74603_76cd14d5983f47408fdf0b323550b846.html) site says we can use `[:alpha:]`. Is this wrong, or are there situations where it works too? –  Apr 27 '20 at 20:47
  • @machine if you check `?regex`, it says the interpretation depends on the locale – akrun Apr 27 '20 at 20:50
  • If I use the single `[`, it returns FALSE for the second one – akrun Apr 27 '20 at 20:50
  • @machine you can check [here](https://stackoverflow.com/questions/42013292/posix-character-class-does-not-work-in-base-r-regex) – akrun Apr 27 '20 at 20:52
  • Thank you for that! Also, I was not aware that we cannot use `[81-82]`, which is the same as `[1-8]`. –  Apr 27 '20 at 21:00
  • @machine a range inside brackets runs over single characters (here the digits 0-9), so it matches individual digits rather than multi-digit numbers – akrun Apr 27 '20 at 21:02