regex in R: Number range after letter at end of string

Question

I have column names that are something like this:

colnames <- c("a81", "b82", "a181", "b182")

I want to select only the columns whose names end in 81 or 82, where the number is immediately preceded by a letter. I want to use a regex in grepl, and the expected output here is

TRUE TRUE FALSE FALSE

What I tried so far is

grepl("[:alpha:][81-82]$", colnames)

Unfortunately, the code returns FALSE for all entries. What am I doing wrong?

score 2 · Answer 1 · answered Apr 27 '20 at 20:37

I'd go with

grepl("[a-z](81|82)$", colnames)

[a-z] matches any lowercase letter, and the parentheses group the alternation: either 81 OR (|) 82. The $ anchors the match to the end of the string.
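
To illustrate why the parentheses matter, here is a small sketch comparing the grouped pattern with an ungrouped variant. The variable names `cols`, `grouped` and `ungrouped` are only for illustration and are not part of the original answer.

```r
cols <- c("a81", "b82", "a181", "b182")

# Grouped: a lowercase letter, then 81 or 82, then end of string.
grouped <- grepl("[a-z](81|82)$", cols)
grouped
# TRUE TRUE FALSE FALSE

# Ungrouped: "[a-z]81" anywhere in the string, OR "82" at the end.
ungrouped <- grepl("[a-z]81|82$", cols)
ungrouped
# TRUE TRUE FALSE TRUE   ("b182" matches because it ends in 82)
```

Without the group, the alternation splits the whole pattern in two, so the letter requirement no longer applies to the 82 branch.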

answered Apr 27 '20 at 20:37

J_Alaniz

Thank you. That works for the example in the question, but for my use case the answer from akrun is handier because I have a wide number range. – Apr 27 '20 at 20:54

score 1 · Accepted Answer · answered Apr 27 '20 at 20:32

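The [:alpha:] class only works inside a bracket expression, so it needs to be changed to [[:alpha:]]. And [81-82] is a character class that matches a single character, not the numbers 81 to 82; instead, it can be 8 followed by [1-2]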

grepl("^[[:alpha:]]8[1-2]$", colnames)
#[1]  TRUE  TRUE FALSE FALSE

Or we can specify

grepl("^[A-Za-z]8[1-2]$", colnames)
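
To see what the two pieces of the original pattern actually match, here is a minimal sketch using R's default regex engine (no `perl = TRUE`). The names `alpha_set` and `digit_set` are hypothetical and only used for this illustration.

```r
# With single brackets, [:alpha:] is an ordinary character set
# containing the characters ':', 'a', 'l', 'p' and 'h'.
alpha_set <- grepl("^[:alpha:]$", c("a", "b", ":", "h"))
alpha_set
# TRUE FALSE TRUE TRUE

# [81-82] is also a single-character set: '8', the range '1'-'8', and '2',
# which is the same set as [1-8].
digit_set <- grepl("^[81-82]$", as.character(0:9))
digit_set
# FALSE for "0" and "9", TRUE for "1" through "8"
```

This is why the attempt in the question returns FALSE everywhere: it looks for one of `:`, `a`, `l`, `p`, `h` followed by a single digit from 1 to 8 at the end of the string, and in every name the character before the final digit is another digit.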

answered Apr 27 '20 at 20:32

akrun

That works! [This](https://rstudio-pubs-static.s3.amazonaws.com/74603_76cd14d5983f47408fdf0b323550b846.html) site says we can use `[:alpha:]`. Is this wrong or are there situations when this works, too? In case you know that.. – Apr 27 '20 at 20:47
@machine if you check the `?regex`, it says the interpretation depends on the locale – akrun Apr 27 '20 at 20:50
If I use the single `[`, it returns FALSE for the second one – akrun Apr 27 '20 at 20:50
@machine you can check [here](https://stackoverflow.com/questions/42013292/posix-character-class-does-not-work-in-base-r-regex) – akrun Apr 27 '20 at 20:52
Thank you for that! Also, I was not aware that we cannot use `[81-82]`, which is the same as `[1-8]`. – Apr 27 '20 at 21:00
@machine a range inside brackets covers single digits from 0-9, so the pattern is looking for individual digits instead of multi-digit numbers – akrun Apr 27 '20 at 21:02
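
Since character classes work on single digits, a wide number range can be awkward to express as a pure regex. One possible alternative, sketched here under assumptions, is to check the shape of the name with a regex and then compare the trailing number numerically. The helper `letter_then_range`, its arguments `lo` and `hi`, and the extra names `"c95"` and `"d96"` are hypothetical and not from the thread; the sketch also assumes, like the accepted answer, that the whole name is a single letter followed by digits.

```r
# Hypothetical helper: TRUE when a name is exactly one letter followed by
# an integer between lo and hi (inclusive).
letter_then_range <- function(x, lo, hi) {
  # Shape check: one letter, then one or more digits, nothing else.
  shaped <- grepl("^[[:alpha:]][0-9]+$", x)
  # Drop the leading letter and convert the rest to an integer;
  # anything that is not a plain number becomes NA.
  num <- suppressWarnings(as.integer(sub("^[[:alpha:]]", "", x)))
  shaped & !is.na(num) & num >= lo & num <= hi
}

cols <- c("a81", "b82", "a181", "b182", "c95", "d96")
in_range <- letter_then_range(cols, 81, 95)
in_range
# TRUE TRUE FALSE FALSE TRUE FALSE
```

Note that this treats leading zeros numerically, so a name like "a081" would count as 81; whether that is desirable depends on the naming scheme.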