Pattern matching using a wildcard

Question

How do I identify a string using a wildcard?

I've found glob2rx, but I don't quite understand how to use it. I tried using the following code to pick the rows of the data frame that begin with the word blue:

# make data frame
a <- data.frame( x =  c('red','blue1','blue2', 'red2'))

# 1
result <- subset(a, x == glob2rx("blue*") )

# 2
test = ls(pattern = glob2rx("blue*"))
result2 <- subset(a, x == test )

# 3
result3 <- subset(a, x == pattern("blue*") )

However, none of these worked. I'm not sure whether I should be using a different function for this.

IRTFM · Accepted Answer · 2014-08-26T19:37:24.123

If you want to examine elements inside a data frame, you should not be using ls(), which only looks at the names of objects in the current workspace (or, if used inside a function, in the current environment). Row names or elements inside such objects are not visible to ls() (unless, of course, you add an environment argument to the ls() call). Try using grep(), which is the workhorse function for pattern matching on character vectors:

result <- a[ grep("blue", a$x) , ]  # Note need to use `a$` to get at the `x`

If you want to use subset(), consider the closely related function grepl(), which returns a logical vector that can be used as the subset argument:

subset(a, grepl("blue", a$x))
      x
2 blue1
3 blue2

Edit: Adding one "proper" use of glob2rx within subset():

result <- subset(a,  grepl(glob2rx("blue*") , x) )
result
      x
2 blue1
3 blue2

I don't think I actually understood glob2rx until I came back to this question. (I did understand the scoping issues that were at the root of the questioner's difficulties. Anybody reading this should now scroll down to Gavin's answer and upvote it.)

And if you want to exclude the matching elements, then you can put the "-" operator before "grep" — Thieme Hennis, Dec 23 '13 at 09:52
@BondedDust and what if I want only the elements which begin with "blue", and not the elements which have "blue" within? E.g. I would like to select only "blue1" and "blue2" from this data.frame `a <- data.frame( x = c('red','blue1','blue2', 'red2','lightblue','darkblue'))` I will be very grateful for your help. — Darwin PC, Sep 10 '15 at 23:42
The pattern for begins-with-"blue" would be `"^blue"`. See `?regex` — IRTFM, Sep 11 '15 at 00:19
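To make the difference concrete, here is a minimal sketch using the extended data frame from Darwin PC's comment. The names contains_blue, starts_blue, not_blue and no_match are illustrative and do not come from the original posts. It also shows !grepl() as a way to exclude matches; unlike -grep(), it still behaves sensibly when nothing matches, because negating an empty index selects zero rows rather than all of them.

```r
a <- data.frame(x = c('red', 'blue1', 'blue2', 'red2', 'lightblue', 'darkblue'))

# Unanchored: matches "blue" anywhere in the string
contains_blue <- subset(a, grepl("blue", x))

# Anchored with ^: matches only strings that begin with "blue"
starts_blue <- subset(a, grepl("^blue", x))

# Exclusion: negate the logical vector instead of negating grep() indices
not_blue <- subset(a, !grepl("blue", x))

# Pitfall: with no match, grep() returns integer(0), and -integer(0)
# is still an empty index, so every row is dropped instead of kept
no_match <- a[-grep("green", a$x), , drop = FALSE]
nrow(no_match)
```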

Gavin Simpson · Answer 2 · 2011-04-28T20:14:10.070

glob2rx() converts a pattern including a wildcard into the equivalent regular expression. You then need to pass this regular expression on to one of R's pattern matching tools.

If you want to match "blue*", where * has its usual wildcard meaning rather than its regular-expression meaning, use glob2rx() to convert the wildcard pattern into a regular expression:

> glob2rx("blue*")
[1] "^blue"

The returned object is a regular expression.
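A few more conversions may help build intuition for what glob2rx() produces. This is a small sketch using only base R (glob2rx() lives in the utils package, which is attached by default); the glob patterns other than "blue*" are illustrative and not taken from the question.

```r
# "*" matches any run of characters, "?" matches exactly one character
glob2rx("blue*")   # "^blue"       (the trailing ".*$" is trimmed by default)
glob2rx("*blue")   # "^.*blue$"
glob2rx("blue?")   # "^blue.$"
glob2rx("*.doc")   # "^.*\\.doc$"  (the literal dot is escaped)

# Keep the trailing ".*$" explicitly
glob2rx("blue*", trim.tail = FALSE)   # "^blue.*$"
```

Because "blue*" becomes the anchored pattern "^blue", it matches values that start with "blue" and not values that merely contain it.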

Given your data:

x <- c('red','blue1','blue2', 'red2')

we can pattern match using grep() or similar tools:

> grx <- glob2rx("blue*")
> grep(grx, x)
[1] 2 3
> grep(grx, x, value = TRUE)
[1] "blue1" "blue2"
> grepl(grx, x)
[1] FALSE  TRUE  TRUE FALSE

As for the selecting rows problem you posted

> a <- data.frame(x =  c('red','blue1','blue2', 'red2'))
> with(a, a[grepl(grx, x), ])
[1] blue1 blue2
Levels: blue1 blue2 red red2
> with(a, a[grep(grx, x), ])
[1] blue1 blue2
Levels: blue1 blue2 red red2

or via subset():

> with(a, subset(a, subset = grepl(grx, x)))
      x
2 blue1
3 blue2
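Note that the two bracket-indexing calls print a bare vector rather than a data frame: a has a single column, so `[` drops the result to a vector by default. (The "Levels:" line appears because older versions of R converted strings to factors by default; from R 4.0.0 onward the same call prints a character vector.) A minimal sketch of keeping the data frame structure with drop = FALSE, where the name rows_kept is illustrative:

```r
a <- data.frame(x = c('red', 'blue1', 'blue2', 'red2'))
grx <- glob2rx("blue*")

# drop = FALSE keeps the one-column result as a data frame
rows_kept <- a[grepl(grx, a$x), , drop = FALSE]
rows_kept
```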

Hope that explains what glob2rx() does and how to use it?

It does. And I'm happy to learn something new even at this late date. I don't think I ever understood what a high-level function glob2rx was. — IRTFM, Aug 26 '14 at 19:30

score 4 · Answer 3 · edited Apr 28 '11 at 19:42

You're on the right track - the keyword you should be googling is Regular Expressions. R does support them in a more direct way than this using grep() and a few other alternatives.

Here's a detailed discussion: http://www.regular-expressions.info/rlanguage.html

answered Apr 28 '11 at 19:02 by Brian MacKay · edited Apr 28 '11 at 19:42 by Gavin Simpson

score 2 · Answer 4 · answered Nov 06 '13 at 21:35

If you really do want to use wildcards to identify specific variables, then you can use a combination of ls() and grep() as follows:

l = ls()
vars.with.result <- l[grep("result", l)]
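glob2rx() fits naturally here, since ls() accepts a regular expression through its pattern argument. A sketch, assuming some objects whose names start with "result"; the environment e and the objects in it are made up for illustration so the example does not depend on your workspace:

```r
# Illustrative environment standing in for the workspace
e <- new.env()
assign("result", 1, envir = e)
assign("result2", 2, envir = e)
assign("other", 3, envir = e)

# Names of the objects that match the wildcard pattern
vars.with.result <- ls(envir = e, pattern = glob2rx("result*"))
vars.with.result

# Fetch the matching objects themselves as a named list
mget(vars.with.result, envir = e)
```

In an interactive session you would normally drop the envir argument and let ls() search the current workspace.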

answered Nov 06 '13 at 21:35 by Brian D

score 2 · Answer 5 · edited May 23 '17 at 12:32

You can also use the data.table package and its like() function (also available as the %like% operator); details are given in "How to select R data.table rows based on substring match (a la SQL like)".

answered Jun 09 '15 at 15:00 by usct01 · edited May 23 '17 at 12:32 by Community

score 0 · Answer 6 · answered Jul 03 '21 at 08:13

Another way to achieve this is with filter() from dplyr together with str_detect() from stringr (both packages need to be installed):

dplyr::filter(a, stringr::str_detect(x, "blue"))

This keeps every row whose x contains "blue", such as blue1 and blue2.

The command is equivalent to

dplyr::filter(a, stringr::str_detect(x, "blue") == TRUE)

If the values mix upper and lower case, you could do something like:

dplyr::filter(a, stringr::str_detect(stringr::str_to_lower(x), "blue"))

I hope it helps someone who is looking for similar solutions.
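For comparison, the same case-insensitive match can be done in base R without extra packages, since grepl() takes an ignore.case argument. A sketch with a made-up data frame b containing mixed-case values; the names b and mixed_case_blue are illustrative:

```r
# Illustrative data with mixed case (not from the question)
b <- data.frame(x = c('red', 'Blue1', 'BLUE2', 'red2'))

# ignore.case = TRUE makes the match case-insensitive
mixed_case_blue <- subset(b, grepl("blue", x, ignore.case = TRUE))
mixed_case_blue
```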
