
Possible Duplicate:
Extract info inside all parenthesis in R (regex)

I have a string, df:

Peoplesoft(id-1290)

I would like to capture the characters between the parentheses. In the example above, I would like to get id-1290.

I used this:

x <- regexpr("\\((.*)\\)", df) 

This gives me numbers like:

[1] 10

Is there an easy way to grab the text between parentheses using regex in R?

user1471980

2 Answers

I prefer to use gsub() for this:

gsub(".*\\((.*)\\).*", "\\1", df)
[1] "id-1290"

The regex works like this:

  • Capture the text inside the parentheses. The capture group is my extra, unescaped set of parentheses, i.e. (.*); the escaped \\( and \\) match the real parentheses in the string.
  • Return the captured text as a back-reference, \\1.

In other words, the pattern matches the whole string and replaces it with the back-reference.
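One caveat worth sketching: when the pattern does not match, gsub() returns the input unchanged, so a string with no parentheses comes back whole. The wrapper below returns NA in that case instead. The name extract_parens and the sample strings are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: return the text inside parentheses, or NA when
# the string has no parenthesised part.
extract_parens <- function(x) {
  has_match <- grepl("\\(.*\\)", x)          # does a "(...)" occur at all?
  out <- gsub(".*\\((.*)\\).*", "\\1", x)    # same substitution as above
  out[!has_match] <- NA_character_           # unmatched inputs would be returned whole
  out
}

res <- extract_parens(c("Peoplesoft(id-1290)", "no parens here"))
print(res)
```

Because both .* parts are greedy, a string with several parenthesised parts will not be split cleanly by this pattern.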


If you want to use regexpr rather than gsub, then do this:

x <- regexpr("\\((.*)\\)", df)
x

[1] 11
attr(,"match.length")
[1] 9
attr(,"useBytes")
[1] TRUE

This returns a value of 11, i.e. the starting position of the match. Note the match.length attribute, which indicates how many characters were matched (here 9, counting both parentheses).

You can extract this with attr:

attr(x, "match.length")
[1] 9

And then use substring to extract the characters:

substring(df, x+1, x+attr(x, "match.length")-2)
[1] "id-1290"
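As a possible alternative to the position arithmetic, base R's regmatches() can pull out the matched text directly from the regexpr() result, after which the outer parentheses are trimmed. This is only a sketch; the variable names m, matched and inner are illustrative.

```r
df <- "Peoplesoft(id-1290)"

# regexpr() locates the first "(...)"; regmatches() returns the matched text
m <- regexpr("\\(.*\\)", df)
matched <- regmatches(df, m)                        # includes the parentheses

# Drop the first and last character, i.e. the parentheses themselves
inner <- substring(matched, 2, nchar(matched) - 1)
print(inner)
```

Note that regmatches() drops elements that have no match, so on a vector the result can be shorter than the input.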
Andrie

Here is a slightly different way, using lookbehind/ahead:

df <- "Peoplesoft(id-1290)"
regmatches(df,gregexpr("(?<=\\().*?(?=\\))", df, perl=TRUE))

The difference from Andrie's answer is that this also extracts multiple strings in parentheses, e.g.:

df <- "Peoplesoft(id-1290) blabla (foo)"
regmatches(df,gregexpr("(?<=\\().*?(?=\\))", df, perl=TRUE))

Gives:

[[1]]
[1] "id-1290" "foo" 
Sacha Epskamp