I would like to use the grepl() function to find which elements of a character vector match a pattern, and then concatenate elements of the vector based on those matches. For example:

vec <- c("a","b","a","c","a","c","a","b")
grepl("[a]", vec)
# [1]  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE

I would like each element that gives TRUE to be pasted together with all of the values that follow it, up to (but not including) the next TRUE, so that the result is a vector that looks like:

"ab", "ac", "ac", "ab"

Thanks for any thoughts.

coding_heart

2 Answers

If you are not wedded to grepl():

VEC <- paste(vec, collapse="")                # Collapse into single string ...
strsplit(VEC, "(?<=.)(?=a)", perl=TRUE)[[1]]  # ... then split it before each 'a'
# [1] "ab" "ac" "ac" "ab"
Josh O'Brien
  • (+1) Josh. Brilliant. @SimonO101, for the regexp problem you were facing recently with an answer that was based on Josh's strsplit question, maybe this answer will help? – Arun Mar 26 '13 at 16:46
  • Whoa! You just solved my main gripe about `strsplit`; that it always eats the separator, or at least that's what I thought until I saw this answer. Can you add a note about regex terminology for what reference terms to look up in the Table of Contents or Index? – IRTFM Mar 26 '13 at 17:12
  • @DWin -- That was a long-time gripe of mine as well. Half of the answer is to use "look-around assertions" (`(?<=...)`, `(?<!...)`, `(?=...)` and `(?!...)`), which match at the gaps *between* characters. For more on that see the "Perl-like Regular Expressions" section of `?regex`, and [the diagram here](http://stackoverflow.com/questions/406230/regular-expression-to-match-string-not-containing-a-word). Theodore Lytras and I recently figured the other half out [here](http://stackoverflow.com/questions/15575221/why-does-strsplit-use-positive-lookahead-and-lookbehind-assertion-matches-differ). – Josh O'Brien Mar 26 '13 at 17:43
  • Thanks. Useful discussion. I also thought that the Perl behavior was not as described in `help(regex)`. It seemed in this case that "(?<=a)" should have been enough to split just before the "a"'s – IRTFM Mar 26 '13 at 17:53
  • @DWin -- Exactly. It wasn't until I tried the same regex with `gregexpr()` that I realized, "Oh, duh -- this is an issue specific to `strsplit()`." Theodore Lytras figured out the rest for me ;-) – Josh O'Brien Mar 26 '13 at 17:56
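
The point made in the comments, that a lookahead matches at the gap before a character and that `gregexpr()` reports those gaps as expected, can be illustrated with a minimal sketch. This is not code from the answer; the variable names (`s`, `m`, `starts`, `ends`, `pieces`) are illustrative, and the sketch assumes the string begins with "a" (otherwise any leading characters would be dropped).

```r
s <- "abacacab"

# Zero-width lookahead: matches at the gap just before each "a".
m <- gregexpr("(?=a)", s, perl = TRUE)[[1]]
starts <- as.integer(m)            # positions where each piece begins
lens <- attr(m, "match.length")    # all zero: the matches consume no characters

# Cut the string at those positions by hand instead of using strsplit().
ends <- c(starts[-1] - 1L, nchar(s))
pieces <- substring(s, starts, ends)
pieces
```

Because the match positions come from `gregexpr()` rather than `strsplit()`, the plain `(?=a)` lookahead is enough here and the extra `(?<=.)` lookbehind is not needed.
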
Use this:

groups <- cumsum(grepl("[a]", vec))
# > groups
# [1] 1 1 2 2 3 3 4 4
aggregate(vec, by=list(groups=groups), FUN=function(x) paste(x, collapse=""))

#   groups  x
# 1      1 ab
# 2      2 ac
# 3      3 ac
# 4      4 ab
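
Since the question asks for a plain character vector rather than a data frame, one possible variation on the same `cumsum(grepl(...))` grouping is sketched below. It is not part of the original answer: it swaps `aggregate()` for `split()` plus `vapply()`, and the name `out` is illustrative.

```r
vec <- c("a", "b", "a", "c", "a", "c", "a", "b")

# Each TRUE from grepl() starts a new group; cumsum() turns that into group ids.
groups <- cumsum(grepl("a", vec, fixed = TRUE))

# split() gives one chunk per group; vapply() pastes each chunk into one string.
out <- unname(vapply(split(vec, groups), paste, character(1), collapse = ""))
out
# [1] "ab" "ac" "ac" "ab"
```

If `vec` does not start with a matching element, the leading elements end up in a group numbered 0 and are pasted together as their own string.
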
Ferdinand.kraft