Regex Matching Negative values

Question

I'm trying to create simple, easy-to-write content clusters using multiple regexes.

Imagine a vector of strings: c("a","b","ac"). The groups I need to define are "all a's" and "all b's", so the values "a" and "ac" become "A" and "b" becomes "B".

myDF$contentGroup <- sub(".*a.*", "A", myDF$stringList)

However, this produces a "contentGroup" column in my dataframe that still contains the original value of "stringList" wherever no match occurred. So if I run the same line of code for "B", it overwrites the "A"s.

myDF$contentGroup <- sub(".*b.*", "B", myDF$stringList)

I just can't figure out how to do this simple clustering in a single line of code, keeping it as simple as possible.

`grepl()` should do the job. `x <- c("a","b","ac"); x[grepl("a|A",x)]` Is this what you want? — joel.wilson, Dec 15 '16 at 10:53
`x[grep('a', x, fixed = TRUE)] <- 'A'; x[grep('b', x, fixed = TRUE)] <- 'B'` — Sotos, Dec 15 '16 at 10:53
^ Don't forget to add `fixed = TRUE` in those statements for a ~10x boost in performance — David Arenburg, Dec 15 '16 at 10:55
@joel.wilson [here you go](http://stackoverflow.com/questions/19458724/how-do-i-speed-up-text-searches-in-r) - Short answer, it disables regex parsing — Sotos, Dec 15 '16 at 10:59
@Sotos thanks, I have been through that answer, but I never understood WHY `fixed = TRUE` makes it faster. Could you add a brief explanation, please? Thanks — joel.wilson, Dec 15 '16 at 11:02
@joel.wilson Because parsing a regex is much more complicated than finding exact matches. The rule of thumb is: if the `pattern` argument of `sub` or `grep` is a plain string rather than a regular expression, always use `fixed = TRUE` — David Arenburg, Dec 15 '16 at 11:15
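
To make the "disables regex parsing" point concrete, here is a minimal sketch (the vector `x` is made up for illustration, not taken from the thread). With `fixed = TRUE` the pattern is compared as a literal string, so metacharacters such as `.` lose their special meaning:

```r
# Hypothetical example vector, not from the question
x <- c("a.c", "abc")

# Regex mode: "." is a wildcard that matches any character, so both strings match
regex_match <- grepl(".", x)
regex_match
#[1] TRUE TRUE

# fixed = TRUE: "." is treated as a literal dot, so only "a.c" matches
fixed_match <- grepl(".", x, fixed = TRUE)
fixed_match
#[1]  TRUE FALSE
```

This also means `fixed = TRUE` is only appropriate when the pattern really is a plain string; a pattern like `"a|A"` would then be searched for literally, including the `|`.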

score 1 · Answer 1 · answered Dec 15 '16 at 11:37

You can use `grep` to find the elements containing 'a' and 'b', and replace them as follows:

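x <- c("a", "b", "ac")

# Replace every element containing "a" with "A", then every element containing "b" with "B"
x[grep('a', x, fixed = TRUE)] <- 'A'
x[grep('b', x, fixed = TRUE)] <- 'B'

x
#[1] "A" "B" "A"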
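
If the values live in a data frame column and a single assignment is preferred, one possible variant is nested `ifelse()` calls around `grepl()`. This is only a sketch: the data frame below is a hypothetical stand-in that reuses the question's `myDF`, `stringList`, and `contentGroup` names, and mapping unmatched strings to `NA` is an assumption about the desired behaviour.

```r
# Hypothetical data frame reusing the names from the question
myDF <- data.frame(stringList = c("a", "b", "ac"), stringsAsFactors = FALSE)

# Test for "a" first, then "b"; strings matching neither become NA (an assumption)
myDF$contentGroup <- ifelse(grepl("a", myDF$stringList, fixed = TRUE), "A",
                     ifelse(grepl("b", myDF$stringList, fixed = TRUE), "B",
                            NA_character_))

myDF$contentGroup
#[1] "A" "B" "A"
```

Because the original column is never overwritten, the second test cannot clobber the result of the first. Note that the order matters: a string containing both "a" and "b" would be grouped as "A" here.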

Sotos

