I'm trying to create simple, easy-to-write content clusters using multiple regexes.

Imagine a vector of strings: c("a","b","ac"). The groups I need to define are "All: a's" and "All: b's", so the values "a" and "ac" should become "A" and "b" should become "B".

myDF$contentGroup <- sub(".*a.*", "A", myDF$stringList)

However, this leaves the original value of "stringList" in the "contentGroup" column wherever no match occurred. So if I run the same line of code for "B", it overwrites the "A"s.

myDF$contentGroup <- sub(".*b.*", "B", myDF$stringList)

I just can't figure out how to do this simple clustering in a single line of code, keeping it as simple as possible.

David Arenburg
michaelsinner

1 Answer

You can use grep to find the elements containing 'a' and 'b', and replace them as follows:

# x is assumed to be the example vector from the question
x <- c("a", "b", "ac")
x[grep('a', x, fixed = TRUE)] <- 'A'
x[grep('b', x, fixed = TRUE)] <- 'B'

x
#[1] "A" "B" "A"
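
Since the question asks for a single statement on a data frame column, here is a minimal sketch of one way to do that with grepl and nested ifelse. This is a swapped-in technique, not part of the grep approach above. The data frame is a hypothetical reconstruction using the question's column names, and the NA fallback for unmatched strings is an assumption.

```r
# Hypothetical data frame mirroring the column names used in the question
myDF <- data.frame(stringList = c("a", "b", "ac"), stringsAsFactors = FALSE)

# grepl() returns one TRUE/FALSE per element; the nested ifelse() assigns
# the label of the first pattern that matches, and NA if none does (assumed fallback)
myDF$contentGroup <- ifelse(grepl("a", myDF$stringList, fixed = TRUE), "A",
                     ifelse(grepl("b", myDF$stringList, fixed = TRUE), "B",
                            NA_character_))

myDF$contentGroup
#[1] "A" "B" "A"
```

Because the patterns are checked in order, a string containing both 'a' and 'b' would be labelled "A" here; reorder the conditions if a different priority is wanted.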
Sotos