The original Title for this Question was :
R Regex for word boundary excluding space.
It reflected the manner I was approaching the problem in. However, this is a better solution to my particular problem. It should work as long as a particular delimiter is used to separate items within a 'cell'
This must be very simple, but I've hit a brick wall on it. I have a dataframe column where each cell(row) is a comma separated list of items. I want to find the rows that have a specific item.
df<-data.frame( nms= c("XXXCAP,XXX CAPITAL LIMITED" , "XXX,XXX POLYMERS LIMITED, 3455" , "YYY,XXX REP LIMITED,999,XXX" ),
b = c('A', 'X', "T"))
nms b 1 XXXCAP,XXX CAPITAL LIMITED A 2 XXX,XXX POLYMERS LIMITED, 3455 X 3 YYY,XXX REP LIMITED,999,XXX T
I want to search for rows that have item XXX
. Rows 2 and 3 should match. Row 1 has the string XXX as part of a larger string and obviously should not match.
However, because XXX in row 1 is separated by spaces in each side, I am having trouble filtering it out with \\b
or [[:<:]]
grep("\\bXXX\\b",df$nms, value = F) #matches 1,2,3
The easiest way to do this of course is strsplit()
but I'd like to avoid it.Any suggestions on performance are welcome.