I want to remove all characters that don't match a string pattern using the stringr package.
So far I've been able to remove those before the pattern using "\\w+(?= (grape|satsuma))"
as the pattern, but removing those after the pattern has proved impossible.
> str_remove_all("apples grape banana melon olive persimon grape apples satsuma papaya",
+ "\\w+(?= (grape|satsuma))")
[1] " grape banana melon olive grape satsuma papaya"
The desired result is:
"grape grape satsuma"
(NOTE: I am aware the easiest approach in this case is to extract only "grape" and "satsuma" but for analysis purposes I prefer this way)
Edit: providing the entire problem
The entire problem is as follows: given a data frame d that contains a column of strings, the function should return the same column with only the matches:
> d
# A tibble: 2 x 2
string_column c2
<chr> <dbl>
1 apples grape banana satsuma 3
2 grape banana satsuma melon 4
Using the answer provided by @d.r works:
> d %>%
+ mutate_at(vars(string_column), ~ gsub("(grape|satsuma| )(*SKIP)(*FAIL)|.", "", ., perl = TRUE))
# A tibble: 2 x 2
string_column c2
<chr> <dbl>
1 " grape satsuma" 3
2 "grape satsuma " 4
All answers provided so far that use the stringr package fail to return the string_column.
This is the dput for d:
d <- structure(list(string_column = c("apples grape banana satsuma",
"grape banana satsuma melon"), c2 = c(3, 4)), row.names = c(NA,
-2L), class = c("tbl_df", "tbl", "data.frame"))
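For reference, here is a minimal sketch of a removal-based approach that stays within stringr. It is one possible way to do it, not a definitive answer. The helper name `keep_only` and its `keep` argument are hypothetical and introduced only for illustration. The idea is to match every whole word that is not one of the wanted words (via a negative lookahead), remove it, and then collapse the leftover whitespace with `str_squish()`.

```r
library(stringr)

# Hypothetical helper (not part of stringr): remove every word that is not
# listed in `keep`, then tidy up the whitespace left behind.
keep_only <- function(x, keep = c("grape", "satsuma")) {
  # \\b(?!(?:grape|satsuma)\\b)\\w+ matches a whole word only when that word
  # is not exactly one of the words in `keep`.
  pattern <- paste0("\\b(?!(?:", paste(keep, collapse = "|"), ")\\b)\\w+")
  # Remove the unwanted words, then trim and collapse repeated spaces.
  str_squish(str_remove_all(x, pattern))
}

# Same strings as in the dput above.
x <- c("apples grape banana satsuma", "grape banana satsuma melon")
result <- keep_only(x)
print(result)
```

Assuming dplyr is loaded, the same helper could be applied to the data frame with something like `d %>% mutate(string_column = keep_only(string_column))`.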