In the text column, how can I remove all punctuation marks but keep only the ? character?
data.frame(id = c(1), text = c("keep<>-??it--!@#"))
Expected output:
data.frame(id = c(1), text = c("keep??it"))
A more general solution would be to use nested gsub commands that convert ? to a particular unusual string (like "foobar"), get rid of all punctuation, then convert "foobar" back to ?:
gsub("foobar", "?", gsub("[[:punct:]]", "", gsub("\\?", "foobar", df$text)))
#> [1] "keep??it"
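The same three steps can be wrapped in a small function so the kept character and the placeholder are explicit. This is only a sketch: the name strip_punct_except and its arguments are hypothetical, and it assumes the placeholder contains no punctuation and never occurs in the input (the function stops if it does).

```r
# Hypothetical helper: strip punctuation but keep one character (`keep`).
# `placeholder` is assumed to be punctuation-free and absent from the input.
strip_punct_except <- function(x, keep = "?", placeholder = "foobar") {
  # Refuse to run if the placeholder already appears in the data.
  stopifnot(!any(grepl(placeholder, x, fixed = TRUE)))
  x <- gsub(keep, placeholder, x, fixed = TRUE)  # protect the kept character
  x <- gsub("[[:punct:]]", "", x)                # drop all remaining punctuation
  gsub(placeholder, keep, x, fixed = TRUE)       # restore the kept character
}

df <- data.frame(id = c(1), text = c("keep<>-??it--!@#"))
strip_punct_except(df$text)
#> [1] "keep??it"
```

Using fixed = TRUE for the two swap steps avoids having to escape the ? as a regex metacharacter.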
Using gsub you could do:
gsub("(\\?+)|[[:punct:]]","\\1",df$text)
[1] "keep??it"
gsub('[[:punct:] ]+',' ',data) removes all punctuation, including the ?, which is not what you want. But this is:
library(stringr)
sapply(df, function(x) str_replace_all(x, "<|>|-|!|@|#",""))
id text
[1,] "1" "a"
[2,] "2" "keep??it"
Better IMO than the other answers because there is no need for nesting, and it lets you define exactly which characters to remove.
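Note that calling sapply over the whole data frame also converts id to character. A minimal sketch of the same "list the characters to drop" idea, applied to the text column only and written with base gsub and a bracket class instead of alternation; the variable name drop_chars is just for illustration. Passing the same pattern to str_replace_all(df$text, drop_chars, "") should behave the same way.

```r
df <- data.frame(id = c(1), text = c("keep<>-??it--!@#"))

# Characters to drop, written as a bracket class.
# The hyphen goes last so it is treated literally, not as a range.
drop_chars <- "[<>!@#-]"

# Only the text column is modified; id stays numeric.
df$text <- gsub(drop_chars, "", df$text)
df$text
#> [1] "keep??it"
```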
Here's another solution using negative lookahead:
gsub("(?!\\?)[[:punct:]]", "", df$text, perl = TRUE)
[1] "keep??it"
The negative lookahead asserts that the next character is not a ? and then matches any punctuation.
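The lookahead extends naturally to keeping more than one character by putting a bracket class inside it. A small sketch, assuming you want to keep both ? and ! (that choice is only for illustration):

```r
x <- "keep<>-??it--!@#"

# (?![?!]) fails the match whenever the next character is ? or !,
# so only the other punctuation characters are removed.
result <- gsub("(?![?!])[[:punct:]]", "", x, perl = TRUE)
result
#> [1] "keep??it!"
```

perl = TRUE is required here because lookaheads are not supported by R's default regex engine.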
Data:
df <- data.frame(id = c(1), text = c("keep<>-??it--!@#"))