In the column text, how can I remove all punctuation marks but keep the ? characters?

data.frame(id = c(1), text = c("keep<>-??it--!@#"))

Expected output:

data.frame(id = c(1), text = c("keep??it"))
markus
rek

4 Answers

A more general solution would be to use nested gsub calls that convert ? to a placeholder string that does not occur in the data and contains no punctuation (like "foobar"), remove all punctuation, then convert "foobar" back to ?:

gsub("foobar", "?", gsub("[[:punct:]]", "", gsub("\\?", "foobar", df$text)))
#> [1] "keep??it"
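
For readability, the three nested steps can be unrolled into a small helper. This is only a sketch: strip_punct_keep, keep and placeholder are hypothetical names, and it assumes the placeholder contains no punctuation and does not already occur in the text (the helper stops if it does).

```r
# Hypothetical helper: strip punctuation but keep one protected character.
strip_punct_keep <- function(x, keep = "?", placeholder = "foobar") {
  # If the placeholder already occurred in the input, the last step would
  # wrongly turn it into `keep`, so refuse to run in that case.
  stopifnot(!any(grepl(placeholder, x, fixed = TRUE)))
  x <- gsub(keep, placeholder, x, fixed = TRUE)  # 1. hide the protected character
  x <- gsub("[[:punct:]]", "", x)                # 2. drop all remaining punctuation
  gsub(placeholder, keep, x, fixed = TRUE)       # 3. restore the protected character
}

df <- data.frame(id = c(1), text = c("keep<>-??it--!@#"))
df$text <- strip_punct_keep(df$text)
df$text
#> [1] "keep??it"
```

Using fixed = TRUE in steps 1 and 3 means the protected character is matched literally, so it does not need to be escaped.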
Allan Cameron

Using gsub you could do:

gsub("(\\?+)|[[:punct:]]","\\1",df$text)

[1] "keep??it"
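
A short sketch of why this pattern works, run on a few made-up strings (the vector x is illustrative, not from the question). At each position the regex first tries (\\?+); if that matches, the run of ? is replaced by itself through \\1. Otherwise [[:punct:]] matches, group 1 is empty, and the character is replaced by nothing.

```r
# Illustrative inputs; only the first one comes from the question.
x <- c("keep<>-??it--!@#", "what?!", "a.b,c")

# "?" runs are captured and written back; other punctuation is dropped.
res <- gsub("(\\?+)|[[:punct:]]", "\\1", x)
res
#> [1] "keep??it" "what?"    "abc"
```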
Onyambu

gsub('[[:punct:] ]+',' ',data) replaces every run of punctuation and spaces with a single space, including the ?, which is not what you want.

But this is:

library(stringr)
sapply(df, function(x) str_replace_all(x, "<|>|-|!|@|#",""))
     id  text      
[1,] "1" "a"       
[2,] "2" "keep??it"

Better IMO than the other answers because no nesting is needed, and it lets you list exactly which characters to remove.
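
The same idea can be sketched in base R with a bracket expression instead of alternation, applied only to the text column so that id stays numeric (sapply over the whole data frame returns a character matrix). The set of characters below is an assumption taken from the question's example; extend it as needed.

```r
df <- data.frame(id = c(1), text = c("keep<>-??it--!@#"))

# Characters to delete, written as a bracket expression.
# "-" is placed last so it is read literally rather than as a range.
df$text <- gsub("[<>!@#-]", "", df$text)
df$text
#> [1] "keep??it"
```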

gaut

Here's another solution using negative lookahead:

gsub("(?!\\?)[[:punct:]]", "", df$text, perl = TRUE)
[1] "keep??it"

The negative lookahead (?!\\?) asserts that the next character is not a ?, and [[:punct:]] then matches that character if it is punctuation. Lookaheads require perl = TRUE.
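
To see which characters the pattern actually selects, the matches can be listed with gregexpr and regmatches before removing them. This is a small sketch on the question's string; x and hits are illustrative names.

```r
x <- "keep<>-??it--!@#"

# List every character matched by the lookahead pattern; the two "?" are skipped.
hits <- regmatches(x, gregexpr("(?!\\?)[[:punct:]]", x, perl = TRUE))[[1]]
hits
#> [1] "<" ">" "-" "-" "-" "!" "@" "#"
```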

Data:

df <- data.frame(id = c(1), text = c("keep<>-??it--!@#"))
Chris Ruehlemann