7

I'm trying to remove all punctuation from a string except apostrophes. Here's my example:

str2 <- "this doesn't not have an apostrophe,.!@#$%^&*()"
gsub("[[:punct:,^\\']]"," ", str2 )
# [1] "this doesn't not have an apostrophe,.!@#$%^&*()"

What am I doing wrong?

MichaelChirico
screechOwl

3 Answers

17

A "negative lookahead assertion" can be used to remove from consideration any apostrophes, before they are even tested for being punctuation characters.

gsub("(?!')[[:punct:]]", "", str2, perl=TRUE)
# [1] "this doesn't not have an apostrophe"
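To see what the lookahead contributes, it can help to run the same pattern with and without it. This is only a minimal sketch reusing `str2` from the question; the names `without_lookahead` and `with_lookahead` are illustrative, and both calls use `perl = TRUE` so that the only difference is the `(?!')` prefix.

```r
str2 <- "this doesn't not have an apostrophe,.!@#$%^&*()"

# Without the lookahead, every punctuation character is removed,
# including the apostrophe in "doesn't".
without_lookahead <- gsub("[[:punct:]]", "", str2, perl = TRUE)

# With (?!'), a match cannot start at a position where the next character
# is an apostrophe, so [[:punct:]] never gets the chance to consume it.
# Lookaheads require the PCRE engine, hence perl = TRUE.
with_lookahead <- gsub("(?!')[[:punct:]]", "", str2, perl = TRUE)

print(without_lookahead)  # "this doesnt not have an apostrophe"
print(with_lookahead)     # "this doesn't not have an apostrophe"
```

Passing `" "` as the replacement, as in the question, would leave a run of trailing spaces where the punctuation used to be; `""` drops the characters outright.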
Josh O'Brien
  • This doesn't remove, of course, if the two apostrophes occur next to each other. But I guess you already knew that. – Arun Mar 06 '13 at 19:12
  • FWIW, [here's a link](http://stackoverflow.com/questions/406230/regular-expression-to-match-string-not-containing-a-word) to the best short explanation of negative lookaround assertions that I've ever seen. – Josh O'Brien Mar 06 '13 at 19:27
  • You can also exclude both an apostrophe and, e.g., a colon with "(?!')(?!:)[[:punct:]]" – Scott Kaiser Apr 12 '19 at 17:02
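As a rough illustration of the chained-lookahead comment, the sketch below compares it with a single lookahead that holds a character class of everything to keep. The test string `str3` and the variable names are made up for this example.

```r
# Hypothetical test string containing apostrophes, colons, and other punctuation.
str3 <- "note: it's 10:30, isn't it?!"

# Two chained lookaheads: skip positions followed by ' and positions followed by :
chained <- gsub("(?!')(?!:)[[:punct:]]", "", str3, perl = TRUE)

# One lookahead with a character class listing every character to keep.
grouped <- gsub("(?![':])[[:punct:]]", "", str3, perl = TRUE)

print(chained)                      # "note: it's 10:30 isn't it"
print(identical(chained, grouped))  # TRUE
```

The grouped form tends to stay readable as the list of protected characters grows, since each new character is one more entry in the class rather than another lookahead.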
1

I'm not sure you can specify "all punctuation except '" inside a bracket expression the way you've done. I would instead negate the set of characters to keep (lowercase letters, ', and space):

gsub("[^'[:lower:] ]", "", str2) # per Joshua's comment
# [1] "this doesn't not have an apostrophe"
Arun
  • It depends on what he wants. If he wants to ensure he only gets letters, then what you wrote is most appropriate. If he really just wants to remove punctuation, then it is safer to explicitly remove punctuation. – John Sobolewski Mar 06 '13 at 19:02
  • You should use `[[:lower:]]` instead of `[a-z]` as the latter is locale-specific. – Joshua Ulrich Mar 06 '13 at 19:08
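The difference between the two approaches discussed in these comments shows up as soon as the input contains uppercase letters or digits. A small sketch, assuming a made-up test string `str4`:

```r
# Hypothetical test string with an uppercase letter, a digit, and punctuation.
str4 <- "Version2 isn't done!"

# Keep-list: anything that is not a lowercase letter, an apostrophe, or a
# space is dropped, so "V" and "2" disappear along with the "!".
keep_only <- gsub("[^'[:lower:] ]", "", str4)

# Remove-list: only punctuation other than the apostrophe is dropped.
remove_punct <- gsub("(?!')[[:punct:]]", "", str4, perl = TRUE)

print(keep_only)     # "ersion isn't done"
print(remove_punct)  # "Version2 isn't done"
```

Adding `[:upper:]` and `[:digit:]` inside the negated bracket expression would narrow the gap between the two, if a keep-list is still preferred.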
1

You could use:

str2 <- "this doesn't not have an apostrophe,.!@#$%^&*()"

library(qdap)
strip(str2, apostrophe.remove = FALSE, lower.case = FALSE)
Tyler Rinker
  • Tyler, openNLP, a "Depends" package, doesn't install and exits with the warning message "not available for R 2.15.2", so I couldn't install `qdap`. Any ideas? – Arun Mar 07 '13 at 08:02
  • Yes, check this page out: http://trinker.github.com/qdap_install/installation. I assume you're using a Mac. – Tyler Rinker Mar 07 '13 at 13:03
  • Can you let me know if it works for you, or suggest improvements I could make to the install instructions? – Tyler Rinker Mar 07 '13 at 13:10
  • Tyler, from the link, it worked like a charm. No issues. – Arun Mar 07 '13 at 13:14