R: Does [:punct:] include +'s?

Question

According to the R regex documentation, [:punct:] includes the following characters:

Punctuation characters:

! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~.

But when I try to use this in stringr::str_replace_all(), it doesn't seem to detect +s.

library(stringr)
str_vec = c("c++", "c--", "c+_")
str_replace_all(str_vec, pattern = "[[:punct:]]", replacement = "_")
[1] "c++" "c__" "c+_"
str_replace_all(str_vec, pattern = "[[:punct:]]{2,}", replacement = "_")
[1] "c++" "c_"  "c+_"

Does this have something to do with the locale settings?

Sys.getlocale()
[1] "LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=en_US.UTF-8;LC_ADDRESS=en_US.UTF-8;LC_TELEPHONE=en_US.UTF-8;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=en_US.UTF-8"

Or is it something else that I'm missing here?

This is not base R's regex, but has something to do with the regex engine that `stringr` uses: compare `gsub("[[:punct:]]", "_", str_vec)`. — lmo, May 05 '16 at 15:01
Perhaps this regex pattern is not part of `stringr`'s vocabulary. See `help("stringi-search-regex")` for the list of patterns. — lmo, May 05 '16 at 15:08
Thanks @lmo. Looking at `help("stringi-search-charclass")`, I could see they are already warning about POSIX `[:punct:]` character class! `".. .. So a POSIX flavor of [:punct:] is more like [\p{P}\p{S}] in ICU. .. .. "` — steadyfish, May 05 '16 at 15:38
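
To make the comments above concrete, here is a minimal sketch comparing the two engines on the vector from the question. It assumes that `stringr` is backed by ICU (via `stringi`), where `+` falls under the symbol category `\p{S}` rather than the punctuation category `\p{P}`, as the quoted help page suggests. The variable names `base_out` and `icu_out` are illustrative only.

```r
library(stringr)

str_vec <- c("c++", "c--", "c+_")

# Base R (default TRE engine): POSIX [:punct:] covers "+", "-" and "_".
base_out <- gsub("[[:punct:]]", "_", str_vec)
base_out
# [1] "c__" "c__" "c__"

# stringr (ICU engine): spell out both Unicode categories,
# punctuation (\p{P}) and symbols (\p{S}), to approximate POSIX [:punct:].
icu_out <- str_replace_all(str_vec, "[\\p{P}\\p{S}]", "_")
icu_out
# [1] "c__" "c__" "c__"
```

If only ASCII input matters, listing the wanted characters explicitly in a bracket expression is another option that avoids relying on how either engine defines the class.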
