According to the R regex documentation, [:punct:] includes the following characters:
Punctuation characters:
! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~.
But when I try to use this in stringr::str_replace_all(), it doesn't seem to detect +s.
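library(stringr)
str_vec <- c("c++", "c--", "c+_")
# Replace each punctuation character with an underscore
str_replace_all(str_vec, pattern = "[[:punct:]]", replacement = "_")
[1] "c++" "c__" "c+_"
# Replace each run of two or more punctuation characters with a single underscore
str_replace_all(str_vec, pattern = "[[:punct:]]{2,}", replacement = "_")
[1] "c++" "c_" "c+_"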
Has it got to do with the locale settings?
Sys.getlocale()
[1] "LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=en_US.UTF-8;LC_ADDRESS=en_US.UTF-8;LC_TELEPHONE=en_US.UTF-8;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=en_US.UTF-8"
Or is it something else that I'm missing here?
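
One way to narrow this down, assuming the difference comes from the regex engine rather than the locale, is to run the same pattern through base R's gsub(). stringr is built on stringi and the ICU regex engine, while gsub() uses TRE by default and PCRE with perl = TRUE. The sketch below reuses str_vec from above; the names base_default and base_perl are only illustrative.

```r
# Same input as in the question.
str_vec <- c("c++", "c--", "c+_")

# Base R with the default regex engine (TRE).
base_default <- gsub("[[:punct:]]", "_", str_vec)

# Base R with the PCRE engine.
base_perl <- gsub("[[:punct:]]", "_", str_vec, perl = TRUE)

print(base_default)
print(base_perl)
```

If both base R calls replace the + characters while str_replace_all() does not, that would point at how each engine defines [:punct:] rather than at Sys.getlocale().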