I was using the [:punct:] regular expression character class, and it seems to me that the stringr package does not define [:punct:] the same way that base R's grep does.
> grepl('[[:punct:]]', '^HELLO')
[1] TRUE
> str_detect('^HELLO', '[[:punct:]]')
[1] FALSE
stringr and grep generally agree on some of the basic punctuation marks (including , and .):
> grepl('[[:punct:]]', '?HELLO')
[1] TRUE
> str_detect('?HELLO', '[[:punct:]]')
[1] TRUE
But they disagree on others, such as the backtick (`), ~ and |, and possibly more. Below is a fuller test of [:punct:]. I have not tested the other character classes, so I am not sure whether this is limited to [:punct:].
library(stringr)
punct <- c(
".", ",", ":", ";", "?", "!", "\\", "|", "/", "`", "=","*", "+", "-", "^",
"_", "~", "\"", "'", "[", "]", "{", "}", "(", ")", "<", ">", "@", "#", "$"
)
grepl("[[:punct:]]", punct)
#> [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> [15] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> [29] TRUE TRUE
str_detect(punct, "[:punct:]")
#> [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE FALSE FALSE
#> [12] TRUE FALSE TRUE FALSE TRUE FALSE TRUE TRUE TRUE TRUE TRUE
#> [23] TRUE TRUE TRUE FALSE FALSE TRUE TRUE FALSE
punct[which(!str_detect(punct, "[:punct:]"))]
#> [1] "|" "`" "=" "+" "^" "~" "<" ">" "$"
Created on 2018-05-03 by the reprex package (v0.2.0).
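The nine characters that str_detect() misses all seem to share one property: Unicode classifies them as symbols (general category S) rather than punctuation (general category P). stringr is built on stringi, which uses the ICU regex engine, so my unconfirmed guess is that ICU's [:punct:] means Unicode punctuation (\p{P}), while base R's default engine uses the POSIX definition, which also includes ASCII symbols. Base R can check this without stringr, because grepl(perl = TRUE) supports Unicode property classes such as \p{P} and \p{S}. This is a sketch of that check, not a statement about how stringr is documented to behave:

```r
# Same test vector as in the reprex above
punct <- c(
  ".", ",", ":", ";", "?", "!", "\\", "|", "/", "`", "=", "*", "+", "-", "^",
  "_", "~", "\"", "'", "[", "]", "{", "}", "(", ")", "<", ">", "@", "#", "$"
)

# Base R's default (POSIX) engine: [[:punct:]] matches every element
posix_punct <- grepl("[[:punct:]]", punct)

# PCRE Unicode general categories: P = punctuation, S = symbols
unicode_p <- grepl("\\p{P}", punct, perl = TRUE)
unicode_s <- grepl("\\p{S}", punct, perl = TRUE)

# Characters that POSIX counts as punctuation but Unicode does not
symbols <- punct[posix_punct & !unicode_p]
symbols
#> [1] "|" "`" "=" "+" "^" "~" "<" ">" "$"

# Each character falls in exactly one of the two categories
all(xor(unicode_p, unicode_s))
#> [1] TRUE
```

If this holds, the symbols vector is the same set that str_detect(punct, "[:punct:]") reports as FALSE.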