Still looking into this, but the help page ?"stringi-search-charclass"
says:
Beware of using POSIX character classes, e.g. ‘[:punct:]’. ICU
User Guide (see below) states that in general they are not
well-defined, so may end up with something different than you
expect.
In particular, in POSIX-like regex engines, ‘[:punct:]’ stands for
the character class corresponding to the ‘ispunct()’
classification function (check out ‘man 3 ispunct’ on UNIX-like
systems). According to ISO/IEC 9899:1990 (ISO C90), the
‘ispunct()’ function tests for any printing character except for
space or a character for which ‘isalnum()’ is true. However, in a
POSIX setting, the details of what characters belong into which
class depend on the current locale. So the ‘[:punct:]’ class does
not lead to portable code (again, in POSIX-like regex engines).
So a POSIX flavor of ‘[:punct:]’ is more like ‘[\p{P}\p{S}]’ in
‘ICU’. You have been warned.
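A quick way to check the "[\p{P}\p{S}]" claim without any extra packages is to compare base R's engines on printable ASCII. This is a minimal sketch under a few assumptions. Base R's default (TRE) engine treats [[:punct:]] in the POSIX sense. Plain \p{P} stands in for ICU's [:punct:], because ICU defines that class as the Unicode Punctuation category. The comparison covers ASCII only; outside ASCII the locale-dependent POSIX class can differ.

```r
# Printable ASCII characters (codes 33..126), one per element
chars <- intToUtf8(33:126, multiple = TRUE)

# POSIX-style class via base R's default (TRE) engine
posix_punct <- grepl("[[:punct:]]", chars)

# Unicode general categories via PCRE (perl = TRUE):
# \p{P} alone mimics ICU's [:punct:]; \p{P} plus \p{S} is the "POSIX flavour"
uni_p  <- grepl("\\p{P}", chars, perl = TRUE)
uni_ps <- grepl("[\\p{P}\\p{S}]", chars, perl = TRUE)

identical(posix_punct, uni_ps)   # TRUE on ASCII
chars[posix_punct & !uni_p]      # "$" "+" "<" "=" ">" "^" "`" "|" "~"
```

The nine characters left over are all symbols (\p{S}) rather than punctuation (\p{P}), which is exactly the set an ICU [:punct:] pattern skips.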
Adapting the example from the issue linked above:
string <- "a . , > 1 b"
mypunct <- "[[\\p{P}][\\p{S}]]"
stringr::str_remove_all(string, mypunct)
I can appreciate that this is locale-specific, but it still surprises me that [:punct:]
doesn't match these characters even in a C locale...
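Here is a side-by-side comparison on the same string, assuming stringr is installed since it is already used above. As far as I can tell from the ICU docs, ICU's [:punct:] is defined by Unicode properties (\p{P}) rather than by the C library's ispunct(). If that is right, switching R to a C locale would not change what stringr matches. Only base R's POSIX-style engine follows the locale.

```r
# Assumes the stringr package is installed (stringr uses ICU via stringi)
library(stringr)

string <- "a . , > 1 b"

# ICU [[:punct:]] is Unicode punctuation only, so ">" (a math symbol) survives
str_remove_all(string, "[[:punct:]]")     # "a   > 1 b"

# Adding the symbol category brings back the POSIX-like behaviour
str_remove_all(string, "[\\p{P}\\p{S}]")  # "a    1 b"

# Base R's default (TRE) engine uses the POSIX notion of punct
gsub("[[:punct:]]", "", string)           # "a    1 b"
```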