
What is the regex to match any Chinese character in R?

The pattern [\\p{Han}] doesn't work as expected: instead of returning the Chinese characters, it returns "a".

v <- c("a", "b", "c", "中", "e", "文")
grep("[\\p{Han}]", v, value = TRUE)

[1] "a"
NelsonGon
Sati

1 Answer


According to regular-expressions.info, "The JGsoft engine, Perl, PCRE, PHP, Ruby 1.9, Delphi, and XRegExp can match Unicode scripts". R's default engine, however, is a modified version of Ville Laurikari's TRE engine (source), which does not support Unicode script classes such as \p{Han}. Setting perl = TRUE switches grep to PCRE, which does:

grep("[\\p{Han}]", v, value = TRUE, perl = TRUE)

#### OUTPUT ####

[1] "中" "文"
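The same perl = TRUE approach also covers two related tasks. This is a minimal sketch, assuming a standard R build with PCRE support: grepl gives a per-element logical test, and regmatches with gregexpr extracts runs of Han characters from a longer string. The mixed string x is a hypothetical example, and the \u escapes are used only so the strings are UTF-8 regardless of the session locale.

```r
# Sample data from the question, written with \u escapes (U+4E2D, U+6587)
# so the strings are UTF-8 encoded in any locale.
v <- c("a", "b", "c", "\u4e2d", "e", "\u6587")

# Which elements contain at least one Han character?
# The square brackets are optional: "\\p{Han}" matches the same characters
# as "[\\p{Han}]" when perl = TRUE.
has_han <- grepl("\\p{Han}", v, perl = TRUE)
print(has_han)
# [1] FALSE FALSE FALSE  TRUE FALSE  TRUE

# Hypothetical mixed string: extract each run of consecutive Han characters.
x <- "R \u8bed\u8a00 regex"
han_runs <- regmatches(x, gregexpr("\\p{Han}+", x, perl = TRUE))[[1]]
print(han_runs)
```

Without perl = TRUE, both calls would fall back to the TRE engine and hit the same problem as in the question.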