Match numbers before particular Chinese words

Question

How can I use stringr to match the numbers that come before particular Chinese words? For example, given 2020年1月4日 (4 Jan 2020), I want to get something like this:

[1] 2020 1 4

Does this work for you? `gsub("[\U4E00-\U9FFF\U3000-\U303F]", " ", x)` where `x` is the string from https://stackoverflow.com/questions/47068770/how-do-i-remove-all-the-chinese-characters-from-a-string — Ronak Shah, Apr 09 '20 at 05:10

JBGruber · Answer 1 · 2020-04-13T09:24:24.760

It's not terribly clear what you want, unfortunately.

Do you want to use str_match? Then this is the correct regex:

string <- "2020年1月4日"
library(stringr)
str_match(string = string,
          pattern = "\\d+年\\d+月\\d+日")
#>      [,1]
#> [1,] "2020年1月4日"
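
If the goal is the three numbers from the question rather than the whole date, one option is to wrap each `\\d+` in a capture group, since `str_match` returns every group in its own column. This is a minimal sketch, assuming the input always follows the 年/月/日 layout; the names `m` and `parts` are only for illustration.

```r
library(stringr)

string <- "2020年1月4日"

# Column 1 holds the full match; columns 2-4 hold the three capture groups.
m <- str_match(string = string,
               pattern = "(\\d+)年(\\d+)月(\\d+)日")

# Drop the full match and convert the captured digits to integers.
parts <- as.integer(m[1, -1])
parts
```

Here `parts` should contain the integers 2020, 1 and 4.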

If you want to extract the part of the string that matches the regex:

str_extract(string = string,
            pattern = "\\d+年\\d+月\\d+日")
#> [1] "2020年1月4日"

Or if you just want to know if the pattern is present in your string:

str_detect(string = string,
           pattern = "\\d+年\\d+月\\d+日")
#> [1] TRUE
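
Another possible reading of the question is "every run of digits that sits directly before 年, 月 or 日". A lookahead can express that without fixing the order of the three words. This is a sketch under that assumption; the name `nums` is only for illustration.

```r
library(stringr)

string <- "2020年1月4日"

# \\d+ matches a run of digits; (?=[年月日]) requires one of the three
# characters to follow, without making it part of the match.
nums <- str_extract_all(string = string,
                        pattern = "\\d+(?=[年月日])")[[1]]

as.integer(nums)
```

Unlike the fixed 年/月/日 pattern, this also picks up partial dates such as a string that contains only a month and a day, which may or may not be what you want.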

Did you already try these commands and find that they don't work as expected? Then you might want to look into the encoding of your string.
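
One way to look into the encoding is sketched below with base R's `Encoding()`, `validUTF8()` and `enc2utf8()`. It is only an illustration: the string is written with `\u` escapes so that it does not depend on how the script file was saved, and the name `string_utf8` is mine, not from the question.

```r
library(stringr)

# Same date as above, written with Unicode escapes
# (年 = \u5e74, 月 = \u6708, 日 = \u65e5).
string <- "2020\u5e741\u67084\u65e5"

Encoding(string)    # the encoding R has declared for the string
validUTF8(string)   # TRUE if the bytes form valid UTF-8

# Convert to UTF-8 before matching; strings already marked UTF-8 are unchanged.
string_utf8 <- enc2utf8(string)
str_detect(string = string_utf8,
           pattern = "\\d+\u5e74\\d+\u6708\\d+\u65e5")
```

If `validUTF8()` returns FALSE for your real data, the string was probably read in with the wrong encoding, and the fix belongs at the import step rather than in the regex.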

Does this answer your question?
