I am preparing a dataset that contains CJK characters in R, mostly with the Tidyverse. During the process, I found that some character elements have \037 at the very end.
# A tibble: 99 × 2
Prefecture n
<chr> <int>
1 \037 1
2 北海道\037 1
3 北海道 13
4 北海道 4
... ... ...
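Before choosing a pattern, it may help to check what the trailing \037 actually is. The sketch below assumes it is the single control character U+001F (octal 037) rather than a literal backslash followed by the digits 037; the sample vector `x` is hypothetical and not taken from the real data.

```r
# Hypothetical sample values; the second one ends with the control
# character U+001F, which R writes as the octal escape "\037".
x <- c("北海道", "北海道\037")

# nchar() also counts the invisible trailing character.
lengths_in_chars <- nchar(x)

# utf8ToInt() lists the code points of a single string;
# a trailing 31 means the last character is U+001F.
code_points <- lapply(x, utf8ToInt)
last_code_point <- tail(code_points[[2]], 1)
print(last_code_point)
```

If the last code point were 92 followed by 48, 51, 55 instead, the column would hold the literal text `\037`, and a different pattern would be needed.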
I have tried to remove them with the lines below:
library(dplyr)
library(stringr)
out.file %>% mutate(
  Prefecture = str_replace_all(out.file$Prefecture, "\\\\037", "")
)
The str_replace_all call does remove all the \037s when tested on a single string. When I apply it with mutate to an entire column, however, the lines above still give the same results as in the first code chunk in this post.
What would be the most efficient way to remove them from strings?
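One possible explanation for the difference between the test string and the column, offered as a guess: the pattern `"\\\\037"` is the regular expression for a literal backslash followed by `037`, which is not the same thing as the control character `"\037"`. The base R sketch below uses two hypothetical strings to show the distinction.

```r
# Hypothetical values: one ends in the real control character U+001F,
# the other in the five literal characters backslash, 0, 3, 7.
with_control <- "北海道\037"
with_literal <- "北海道\\037"

# The regex \\037 matches a literal backslash followed by "037".
matches_literal_in_control <- grepl("\\\\037", with_control)
matches_literal_in_literal <- grepl("\\\\037", with_literal)

# A fixed (non-regex) search for the control character itself.
matches_control_in_control <- grepl("\037", with_control, fixed = TRUE)

print(c(matches_literal_in_control,
        matches_literal_in_literal,
        matches_control_in_control))
```

If the test string was typed by hand with a visible backslash, it would match the first pattern while the real column values would not.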
Update with solution
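library(dplyr)
library(stringr)
library(stringi)
out.file %>%
  mutate(Prefecture = stri_escape_unicode(Prefecture),         # escape non-ASCII characters
         Prefecture = str_replace_all(Prefecture, "\037", ""),  # drop the \037 sequence
         Prefecture = stri_unescape_unicode(Prefecture))        # restore the original characters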
This way I was able to resolve the issue.
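For comparison, here is a shorter sketch that skips the escape and unescape round trip. It assumes the unwanted character really is U+001F, uses a small hypothetical stand-in for `out.file`, and stores the result in a new object (`cleaned`, a name chosen for illustration), since `mutate` returns a modified copy rather than changing the data frame in place.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for the real data.
out.file <- tibble(Prefecture = c("\037", "北海道\037", "北海道"))

# "\\x1f" is the regex escape for the control character U+001F.
# Referring to the column by its bare name keeps the call inside
# mutate's data context; the result must be assigned to be kept.
cleaned <- out.file %>%
  mutate(Prefecture = str_remove_all(Prefecture, "\\x1f"))

print(cleaned)
```

Rows that contained only the control character end up as empty strings, so a follow-up filter or a conversion to `NA` may be needed depending on how those rows should be counted.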