Removing "\037" from strings in R

Asked Apr 07 '17 at 17:27

Active Apr 07 '17 at 18:42

Viewed 182 times

I am preparing a dataset that contains CJK characters in R, mostly with the Tidyverse. During the process, I found that some character elements have \037 at the very end.

# A tibble: 99 × 2
     Prefecture     n
            <chr> <int>
1            \037     1
2      北海道\037     1
3          北海道    13
4          北海道     4
...          ...     ...

I have tried to remove them with the line below:

library(stringr)
out.file %>% mutate(
    Prefecture = str_replace_all(out.file$Prefecture, "\\\\037", "")
)

str_replace_all does remove every \037 when tested on a single string. When applied to the entire column through mutate, however, the code above still gives the same result as the first code chunk in this post.
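A quick way to see why the two cases can differ, sketched under the assumption that the \037 in the printout is R's octal escape for a single control character (U+001F) rather than four literal characters. The sample vector x is hypothetical and not taken from the real dataset:

```r
# Hypothetical sample values, not taken from the real dataset.
x <- c("北海道\037", "北海道")

# nchar() counts \037 as one character if it is a control character.
nchar(x)

# Last code point of the first element; 31 is U+001F (octal 037).
last_code <- tail(utf8ToInt(x[1]), 1)

# Written as "\\\\037", the pattern is the regex \\037: a literal
# backslash followed by the digits 0, 3 and 7.
has_literal <- grepl("\\\\037", x)

# Written as "\037", the pattern is the control character itself.
has_control <- grepl("\037", x, fixed = TRUE)
```

If has_literal is FALSE everywhere while has_control flags the affected rows, the column holds the control character, and a pattern built around a literal backslash would have nothing to match.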

What would be the most efficient way to remove them from strings?

Update with solution

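library(dplyr)
library(stringr)
library(stringi)
out.file %>% 
  mutate(Prefecture = stri_escape_unicode(Prefecture),          # escape CJK characters to \uXXXX sequences
         Prefecture = str_replace_all(Prefecture, "\037", ""),  # strip the unwanted \037
         Prefecture = stri_unescape_unicode(Prefecture))        # restore the original characters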

With this I was able to resolve the issue.
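For comparison, a minimal sketch of a more direct route, assuming the trailing \037 is the single control character U+001F. It is not the approach used above, and the small tibble is a hypothetical stand-in for out.file:

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for out.file; the real data is not shown in full.
out.file <- tibble(
  Prefecture = c("\037", "北海道\037", "北海道"),
  n = c(1L, 1L, 13L)
)

# fixed() treats "\037" as a literal string (the control character),
# not as a regular expression.
cleaned <- out.file %>%
  mutate(Prefecture = str_replace_all(Prefecture, fixed("\037"), ""))
```

mutate() returns a new data frame rather than modifying out.file in place, so the result has to be assigned (here to cleaned) for the change to persist.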

Carl H


This may help: http://stackoverflow.com/a/25466734/1000343 – Tyler Rinker Apr 07 '17 at 17:49

Thanks, @TylerRinker! That was a helpful post. I was able to escape the CJK characters, replace the unwanted characters, and unescape them all. This solves my issue. – Carl H Apr 07 '17 at 18:39
