I'm analyzing a multilingual text dataset containing Chinese and Hebrew characters. Even though the text is read in correctly, the characters show up as escape codes such as "<U+05EA><U+05D5><U+05D1><U+05E0><U+05D4>" in the RStudio console output when I use tidyverse functions.
However, calling View() on the tidyverse output displays the characters correctly in a separate window. Manipulating the data with base-R functions also gives correctly displayed results in the console.
I've had a similar issue with another dataset, where View() correctly shows Chinese characters but typing "tb" (suppose the tibble is called "tb") gives me similarly wrong symbols. There too, using base-R functions gives me a correct display in the console.
Also, two notes:
1) I cannot use Sys.setlocale to solve the issue because my dataset is multilingual and I cannot commit to a single language's locale.
2) The images below show my issue. Base-R data.frame operations work, but tidyverse subsetting doesn't. The two images refer to exactly the same data.
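In case it helps to reproduce this, here is a minimal sketch of the kind of comparison I am making. The tibble `tb`, the data.frame `df`, and the two sample strings are made-up stand-ins rather than my actual data, and I write the strings with \u escapes only so the snippet itself stays plain ASCII.

```r
library(tibble)

# Hypothetical stand-in data: one Hebrew and one Chinese string,
# written with \u escapes so the script does not depend on file encoding.
words <- c("\u05EA\u05D5\u05D1\u05E0\u05D4", "\u4E2D\u6587")

tb <- tibble(id = 1:2, text = words)  # tidyverse tibble
df <- as.data.frame(tb)               # base-R data.frame holding the same data

print(tb)  # the tibble print: this is where I get <U+....> style codes
print(df)  # the base-R data.frame print, for comparison

# The underlying strings are the same in both objects,
# so the difference seems to be in how they are printed.
identical(tb$text, df$text)
Encoding(tb$text)
```

If the two print calls differ on your machine as well, that would suggest the data themselves are fine and only the console display path differs.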
I wonder whether this is the intended behavior of tidyverse or a bug that I can fix in some way. It would be more convenient if the display were correct for quick checks of console output.
Thank you!