I'm analyzing a multilingual text dataset containing Chinese and Hebrew characters. Even though the text is read in correctly, the characters show up as escape codes such as "<U+05EA><U+05D5><U+05D1><U+05E0><U+05D4>" in the RStudio console output when I use tidyverse functions.

However, the View() function for tidyverse output will correctly display the characters in a separate window. Also, manipulating the data using base-R functions will yield correctly displayed results in the console.

I've had a similar issue with another dataset, where View() correctly shows Chinese characters but typing "tb" (suppose the tibble is called "tb") prints the same kind of wrong symbols. There, too, base-R functions display the text correctly in the console.
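A minimal sketch of the setup, assuming made-up strings in place of the real dataset (the names `df` and `lang` and the sample values are hypothetical). The `\u` escapes keep the script ASCII-only, so the file's own encoding cannot interfere:

```r
# Hypothetical stand-in for the real data: one Hebrew and one Chinese string.
df <- data.frame(
  lang = c("he", "zh"),
  text = c("\u05ea\u05d5\u05d1\u05e0\u05d4", "\u4e2d\u6587"),
  stringsAsFactors = FALSE
)

# Check that the strings are valid UTF-8 before looking at the printing code.
print(Encoding(df$text))
print(validUTF8(df$text))
print(nchar(df$text, type = "chars"))

# Print as a base data frame, then as a tibble if the package is installed.
print(df)
if (requireNamespace("tibble", quietly = TRUE)) {
  tb <- tibble::as_tibble(df)
  print(tb)
}
```

If the strings are valid UTF-8 but one of the two printed versions still shows `<U+....>` codes, the problem lies in how that object is formatted for the console rather than in how the data was read.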

Two notes:

  1. I cannot use Sys.setlocale to solve the issue, because my dataset is multilingual and I cannot commit to a single foreign-language locale.

  2. The images below show my issue. Base-R data.frame operations work, but tidyverse subsetting doesn't. The two images refer to exactly the same data.

Is this intended behavior of the tidyverse, or is it a bug that I can work around in some way? It would be more convenient if the display were correct for quick checks in the console.

Thank you!

[Two screenshots of the same data: console output from base-R data.frame operations, and console output from tidyverse subsetting]

Xuewen
  • Hello, I've read the other post, but it doesn't solve my issue. If I use the data.frame function (as I said, with base-R functions), the results are correct in the console. But if I use "tibble" or anything in the tidyverse family, they don't display correctly. Also, I cannot use Sys.setlocale to solve the issue because my dataset includes multiple foreign languages. – Xuewen Aug 28 '21 at 23:25
  • It's not to do with tidyverse, it's to do with using functions that call `format()` under the hood - you'll notice that base data frames won't print correctly either. Try running `Sys.setlocale("LC_CTYPE", locale="Hebrew")` and printing the data in your example again. – Ritchie Sacramento Aug 28 '21 at 23:27
  • Hello, I've added a picture showing that the base-R data.frame functions work. – Xuewen Aug 28 '21 at 23:33
  • If you edit your question to ask something along the lines of "How do I get data frames that contain multilingual strings to print correctly?" it will be re-opened. – Ritchie Sacramento Aug 28 '21 at 23:48
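The `format()` suggestion in the comments can be checked directly. The following is only a diagnostic sketch, assuming a single hypothetical test string `x`; it shows whether the session uses a UTF-8 locale and whether `format()` returns the characters unchanged:

```r
# Hypothetical test string (Hebrew), written with \u escapes.
x <- "\u05ea\u05d5\u05d1\u05e0\u05d4"

# Report whether the session is running in a UTF-8 locale.
info <- l10n_info()
print(info[["UTF-8"]])
print(Sys.getlocale("LC_CTYPE"))

# Compare the output of format() with its input.
# TRUE means the characters survived; FALSE suggests they were
# altered or escaped somewhere in the formatting step.
formatted <- format(x)
print(identical(enc2utf8(formatted), enc2utf8(x)))
cat(formatted, "\n")
```

If the comparison prints FALSE in a session without a UTF-8 locale, that would be consistent with the comment's explanation that the formatting step, not the tidyverse itself, is where the characters are lost.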
