4

When a data frame contains UTF-8 characters, they are not displayed properly when the data frame is printed.

For example, the following is correct:

> "\U6731"
[1] "朱"

But when I put the same string in a data frame and print it, I get this:

> data.frame(x="\U6731")
         x
1 <U+6731>

Hence I believe this has nothing to do with encoding issues.
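
One way to probe that belief is to inspect the string stored in the data frame rather than its printed form. The sketch below assumes `stringsAsFactors = FALSE` (not used in the question) so that the column is a plain character vector; the names `df`, `enc`, `same_code_point` and `n_chars` are illustrative only.

```r
# Build the same one-cell data frame, keeping the column as character
df <- data.frame(x = "\U6731", stringsAsFactors = FALSE)

# Declared encoding of the stored string
enc <- Encoding(df$x)

# Code point of the stored character, compared with U+6731
same_code_point <- utf8ToInt(df$x) == 0x6731

# Number of characters in the stored string
n_chars <- nchar(df$x)
```

If the code point matches and the string is a single character, the data is stored intact and only the printed representation is affected.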

Is there any direct way to print 朱 instead of <U+6731>?

I have to use Windows at my company, so switching to Linux is not feasible for me.

John

2 Answers

3

The corpus library has a work-around for this bug. Either do this:

library(corpus)
df <- data.frame(x = "\U6731")
print.corpus_frame(df)

Or else do this:

class(df) <- c("corpus_frame", "data.frame")
df
Patrick Perry
  • Thanks. This really helps. BTW is there any function in `corpus` that can save the corpus frame to csv with correct encoding? – John Oct 13 '17 at 01:24
  • Sorry, no. That might not be possible on Windows. Try newline-delimited JSON, using `jsonlite::stream_out` to write and either `jsonlite::stream_in` or `corpus::read_ndjson` to read – Patrick Perry Oct 13 '17 at 11:05
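
The newline-delimited JSON suggestion in the comment above could look roughly like the following. This is a minimal sketch, assuming the `jsonlite` package is installed; `path`, `df2` and `round_trip_ok` are hypothetical names, and `stringsAsFactors = FALSE` is an added assumption so that the column is compared as plain character data.

```r
library(jsonlite)

df <- data.frame(x = "\U6731", stringsAsFactors = FALSE)

# Hypothetical output location; replace with a real path
path <- tempfile(fileext = ".ndjson")

# Write one JSON object per row (newline-delimited JSON)
stream_out(df, file(path), verbose = FALSE)

# Read the records back into a data frame
df2 <- stream_in(file(path), verbose = FALSE)

# TRUE if the character survived the round trip
round_trip_ok <- identical(df2$x, df$x)
```

`corpus::read_ndjson`, also mentioned in the comment, is an alternative for the reading step.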
1

You are right: printing the whole data frame shows escape codes in place of the UTF-8 characters:

> data.frame(x="\U6731")
         x
1 <U+6731>

But if you extract a column or a row, it prints nicely:

# through the column name
> data.frame(x="\U6731")$x
[1] 朱
Levels: 朱

# through the column index
> data.frame(x="\U6731")[,1]
[1] 朱
Levels: 朱

# through the row index
> data.frame(x="\U6731")[1,]
[1] 朱
Levels: 朱

Not sure if this helps. Could you be more specific why and how exactly you need to output these characters?

Alex Knorre
  • I need a workaround to get a printed data frame which does show UTF-8 characters. Will saving the data frame as a csv file help? – John Jun 14 '17 at 09:10
  • It is a well-known problem on Windows, see: http://people.fas.harvard.edu/~izahn/posts/reading-data-with-non-native-encoding-in-r/ But what is your problem? If you just want to view the data frame, you can either use `print.listof()` or save the data frame as CSV and view it in Excel or another spreadsheet program. – Alex Knorre Jun 14 '17 at 16:13
  • How can I save a data frame containing UTF-8 characters as a csv file, so that I can just open it and see the correct characters? Thanks. – John Jun 15 '17 at 03:08
  • Just use `write.csv2(your_data, "df.csv", row.names = FALSE)` and then open the file via Excel or something like this. – Alex Knorre Jun 15 '17 at 14:33
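
A variant of the CSV approach that states the file encoding explicitly might look like the sketch below. It uses the `fileEncoding` argument of base R's `write.csv` and `read.csv`; the names `path`, `df2` and `round_trip_ok` are hypothetical. Whether the character survives on Windows depends on the R version and locale, and Patrick Perry's comment on the other answer suggests it may not, so treat this as something to test rather than a guaranteed fix.

```r
df <- data.frame(x = "\U6731", stringsAsFactors = FALSE)

# Hypothetical output location; replace with a real path
path <- tempfile(fileext = ".csv")

# Ask R to encode the file as UTF-8 when writing
write.csv(df, path, row.names = FALSE, fileEncoding = "UTF-8")

# Read it back, declaring the same encoding
df2 <- read.csv(path, fileEncoding = "UTF-8", stringsAsFactors = FALSE)

# TRUE if the character survived the round trip
round_trip_ok <- identical(df2$x, df$x)
```

If `round_trip_ok` is `FALSE` on a given machine, the newline-delimited JSON route suggested in the other answer's comments is the fallback.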