Does anyone know why this happens? That is, why is a Unicode character not displayed correctly when a data table row is printed, but displayed correctly when the same value is printed as a vector (a data table column)?

> test.dt
       fuel    box         seller.name
1: Gasoline Manual Michels S<U+00E0>rl

> test.dt[, seller.name]
[1] "Michels Sàrl"
Iden

2 Answers

First make sure your locale is set correctly. Try this:

library(data.table)
Sys.setlocale("LC_CTYPE", "") # set the character-type locale to the system default
df = data.table(id = 1, name = "Michels Sàrl", stringsAsFactors = FALSE)
print(df) # check whether the accented character now displays correctly
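
To see which locale and string encoding are actually in effect, a small diagnostic along these lines may help. This is only a sketch: the variable names are made up for illustration, and the test string is built with a `\u` escape so that it does not depend on how the script file itself is encoded.

```r
# Hypothetical test string; the \u escape avoids depending on the file encoding.
x <- "Michels S\u00e0rl"

locale_ctype <- Sys.getlocale("LC_CTYPE")  # character-type locale currently in effect
is_utf8 <- isTRUE(l10n_info()[["UTF-8"]])  # TRUE if the session locale is UTF-8
declared <- Encoding(x)                    # encoding mark attached to the string
native <- enc2native(x)                    # the string converted to the native encoding

print(locale_ctype)
print(is_utf8)
print(declared)
print(native)
```

If the session is not UTF-8 and the native form of the string shows an escape such as <U+00E0>, the character probably cannot be represented in the current locale, which would be consistent with the output in the question.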

If that doesn't work, you may be running into a known bug in R on Windows; for another instance of this bug see https://stackoverflow.com/a/46720368/6233565

For a work-around, try this:

library(corpus)
print.corpus_frame(df)
Patrick Perry
  • Thanks, Patrick, I can see characters now with corpus library. Though it does not solve my problem with characters when saving the data table either as .csv or .txt :( – Iden Oct 16 '17 at 16:30
  • Right, that's a related bug; currently it's not possible to write UTF-8 data on Windows. See https://stackoverflow.com/a/46734577/6233565 – Patrick Perry Oct 16 '17 at 16:32
I tried the same example and it displays correctly for me. See below:

library(data.table)
df = data.table(id = 1, name = "Michels Sàrl", stringsAsFactors = FALSE)
> df
   id         name
1:  1 Michels Sàrl