
I am trying to read data from a CSV file and specify that the character encoding is UTF-8. From reading the ?read.csv documentation, it seems that setting fileEncoding = "UTF-8" should accomplish this; however, that is not what I see when I check. Is there a better way to specify UTF-8 as the encoding of character strings when importing the data?

Sample Data:

Download Sample Data here

fruit   <- read.csv("fruit.csv", header = TRUE, fileEncoding = "UTF-8")
fruit[] <- lapply(fruit, as.character)  # coerce every column to character
Encoding(fruit$Fruit)                   # check the declared encoding

The output is "unknown", but I would expect it to be "UTF-8". What is the best way to ensure all imported characters are UTF-8? Thank you.
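Since the sample-data link may no longer resolve, here is a self-contained reproduction of the behaviour (the file contents are illustrative); because every value here is plain ASCII, R reports no declared encoding:

tmp <- tempfile(fileext = ".csv")
con <- file(tmp, open = "w", encoding = "UTF-8")
writeLines(c("Fruit", "Apple", "Banana"), con)  # write a tiny UTF-8 CSV
close(con)

fruit <- read.csv(tmp, header = TRUE, fileEncoding = "UTF-8", stringsAsFactors = FALSE)
Encoding(fruit$Fruit)  # [1] "unknown" "unknown" -- not "UTF-8"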

Rob.C
  • Have you tried using the `encoding` argument instead? – Abdou Aug 17 '16 at 01:58
  • What was the class before you forced conversion with `as.character`? Maybe add `stringsAsFactors=FALSE` to `read.csv`. Also, if none of your characters are outside the ASCII range, R won't bother with the encoding mark (see the runnable sketch after these comments): `x<-"Hello"; Encoding(x)<-"UTF-8"; Encoding(x)` vs `x<-"Həllö"; Encoding(x)` – MrFlick Aug 17 '16 at 02:07
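Expanding that comment into a runnable sketch (the example strings are illustrative): R only keeps a declared encoding for strings that contain non-ASCII characters, so marking a pure-ASCII string as UTF-8 is a no-op.

x <- "Hello"            # ASCII only
Encoding(x) <- "UTF-8"  # the mark is silently dropped for pure ASCII
Encoding(x)             # [1] "unknown"

y <- "H\u0259ll\u00f6"  # "Həllö" contains non-ASCII characters
Encoding(y)             # [1] "UTF-8"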

1 Answer

fruit       <- read.csv("fruit.csv", header = TRUE)
fruit[]     <- lapply(fruit, as.character)      # coerce every column to character
fruit$Fruit <- paste0(fruit$Fruit, "\xfcmlaut") # get a non-ASCII char and jam it in!
Encoding(fruit$Fruit)                           # the strings now carry an encoding mark

[1] "latin1" "latin1" "latin1"

fruit$Fruit <- enc2utf8(fruit$Fruit)  # convert the marked strings to UTF-8
Encoding(fruit$Fruit)

[1] "UTF-8" "UTF-8" "UTF-8"

Hack-R