
I am working with RStudio and need to import a CSV file for text mining. The file is Windows-1252 encoded and contains German umlauts.

However, I cannot get R to import these umlauts correctly. Using `read.table(X, fileEncoding="UTF-8")` results in an error.

What am I missing?
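For reference, the `fileEncoding` argument of `read.table` declares the encoding the file is stored in, not the encoding you want to end up with; R re-encodes the text while reading it. Below is a minimal sketch. It assumes a session whose locale can represent umlauts, such as the `German_Germany.1252` locale mentioned in the comments or any UTF-8 locale. The tiny demo file is hypothetical and stands in for the real data, and `"CP1252"` is used as the iconv name for Windows-1252.

```r
# Hypothetical demo file: a tiny semicolon-separated CSV whose only
# non-ASCII character is "ä", stored as the single Windows-1252 byte 0xE4.
path <- tempfile(fileext = ".csv")
bytes <- c(charToRaw("id;word\n1;Gel"), as.raw(0xE4), charToRaw("ndewagen\n"))
writeBin(bytes, path)

# fileEncoding names the encoding of the *file*; R converts the text
# into the session's encoding as it reads it.
df <- read.table(path, header = TRUE, sep = ";",
                 fileEncoding = "CP1252", stringsAsFactors = FALSE)
df$word
```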

----UPDATE----

The File I am trying to read is: https://drive.google.com/file/d/0B4kGh2YwTmb9U3hkei1TTHlUME0/edit?usp=sharing

Using this R code:

```r
Sys.setlocale("LC_CTYPE", "german")
dataset <- read.table("../processed/DE_all_CDM_201405050001_DE_all_CDM2014-05-05_rcout.csv",
                      encoding = "UTF-8", header = TRUE, sep = ";",
                      stringsAsFactors = F, as.is = T)
dataset <- dataset[, c(1, 11, 30)]
Encoding(dataset[, 2]) <- "UTF-8"
```
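The last line of that code only relabels the strings. The sketch below contrasts `Encoding<-`, which changes the declared encoding but leaves the bytes untouched, with `iconv()`, which actually converts the bytes. The example string is built by hand from the Windows-1252 bytes of "Geländewagen" and is purely illustrative.

```r
# "Geländewagen" as Windows-1252 bytes: "ä" is the single byte 0xE4.
x <- rawToChar(c(charToRaw("Gel"), as.raw(0xE4), charToRaw("ndewagen")))

# Encoding<- only changes the declared encoding; the bytes stay the same.
# Declaring these bytes as UTF-8 is wrong: 0xE4 followed by "n" is not valid UTF-8.
y <- x
Encoding(y) <- "UTF-8"
validUTF8(y)                      # FALSE

# iconv() converts the bytes from Windows-1252 to UTF-8 (and marks the result).
z <- iconv(x, from = "CP1252", to = "UTF-8")
validUTF8(z)                      # TRUE
z == "Gel\u00e4ndewagen"          # TRUE
```

Applied to the question's data, the analogous step would be something like `dataset[, 2] <- iconv(dataset[, 2], from = "CP1252", to = "UTF-8")`. This assumes the column still holds Windows-1252 bytes at that point.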

hag o hi
  • Have you read the "Note" section in `read.table`? – Roland Jun 16 '14 at 13:43
  • Yes, but I'm not sure I understand it correctly. My locale is: "LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252", and I'm running Windows. – hag o hi Jun 16 '14 at 13:47
  • Have you set the locale with `Sys.setlocale()`? – Rachel Gallen Jun 16 '14 at 13:48
  • You may want to read this. It's about Russian, but I'm sure the same applies to any foreign language: http://quantifyingmemory.blogspot.ie/2013/01/r-and-foreign-characters.html – Rachel Gallen Jun 16 '14 at 13:48
  • You may find [this answer](http://stackoverflow.com/questions/11069908/r-extracting-clean-utf-8-text-from-a-web-page-scraped-with-rcurl) useful - shows how to use the locale to import Japanese characters without them turning into line noise. Umlauts may also benefit! – SlowLearner Jun 16 '14 at 13:52
  • Sorry, none of the above works for me. I tried setting the locale with `Sys.setlocale("LC_CTYPE", "german")`, which produces e.g. "Gelndewagen" instead of "Geländewagen". Neither `fileEncoding="UTF-8"` nor `encoding="UTF-8"` in `read.table` works either, and `Encoding(df) <- "UTF-8"` has no effect. – hag o hi Jun 16 '14 at 14:07
  • My locale is identical to what you show above (it's the default on German windows systems). In general I have no problem importing umlauts. There is something you are not showing. A reproducible example could be useful. – Roland Jun 16 '14 at 14:13
  • OK, thanks. I updated the original post with a (hopefully) reproducible example. – hag o hi Jun 16 '14 at 14:28
  • I won't register with Google just to download your file ... – Roland Jun 16 '14 at 14:39
  • OK, I can see that. Do you know another way to share files? – hag o hi Jun 18 '14 at 07:18
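If the file cannot be shared, its encoding can still be checked locally by looking at the raw bytes. In UTF-8, "ä" is stored as the two bytes `0xC3 0xA4`. Windows-1252 uses the single byte `0xE4`, which on its own is not valid UTF-8. The sketch below is rough: `guess_encoding()` is a hypothetical helper, and the check is only a heuristic, not a reliable encoding detector.

```r
# Hypothetical helper: a rough check of which encoding a text file uses.
# Heuristic only: bytes that form valid UTF-8 are very likely UTF-8, while
# German text with umlauts stored as single bytes (e.g. "ä" = 0xE4 in
# Windows-1252) fails the UTF-8 check.
guess_encoding <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  if (all(as.integer(bytes) < 128L)) {
    return("ASCII")
  }
  if (validUTF8(rawToChar(bytes))) "UTF-8" else "not UTF-8 (possibly Windows-1252)"
}

# Example: the same word saved once as UTF-8 and once as Windows-1252.
utf8_file   <- tempfile(fileext = ".csv")
cp1252_file <- tempfile(fileext = ".csv")
writeBin(c(charToRaw("Gel"), as.raw(c(0xC3, 0xA4)), charToRaw("ndewagen\n")), utf8_file)
writeBin(c(charToRaw("Gel"), as.raw(0xE4), charToRaw("ndewagen\n")), cp1252_file)

guess_encoding(utf8_file)    # "UTF-8"
guess_encoding(cp1252_file)  # "not UTF-8 (possibly Windows-1252)"
```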

1 Answer


OK, I just found out that this is an RStudio GUI issue. If I run my code in the R console, it works fine.
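To confirm that the data itself is fine and only the display is affected, you can inspect the Unicode code points of an imported value. This does not depend on how the console or GUI renders text. The sketch below uses `show_codepoints()`, a hypothetical helper that takes a single string.

```r
# Hypothetical helper: list the Unicode code points of one string, so the
# stored characters can be checked without relying on how the console or
# GUI renders them. enc2utf8() first converts native/latin1 text to UTF-8;
# utf8ToInt() maps invalid UTF-8 input to NA.
show_codepoints <- function(s) {
  cp <- utf8ToInt(enc2utf8(s))
  sprintf("U+%04X", cp)
}

show_codepoints("Gel\u00e4ndewagen")
```

For a column of the question's `dataset`, something like `lapply(head(dataset[, 2]), show_codepoints)` would apply it to the first few values. A correctly imported "ä" shows up as `U+00E4`.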

hag o hi