The Serbian alphabet has 5 additional letters (š, đ, ž, č, ć) on top of the English alphabet. The problem is that R won’t recognize č and ć. The characters š, đ, and ž work fine, but whenever I try to use č and ć, R interprets them as c.

>š
Error: object 'š' not found
>ž
Error: object 'ž' not found
>đ
Error: object 'd' not found
>č
function (..., recursive = FALSE)  .Primitive("c")
>ć
function (..., recursive = FALSE)  .Primitive("c")

When I read files into R, it always replaces č and ć with c.

Is there any way around this?

>Sys.getlocale()
[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
magasr

1 Answer

Changing the system locale to the specific language will probably help. Using the "UTF-8" encoding should preserve the special characters when you read a file:

  read.table("file.txt", encoding = "UTF-8")
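To see what the encoding argument actually does, here is a minimal sketch. It uses readLines rather than read.table to keep the example small; both help pages describe encoding as a way to mark strings as Latin-1 or UTF-8, not to re-encode the input. As far as I can tell the value is matched exactly, so a lowercase "utf-8" is not recognized and the strings stay unmarked. The temporary file and the variable names are made up for illustration, and the code is kept ASCII-only by using Unicode escapes.

```r
# Write the raw UTF-8 bytes for c-caron and c-acute, one per line,
# so that no locale conversion is involved in creating the file.
path <- tempfile(fileext = ".txt")
writeBin(charToRaw(enc2utf8("\u010d\n\u0107\n")), path)

# Same file, read twice: only the exact spelling "UTF-8" marks the strings.
marked   <- readLines(path, encoding = "UTF-8")
unmarked <- readLines(path, encoding = "utf-8")

print(Encoding(marked))
print(Encoding(unmarked))
```

Unmarked strings are assumed to be in the native encoding of the session, which on a 1252 code page cannot represent č or ć.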

If you are writing a file, you can do something like this:

  con <- file("path/filename.txt", open = "w", encoding = "UTF-8")
  write(x, file = con)
  close(con)  # release the connection once the data is written
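A small round-trip check can confirm whether the letters survive writing and reading. This is only a sketch and it swaps in a different technique from the one above: instead of the encoding argument of file(), it writes the UTF-8 bytes as they are with writeLines(..., useBytes = TRUE), which avoids a conversion through the native code page. The temporary path and the names letters_sr, back and round_trip_ok are hypothetical.

```r
# s-caron, d-stroke, z-caron, c-caron, c-acute, written as Unicode escapes
# so the script itself stays ASCII.
letters_sr <- c("\u0161", "\u0111", "\u017e", "\u010d", "\u0107")

path <- tempfile(fileext = ".txt")

# Write the strings byte for byte as UTF-8.
con <- file(path, open = "wb")
writeLines(enc2utf8(letters_sr), con, useBytes = TRUE)
close(con)

# Read them back and mark them as UTF-8.
back <- readLines(path, encoding = "UTF-8")

round_trip_ok <- identical(back, letters_sr)
print(round_trip_ok)
```

If round_trip_ok is TRUE but the console still shows c, the data is intact and only the display in the current locale is affected.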
user5249203
  • It does preserve everything but č and ć; it converts them to c. – magasr Jun 17 '16 at 16:55
  • Good observation. Maybe the workaround in this question works for you: http://stackoverflow.com/questions/29957678/utf-8-characters-get-lost-when-converting-from-list-to-data-frame-in-r – user5249203 Jun 17 '16 at 17:09
  • Thanks for your input. I did some more testing. When I read a file using encoding = "UTF-8", it reads č and ć correctly, but when I do it with encoding = "utf-8" it doesn't. Why is that? The question of whether there is a way to use č and ć in the R console still remains, so I'll leave it open. – magasr Jun 17 '16 at 18:00
  • Did you try setting your system locale to the language of the characters you are reading in? – user5249203 Jun 17 '16 at 20:17
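
One way to check why the locale matters is to ask which of the five letters exist in the 1252 code page that Sys.getlocale() reports in the question. The sketch below assumes that iconv on your platform knows the name "CP1252" and returns NA for characters it cannot convert, which is the documented default but can differ between iconv implementations. The names letters_sr and in_cp1252 are made up for illustration.

```r
# s-caron, d-stroke, z-caron, c-caron, c-acute as Unicode escapes (ASCII-only script).
letters_sr <- c("\u0161", "\u0111", "\u017e", "\u010d", "\u0107")

# iconv() gives NA for an element it cannot represent in the target encoding.
in_cp1252 <- !is.na(iconv(letters_sr, from = "UTF-8", to = "CP1252"))
names(in_cp1252) <- c("s-caron", "d-stroke", "z-caron", "c-caron", "c-acute")

print(in_cp1252)
```

Letters that come back FALSE have no slot in that code page, so a session running in it has to substitute something else for them.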