R doesn't accept certain Serbian characters with diacritics (č, ć)

Question

Compared with the English alphabet, the Serbian Latin alphabet has five additional letters (š, đ, ž, č, ć). The problem is that R won't recognize č and ć. The characters š, đ, and ž work fine, but whenever I try to use č or ć, R interprets them as c.

  > š
  Error: object 'š' not found
  > ž
  Error: object 'ž' not found
  > đ
  Error: object 'd' not found
  > č
  function (..., recursive = FALSE)  .Primitive("c")
  > ć
  function (..., recursive = FALSE)  .Primitive("c")

When I read files into R, č and ć are always replaced with c.

Is there any way around this?

  > Sys.getlocale()
  [1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
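A minimal sketch of why the 1252 locale matters, assuming the glibc or macOS `iconv` that R uses on Unix (Windows builds of R may substitute characters instead of failing): Windows code page 1252 has slots for š and ž but none for č, ć, or đ, and `iconv()` returns `NA` for characters the target encoding cannot represent. The letters are written as `\u` escapes, which keeps the script plain ASCII; in a 1252 console such an escape still produces the correct character in the string, though the console may print it as `<U+010D>`.

```r
# Serbian letters as Unicode escapes, so this script is plain ASCII
sr <- c(s_caron = "\u0161", z_caron = "\u017e", d_stroke = "\u0111",
        c_caron = "\u010d", c_acute = "\u0107")

# iconv() returns NA where a character has no slot in the target encoding
fits_cp1252 <- !is.na(iconv(sr, from = "UTF-8", to = "CP1252"))
print(fits_cp1252)
```

Under glibc this should report `TRUE` for š and ž and `FALSE` for đ, č, and ć, which would also be consistent with the `'d'` in the console output above.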

user5249203 · Accepted Answer · 2016-06-17T20:16:00.310

Changing the system locale to the specific language probably helps. Using the "UTF-8" encoding should preserve the special characters. When you read a file:

  read.table("file.txt", encoding = "UTF-8")
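A small sketch of what `encoding = "UTF-8"` does in `read.table()`, using a temporary file created only for this example: the argument marks the strings as UTF-8 rather than converting them, which is why they can survive a 1252 session. The separate `fileEncoding` argument instead re-encodes the file into the session's native encoding, so under a 1252 locale it cannot keep č or ć intact.

```r
# Temporary UTF-8 file holding the problem letters (bytes written unchanged)
path <- tempfile(fileext = ".txt")
writeLines(enc2utf8(c("\u010d", "\u0107", "\u0111")), path, useBytes = TRUE)

# encoding = "UTF-8" only marks the strings as UTF-8; nothing is re-encoded
df <- read.table(path, encoding = "UTF-8", stringsAsFactors = FALSE)
print(Encoding(df$V1))  # each element should be marked "UTF-8"
print(df$V1 == c("\u010d", "\u0107", "\u0111"))
```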

If you are writing a file, you can do something like this:

  con <- file("path/filename.txt", open = "w", encoding = "UTF-8")
  write(x, file = con)
  close(con)
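An alternative sketch for writing, assuming the text is already held as UTF-8 strings in R: `enc2utf8()` converts to UTF-8, and `useBytes = TRUE` tells `writeLines()` to pass the bytes through without re-encoding them to the native 1252 code page. Reading the file back as raw bytes shows what was actually written; in UTF-8, č is `C4 8D` and ć is `C4 87`.

```r
x <- "\u010d\u0107"                 # "čć" written as Unicode escapes
out <- tempfile(fileext = ".txt")   # temporary file, for illustration only

# Convert to UTF-8 and write the bytes unchanged (no conversion to native)
writeLines(enc2utf8(x), out, useBytes = TRUE)

# Inspect the raw bytes: expect C4 8D (č), C4 87 (ć), then the line ending
# (0A, or 0D 0A on Windows, where the file is opened in text mode)
bytes <- readBin(out, what = "raw", n = file.size(out))
print(bytes)
```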

answered Jun 17 '16 at 16:53 · edited Jun 17 '16 at 20:16 · user5249203

It does preserve everything but č and ć; it converts them to c. – magasr Jun 17 '16 at 16:55
Good observation. Maybe this workaround works for you: http://stackoverflow.com/questions/29957678/utf-8-characters-get-lost-when-converting-from-list-to-data-frame-in-r – user5249203 Jun 17 '16 at 17:09
Thanks for your input. I did some more testing. When I read a file using encoding = "UTF-8", it reads č and ć correctly, but when I use encoding = "utf-8", it doesn't. Why is that? The question of whether there is a way to use č and ć in the R console still remains, so I'll leave it open. – magasr Jun 17 '16 at 18:00
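One plausible explanation for the difference, offered as a sketch rather than a confirmed diagnosis: `read.table()` passes `encoding` on to `scan()`, whose documentation names the exact values `"latin1"` and `"UTF-8"`, so a value spelled `"utf-8"` appears to be treated like the default `"unknown"` and the strings are left unmarked (and then interpreted as 1252 text). The check below compares the encoding marks produced by the two spellings on a temporary file.

```r
# Temporary UTF-8 file containing a single č
path <- tempfile(fileext = ".txt")
writeLines(enc2utf8("\u010d"), path, useBytes = TRUE)

upper <- read.table(path, encoding = "UTF-8", stringsAsFactors = FALSE)$V1
lower <- read.table(path, encoding = "utf-8", stringsAsFactors = FALSE)$V1

# "UTF-8" marks the string; "utf-8" is expected to leave it unmarked ("unknown")
print(c(upper = Encoding(upper), lower = Encoding(lower)))
```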
Did you try setting your system locale to the language of the characters you are reading in? – user5249203 Jun 17 '16 at 20:17
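A sketch of the locale suggestion, with its assumptions stated: the candidate names `"sr_RS.UTF-8"` (a typical Linux/macOS name) and `"Serbian (Latin)_Serbia.1250"` (a guessed Windows name; code page 1250 does contain č, ć, and đ) are hypothetical and only work if the operating system provides them. `Sys.setlocale()` returns an empty string when a request cannot be honored, and the hypothetical helper `try_set_ctype()` uses that to fall through to the next candidate.

```r
# Hypothetical helper: try candidate LC_CTYPE locales until the OS accepts one
try_set_ctype <- function(candidates) {
  for (loc in candidates) {
    # Sys.setlocale() warns and returns "" when the locale is unavailable
    res <- suppressWarnings(Sys.setlocale("LC_CTYPE", loc))
    if (!is.null(res) && nzchar(res)) return(res)
  }
  ""  # none of the candidates was accepted
}

old_ctype <- Sys.getlocale("LC_CTYPE")
# Assumed locale names; availability depends on the OS and installed locales
new_ctype <- try_set_ctype(c("sr_RS.UTF-8", "Serbian (Latin)_Serbia.1250"))
cat("LC_CTYPE now:", Sys.getlocale("LC_CTYPE"), "\n")
cat("UTF-8 locale:", isTRUE(l10n_info()[["UTF-8"]]), "\n")

# Restore the original setting when done experimenting
invisible(Sys.setlocale("LC_CTYPE", old_ctype))
```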