I'm reading HTML nodes from this site to build a TeX file from them. Because the pages contain many names and places from different countries, I have run into an encoding problem. I've tried UTF-8, but it does not seem to handle every language. Is there a function in R that can transliterate "ż" to "z", and likewise for every other accented character? I don't need the characters preserved exactly as they are, but because of them I cannot compile my TeX file. For example, "Eustachy Karol Żyliński" comes out as "Eustachy Karol Ĺ»yliĹ„ski". If the solution is a different encoding, could you also tell me which packages I should load in the TeX file?

To read the html nodes I'm using

library(rvest)

matematyk <- LinkWlasciwy[j] %>% read_html() %>% html_nodes(selektor1) %>% html_text()

And to create the output file I'm using:

write(sprintf("%s|%s|%s|%s\n", matematyk[1], matematyk[2], matematyk[3], LinkWlasciwy[j]), file = nazwapliku1, append = TRUE)

This all runs in a loop that saves each row of information from the h1 and h3 nodes to the file.

G5W
  • Hi, welcome to SO. Please consider reading up on [ask] and how to produce a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). It makes it easier for others to help you. – Heroka Feb 28 '16 at 12:35

1 Answer

library(stringi)
?stri_read_lines
?stri_write_lines

stringi is a package that helps with encoding issues in R. For your problem, stri_write_lines in particular should help: its encoding argument sets the encoding of the output file (the default is UTF-8).
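
A minimal sketch of both routes, assuming the scraped strings are already valid UTF-8 inside R. The sample rows and the temporary file are made up for illustration; in the question they would come from html_text() and nazwapliku1. Because stri_write_lines writes a whole character vector at once, the rows are collected first rather than appended one by one.

```r
library(stringi)

# Hypothetical sample rows standing in for the scraped text.
# \u escapes keep this script independent of its own file encoding.
rows <- c(
  "Eustachy Karol \u017byli\u0144ski|Krak\u00f3w",
  "Stefan Banach|Lw\u00f3w"
)

# Route 1: keep the original characters and write the file as UTF-8.
out_file <- tempfile(fileext = ".txt")  # placeholder for nazwapliku1
stri_write_lines(rows, out_file, encoding = "UTF-8")

# Reading the file back with the same encoding recovers the strings.
read_back <- stri_read_lines(out_file, encoding = "UTF-8")

# Route 2: drop the diacritics, as the question asks ("ż" becomes "z").
ascii_rows <- stri_trans_general(rows, "Latin-ASCII")
print(ascii_rows[1])  # "Eustachy Karol Zylinski|Krakow"
```

With route 1 the TeX file has to be read as UTF-8 as well, for instance with \usepackage[utf8]{inputenc} instead of cp1250; output such as "Ĺ»yliĹ„ski" is what UTF-8 bytes look like when they are decoded as cp1250. With route 2 the output is plain ASCII, so the input encoding declared in TeX no longer matters.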

bartoszukm
  • OK, since I still need to wait about 15 minutes for my script to run, I want to ask: after I use encoding = "auto", will I be able to compile the TeX file afterwards? Or will I still need to load some kind of encoding package (currently cp1250 for Polish)? – Karol Kreczman Feb 28 '16 at 13:11
  • You did not provide a reproducible example, so it is hard to say anything for sure. But "auto" lets you read a file in any encoding (Polish works too), and you get a vector in UTF-8 encoding in R. – bartoszukm Feb 28 '16 at 13:14
  • OK, since "auto" gives me UTF-8, that solves everything. Thank you for your time. – Karol Kreczman Feb 28 '16 at 13:17