I have the following HTML document:

library(rvest)

sess <- html_session("http://www.sudacka-mreza.hr/sudska-praksa.aspx", encoding = "UTF-8")
form <- html_form(sess)[[1]]
fill_form <- set_values(form, 'uc_login1$LoginUserName' = 'mislav.sagovac@contentio.biz',
                        'uc_login1$LoginPassword' = 'theanswer')
sess_submit <- submit_form(sess, fill_form, submit = "uc_login1$LoginSubmitButton", encoding = "UTF-8")
praxis <- sess_submit %>%
  jump_to( "odluke.aspx?Search=&Search2=&Court=112&Type=---&Type1=---&Type1a=---&Type2=---&Type2a=---&Type3=&Type4=&O1=&O2=&O3=&O4=&P1=&P2=&ShowID=21216"
           , encoding = "UTF-8")

decision <- read_html(praxis, encoding = "UTF-8") %>%
  html_nodes(xpath = "//*[@id='mainContent']")

I want to save `decision` as an HTML file. I tried several solutions (using write_html, write.table), but some of the UTF-8 characters are not displayed correctly in the HTML file.

Tried solutions:

# first attempted solution
decision <- paste(as.character(decision), collapse = "\n")
write.table(decision, 
            file=paste0("some_path.html"), 
            quote = FALSE,
            col.names = FALSE,
            row.names = FALSE
            # fileEncoding = "UTF-8"
)

# second attempted solution
writeLines(iconv(decision,
                 from = "CP1252", to = "UTF8"), 
           file(paste0("some_path.html"),
                encoding="UTF-8"))
Mislav
  • Are you getting an error? How exactly are you "checking" the file to see if the characters are correct? Are you on a Windows machine? What does `Encoding(decision)` return after the `paste()`? – MrFlick Aug 24 '18 at 21:40
  • I'm guessing you want `writeLines(decision, "some_path.html", useBytes = TRUE)` as described [here](https://stackoverflow.com/questions/10675360/utf-8-file-output-in-r). – MrFlick Aug 24 '18 at 21:44
  • I opened the file after saving it locally. I consulted the link above, but none of the solutions worked. I am not getting an error; some characters are just not UTF-8. I am on a Windows machine. The `Encoding()` function returns "UTF-8". – Mislav Aug 25 '18 at 09:06
  • The line from the second comment doesn't work. – Mislav Aug 25 '18 at 09:08
  • In the duplicate answer they save the file as txt, not html. – Mislav Aug 25 '18 at 09:27
  • An HTML file is a text file; that shouldn't matter. When you say you "opened" the file, what program did you open it with? When you say something "doesn't work," it's helpful to describe exactly what is happening. Generally speaking, Windows much prefers files to be encoded in latin1 rather than utf8. – MrFlick Aug 25 '18 at 13:15
  • I want to save it as an HTML file on Windows, and then, when I open it by clicking on it, I want to see all Croatian characters displayed correctly. Two types of characters are not shown correctly. It is important to me that the file is an HTML file, since it looks nicer. There will be lots of HTML files that people will search by text... – Mislav Aug 25 '18 at 15:20

1 Answer

Well, I made an error. I extracted only part of the HTML document and forgot to add back the metadata from the head tag. With that metadata restored, the methods above work.
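
For illustration, here is a minimal sketch of what restoring the head metadata could look like. It assumes the missing piece was the charset declaration, which the answer does not state explicitly. The helper `wrap_html`, the sample `fragment`, and the output path are hypothetical names made up for this example; with the objects from the question, `fragment` would be `paste(as.character(decision), collapse = "\n")`.

```r
# Hypothetical fragment standing in for the extracted #mainContent node;
# the escapes spell a few Croatian letters.
fragment <- "<div id='mainContent'><p>\u010c\u0107\u017e\u0161\u0111</p></div>"

# Wrap a fragment in a complete document whose <head> declares the encoding
# (assumption: this is the metadata that was dropped).
wrap_html <- function(body, title = "Decision") {
  paste0(
    "<!DOCTYPE html>\n<html>\n<head>\n",
    "<meta charset=\"UTF-8\">\n",
    "<title>", title, "</title>\n",
    "</head>\n<body>\n",
    enc2utf8(body),
    "\n</body>\n</html>"
  )
}

out_path <- file.path(tempdir(), "decision.html")  # hypothetical output path
page <- wrap_html(fragment)

# useBytes = TRUE writes the UTF-8 bytes as they are, so they are not
# re-encoded to the native Windows code page on output.
writeLines(page, out_path, useBytes = TRUE)
```

Without a charset declaration, a browser has to guess the encoding of a local file, which may explain why only some characters looked wrong even though the bytes on disk were valid UTF-8.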

Mislav