
I want to perform a simple task: I have 60,000 XML files whose encoding I want to change to UTF-8. All I want to do is write a loop that reads each XML file and then immediately saves it again with the right encoding. That's it. How can I do that in R?

Corel
  • Do they all have the same current encoding, or do you need to read the XML declaration to determine the encoding? – Michael Kay Aug 24 '17 at 13:59
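If the files do not all share one encoding, one option is to read each file's XML declaration and pass the declared name to `iconv()` as `from`. The sketch below is only an illustration: `declared_encoding()` is a hypothetical helper, and it assumes the declaration sits on the first line and that the files use an ASCII-compatible encoding (UTF-16 files would need different handling).

```r
# Hypothetical helper: return the encoding named in a file's XML declaration,
# or NA if the first line has no declaration or no encoding pseudo-attribute.
# Assumes an ASCII-compatible encoding, so the declaration can be read as text.
declared_encoding <- function(path) {
  first <- readLines(path, n = 1L, warn = FALSE)
  if (length(first) == 0L) return(NA_character_)
  pattern <- "<\\?xml[^>]*encoding[[:space:]]*=[[:space:]]*[\"']([A-Za-z0-9._-]+)[\"']"
  # useBytes = TRUE matches byte-wise, so non-UTF-8 bytes elsewhere on the
  # first line do not trigger an "invalid in this locale" error
  m <- regmatches(first, regexec(pattern, first, useBytes = TRUE))[[1]]
  if (length(m) < 2L) NA_character_ else m[2]
}
```

A file without an encoding declaration returns `NA`; per the XML specification such a file should already be UTF-8 (or UTF-16 with a byte-order mark), so it may not need converting at all.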

1 Answer


As suggested in this post, use iconv.

In general:

# `from` must name the files' current encoding; "ANSI_X3.4-1986" is plain ASCII
writeLines(iconv(readLines("tmp.html"), from = "ANSI_X3.4-1986", to = "UTF-8"), "tmp2.html")

On Windows use:

writeLines(iconv(readLines("tmp.html"), from = "ANSI_X3.4-1986", to = "UTF-8"),
           file("tmp2.html", encoding = "UTF-8"))
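The question asks for a loop over all 60,000 files, which the one-liners above do not show. A minimal sketch of that loop follows. `convert_dir_to_utf8()`, its directory arguments, and the `"latin1"` default are all hypothetical placeholders; the source encoding must be replaced with whatever the files actually use. It writes to a separate output directory so the originals survive until the results have been checked.

```r
# Sketch: convert every .xml file in in_dir from one known encoding to UTF-8,
# writing the results to out_dir under the same file names.
# Assumption: all files share the encoding given in `from` ("latin1" is only
# a placeholder default, not taken from the original answer).
convert_dir_to_utf8 <- function(in_dir, out_dir, from = "latin1") {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  files <- list.files(in_dir, pattern = "\\.xml$", full.names = TRUE)
  ok <- logical(length(files))
  for (i in seq_along(files)) {
    lines <- readLines(files[i], warn = FALSE)
    utf8 <- iconv(lines, from = from, to = "UTF-8")
    # iconv() returns NA for any line it cannot convert; skip such files
    if (anyNA(utf8)) {
      warning("could not convert ", files[i], " from ", from)
      next
    }
    # useBytes = TRUE writes the UTF-8 bytes unchanged instead of passing
    # them through the native locale encoding (relevant on Windows)
    writeLines(utf8, file.path(out_dir, basename(files[i])), useBytes = TRUE)
    ok[i] <- TRUE
  }
  # Return the paths of the files that were written
  invisible(file.path(out_dir, basename(files[ok])))
}
```

Like the answer's one-liners, this leaves any `encoding="..."` in the XML declaration untouched, which is the problem raised in the comments below.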
Erwin Bolwidt
gmatharu
  • While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. - [From Review](/review/low-quality-posts/17133775) – Yoh Deadfall Aug 24 '17 at 13:20
  • Thanks for clarifying, [yoh-deadfall](https://stackoverflow.com/users/4593390/yoh-deadfall). Editing my reply now! – gmatharu Aug 24 '17 at 13:30
  • The problem with this approach is that it leaves the XML declaration unchanged, which means that a subsequent attempt to parse the file may attempt to decode it incorrectly. – Michael Kay Aug 24 '17 at 14:00
  • [Michael Kay](https://stackoverflow.com/users/415448/michael-kay), the above code creates another file (tmp2.html in this example), so there are two options: either rename the file after this code runs or use the new file. – gmatharu Aug 24 '17 at 20:27
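Michael Kay's point holds whatever the new file is called: if its declaration still reads, say, `encoding="ISO-8859-1"`, a parser will decode the converted UTF-8 bytes as Latin-1. One way to address this is to rewrite the declaration after converting. The sketch below is a hypothetical helper, not part of the answer, and it assumes the declaration is on the first line.

```r
# Hypothetical helper: after the lines have been converted to UTF-8, rewrite
# the encoding pseudo-attribute of the XML declaration (assumed to be on the
# first line) so parsers decode the new file correctly.
# Lines without an encoding declaration are left unchanged: an XML file with
# no encoding declaration (and no byte-order mark) is read as UTF-8 anyway.
set_utf8_declaration <- function(lines) {
  if (length(lines) == 0L) return(lines)
  # Group 1: everything up to "encoding =", group 2: the opening quote;
  # \\2 in the pattern requires the closing quote to match the opening one
  lines[1] <- sub("(<\\?xml[^>]*?encoding\\s*=\\s*)([\"'])[^\"']*\\2",
                  "\\1\\2UTF-8\\2", lines[1], perl = TRUE)
  lines
}
```

It would be applied to the result of `iconv()` before writing, for example `writeLines(set_utf8_declaration(iconv(readLines("tmp.html"), from = "ANSI_X3.4-1986", to = "UTF-8")), "tmp2.html", useBytes = TRUE)`.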