
I am trying to use the code shown below to extract data from a JSON file served at a URL. However, the following error is returned:

Error: lexical error: invalid bytes in UTF8 string.
          fr":"Ces données sont publiées avec un délai de cinq jours
                     (right here) ------^

Inspecting the JSON file in my browser shows the data as follows:

"fr":"Ces donn\u00e9es sont publi�es avec un d\u00e9lai de cinq jours."

Is there a way to write the data while ignoring any UTF8 strings that cause an error?

library(jsonlite)

URL <- paste0("https://www.energy-charts.de/power_unit/month_lignite_unit_2017_12.json")

data <- fromJSON(getURL(URL))
Hong Ooi
mouphasa
  • The errors you are seeing are caused by having non-UTF-8 strings being declared to be UTF-8. The solution is to declare them properly from the beginning; then the errors will go away. – user2554330 Feb 11 '19 at 11:22

1 Answer


The problem is that the URL returns data in latin1 encoding, while your system defaults to reading it as UTF-8. You can read it correctly with:

library(jsonlite)
library(RCurl)

URL <- "https://www.energy-charts.de/power_unit/month_lignite_unit_2017_12.json"

data <- fromJSON(getURL(URL, encoding = "latin1"))

I've also corrected some minor errors in your code: you forgot to load RCurl with library(), and paste0 was not needed for a single string.
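If you would rather make the conversion explicit, a minimal sketch of the same idea is to take the raw bytes, treat them as latin1, and convert to UTF-8 with iconv before parsing. The helper name parse_latin1_json and the sample bytes are hypothetical and only stand in for the downloaded content; this assumes the payload really is latin1, as it appears to be here.

```r
library(jsonlite)

# Hypothetical helper (not part of jsonlite): parse JSON whose bytes
# are assumed to be latin1-encoded.
parse_latin1_json <- function(raw_bytes) {
  txt <- rawToChar(raw_bytes)                       # bytes -> string, encoding unknown
  txt <- iconv(txt, from = "latin1", to = "UTF-8")  # re-encode as UTF-8
  fromJSON(txt)
}

# Stand-in for the downloaded content: the byte 0xE9 is "é" in latin1
# and is not valid on its own in UTF-8, which mimics the original error.
sample_bytes <- c(charToRaw('{"fr":"publi'), as.raw(0xE9), charToRaw('es"}'))

result <- parse_latin1_json(sample_bytes)
```

With real data, the raw bytes could come from something like curl::curl_fetch_memory(URL)$content instead of sample_bytes.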

user2554330
  • jsonlite can extract data from a URL automatically, using the curl package (which is by the same author) – Hong Ooi Feb 11 '19 at 11:58
  • Sure, but it doesn't work in this case, because the server isn't declaring the encoding of the file. If you're on a system which defaults to UTF-8, you'll get the error the OP got. – user2554330 Feb 11 '19 at 13:20