I am trying to read a JSON file (which I unfortunately cannot post here) into R using the fromJSON function from the jsonlite package.

Every time, I get a parsing error and I don't know why. The really strange thing is that I have two versions of the same file that look exactly the same to me, e.g. in jsonformatter.org.

Does anyone know this error or how to track it down? Is there another good way to read JSON files into R?

Here is the error message:

Error in parse_con(txt, bigint_as_char) :
  lexical error: invalid char in json text.
                            ÿþ{                     (right here) ------^
Konrad Rudolph
K.O.T.
  • You may check [here](https://stackoverflow.com/questions/51247912/jsonlite-suddenly-retunring-error-failure-when-receiving-data-from-the-peer) – akrun Apr 17 '19 at 08:04
  • Check the file's encoding. `ÿþ` in your error message to me smells immediately like encoding problems – MichaelChirico Apr 17 '19 at 08:08
  • [`ÿþ` is the Byte Order Mark (BOM) for UTF-16-LE or UCS-2LE](https://stackoverflow.com/a/17291694/6530970); try reading the file in utf-16 encoding. – Maurits Evers Apr 17 '19 at 08:37
  • Thanks for the comments, the issue was indeed the encoding, which was Unicode. After changing it to ANSI it works. Thanks a lot! – K.O.T. Apr 17 '19 at 11:09
  • @K.O.T. Changing the file encoding to ANSI is a bad solution. Leave it as UTF-16 (better yet, use UTF-8 unless there’s a compelling reason not to), just remove the byte order mark (your editor should have an option to save it as UTF-* without BOM). – Konrad Rudolph May 16 '19 at 09:54
  • @KonradRudolph: Can you tell me why it is a bad idea? The thing is that I am creating an automated workflow which, as an intermediate step, outputs a JSON file that is in turn immediately taken as input. I can have the JSON written as ANSI. – K.O.T. May 17 '19 at 13:12
  • @K.O.T. Except for very specific applications (= legacy support), it’s a good idea to consider all non-Unicode character encodings as obsolete. ANSI won’t be able to correctly encode everything in the source file. Furthermore, it’s simply unnecessary. The way I suggested works better and is exactly as simple as using an inferior encoding. – Konrad Rudolph May 17 '19 at 13:23
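
The encoding diagnosis from the comments above can be checked directly in R. The following is a minimal sketch rather than a definitive fix: `detect_bom` and `read_json_bom_aware` are hypothetical helper names (not part of jsonlite). It assumes the file starts with a byte order mark, as the `ÿþ` in the error message suggests. The sketch inspects the first bytes of the file, opens a connection with the matching encoding, and passes the decoded text to `fromJSON`. UTF-16BE is detected but deliberately not handled here, and the bytes `FF FE` could in principle also start a UTF-32LE file.

```r
library(jsonlite)

# Hypothetical helper: report which byte order mark (BOM), if any, starts the file.
detect_bom <- function(path) {
  head_bytes <- readBin(path, what = "raw", n = 3L)
  starts_with <- function(sig) {
    length(head_bytes) >= length(sig) &&
      identical(head_bytes[seq_along(sig)], as.raw(sig))
  }
  if (starts_with(c(0xEF, 0xBB, 0xBF))) return("UTF-8-BOM")
  if (starts_with(c(0xFF, 0xFE))) return("UTF-16LE")  # rendered as "ÿþ" in Latin-1
  if (starts_with(c(0xFE, 0xFF))) return("UTF-16BE")
  "none"
}

# Hypothetical helper: read a JSON file, re-encoding it based on the detected BOM.
read_json_bom_aware <- function(path) {
  enc <- switch(
    detect_bom(path),
    "UTF-16LE"  = "UTF-16LE",   # file() drops the BOM for this encoding
    "UTF-8-BOM" = "UTF-8-BOM",  # file() drops the BOM for this encoding
    "UTF-16BE"  = stop("UTF-16BE is not handled in this sketch"),
    "UTF-8"                      # assumption: no BOM means plain UTF-8
  )
  con <- file(path, open = "r", encoding = enc)
  on.exit(close(con))
  # The connection re-encodes the text while reading it line by line.
  txt <- paste(readLines(con, warn = FALSE), collapse = "\n")
  fromJSON(txt)
}
```

With a placeholder path, usage would look like `dat <- read_json_bom_aware("data.json")`. In an automated workflow it is probably simpler to have the producing step write UTF-8 without a BOM, as suggested above, so that a plain `fromJSON` call works without any detection step.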
