I'm trying to read "arxiv-metadata-oai-snapshot.json" in R. The file comes from this Kaggle dataset: https://www.kaggle.com/Cornell-University/arxiv?select=arxiv-metadata-oai-snapshot.json I have already downloaded the data and renamed the file to "arxiv.json".

I'm using the "jsonlite" package to read "arxiv.json", but the call fails:

    library(jsonlite)
    arxiv <- fromJSON(file = "C:/Users/caproki/Downloads/arXiv Dataset/arxiv.json")

I get the following error:

    Error in parse_con(txt, bigint_as_char) : parse error: trailing garbage
          ",""],["Yuan","C. -P.",""]]} {"id":"0704.0002","submitter":"
                     (right here) ------^

What could I do? Thanks in advance!

caproki
  • Does this answer your question? [Parse Error: "Trailing Garbage" while trying to parse JSON column in data frame](https://stackoverflow.com/questions/38858345/parse-error-trailing-garbage-while-trying-to-parse-json-column-in-data-frame) – Ryan Morton Sep 18 '20 at 19:47
  • @RyanMorton well the thing is that I'm not trying to read a CSV and then convert it to a JSON. I'm trying to directly read a JSON file – caproki Sep 18 '20 at 20:09
  • The error relates to trailing garbage in the JSON. Use the answer for that portion. It really doesn't matter where the JSON originated. The issue is that it can't parse the trailing garbage. So, either remove the garbage from the JSON file or try one of the linked answers. – Ryan Morton Sep 18 '20 at 20:14
  • To me it doesn't look like trailing garbage in a json file, it looks like ndjson. One of the suggestions in that answer is to use `jsonlite::stream_in`, which I think is what you need here. Not that there is actual garbage in the data file, but that it technically is not a json file. – r2evans Sep 18 '20 at 21:14
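
Following the `jsonlite::stream_in` suggestion in the last comment, here is a minimal sketch of reading a file that holds one JSON object per line. The temporary file and its two records are made up for illustration (flat fields only); the real snapshot has more fields and is much larger, so loading it whole may need a lot of memory.

```r
library(jsonlite)

# Hypothetical stand-in for arxiv.json: one JSON object per line (NDJSON).
sample_path <- tempfile(fileext = ".json")
writeLines(c(
  '{"id":"0704.0001","submitter":"A"}',
  '{"id":"0704.0002","submitter":"B"}'
), sample_path)

# stream_in() parses each line as its own JSON document and combines
# the records into a single data frame.
arxiv <- stream_in(file(sample_path), verbose = FALSE)

str(arxiv)

# For the real file, the call would presumably look like this
# (path taken from the question, untested here):
# arxiv <- stream_in(file("C:/Users/caproki/Downloads/arXiv Dataset/arxiv.json"))
```

`fromJSON` expects the whole input to be a single JSON document, which is why it stops at the second `{`. `stream_in` instead treats every line as a separate document.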

0 Answers