
I am struggling to parse a JSON file in R that contains newlines both within character strings and between key/value pairs (and between whole objects).

Here's the sort of format I mean:

{
    "id": 123456,
    "name": "Try to parse this",
    "description": "Thought reading a JSON was easy? \r\n Try parsing a newline within a string."
}
{
    "id": 987654,
    "name": "Have another go",
    "description": "Another two line description... \r\n With 2 lines."
}

Say that I have this JSON saved as example.json. I have tried various techniques suggested elsewhere on SO for overcoming parsing problems, but none of the following works:

library(jsonlite)

# Read the file either line by line or as one collapsed string
foo <- readLines("example.json")
foo <- paste(readLines("example.json"), collapse = "")

# Parsing attempts (every one of these fails)
bar <- fromJSON(foo)
bar <- jsonlite::stream_in(textConnection(foo))
bar <- purrr::map(foo, jsonlite::fromJSON)
bar <- ndjson::stream_in(textConnection(foo))
bar <- read_json(textConnection(foo), format = "jsonl")

I gather that this is really NDJSON, but none of the specialised packages copes with it. Some answers suggest streaming the data in with either jsonlite or ndjson; others suggest mapping the parsing function across lines (or doing the equivalent in base R).

Every attempt raises one of the following: `Error: parse error: trailing garbage`, `Error: parse error: premature EOF`, or an error opening the text connection.

Does anyone have a solution?

Tom Wagstaff
  • That block (as a whole) isn't legitimate JSON. Are those two dicts normally either (a) within a list, or (b) stored line by line, ndjson-style? – r2evans Jul 10 '19 at 17:22
  • Does `jsonlite::prettify()` work for you? – Dave2e Jul 10 '19 at 17:27
  • Hi @r2evans, no, this is what I've got to work with, and I can't see a way to coerce it back into either of those formats... – Tom Wagstaff Jul 11 '19 at 09:43
  • Thanks @Dave2e. I've just tried `minify` as a first step, but it fails in the same way as my other attempts to read it in... – Tom Wagstaff Jul 11 '19 at 09:44

1 Answer


Edit

Knowing that the JSON is malformed, we lose some of ndjson's efficiency, but I think we can fix it on the fly, assuming that one object's close-brace (}) is always followed by nothing or some whitespace (including newlines) and then the next object's open-brace ({).

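fn <- "~/StackOverflow/TomWagstaff.json"
# Read the whole file and join it into a single string
wrongjson <- paste(readLines(fn), collapse = "")
# If a "}" is followed (after optional whitespace) by a "{", put a comma
# between the objects and wrap everything in [ ] to form a JSON array
if (grepl("\\}\\s*\\{", wrongjson))
  wrongjson <- paste0("[", gsub("\\}\\s*\\{", "},{", wrongjson), "]")
json <- jsonlite::fromJSON(wrongjson, simplifyDataFrame = FALSE)
str(json)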
# List of 2
#  $ :List of 3
#   ..$ id         : int 123456
#   ..$ name       : chr "Try to parse this"
#   ..$ description: chr "Thought reading a JSON was easy? \r\n Try parsing a newline within a string."
#  $ :List of 3
#   ..$ id         : int 987654
#   ..$ name       : chr "Have another go"
#   ..$ description: chr "Another two line description... \r\n With 2 lines."
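One caveat: the regex `\\}\\s*\\{` also matches a `}` followed by `{` inside a string value (for example a description containing "a }{ b"), and would insert a comma there. A more defensive sketch, assuming the file is otherwise just a sequence of top-level objects, scans the text once, tracks brace depth and ignores everything inside string literals. `split_json_objects()` is a hypothetical helper written for illustration, not part of jsonlite or any other package.

```r
# Hypothetical helper (not from any package): split a string holding several
# concatenated top-level JSON objects into one string per object.
# It tracks brace depth and skips over string literals, honouring backslash
# escapes, so braces inside string values are never treated as boundaries.
split_json_objects <- function(txt) {
  chars <- strsplit(txt, "")[[1]]
  out <- character(0)
  depth <- 0L
  in_string <- FALSE
  escaped <- FALSE
  start <- NA_integer_
  for (i in seq_along(chars)) {
    ch <- chars[i]
    if (in_string) {
      if (escaped) {
        escaped <- FALSE          # character after a backslash: skip it
      } else if (ch == "\\") {
        escaped <- TRUE           # next character is escaped
      } else if (ch == "\"") {
        in_string <- FALSE        # unescaped quote closes the string
      }
      next
    }
    if (ch == "\"") {
      in_string <- TRUE
    } else if (ch == "{") {
      if (depth == 0L) start <- i # start of a new top-level object
      depth <- depth + 1L
    } else if (ch == "}") {
      depth <- depth - 1L
      if (depth < 0L) stop("unexpected '}' at character ", i)
      if (depth == 0L) out <- c(out, paste(chars[start:i], collapse = ""))
    }
  }
  if (depth != 0L || in_string) stop("unbalanced braces or unterminated string")
  out
}
```

With it, `json <- lapply(split_json_objects(wrongjson), jsonlite::fromJSON, simplifyDataFrame = FALSE)` should give the same list of records without any regex rewriting. Because `wrongjson` was collapsed into a single string, each piece is one line, so writing the pieces out with `writeLines()` would also yield an ndjson file that `jsonlite::stream_in()` can read. A character-by-character loop in R is slow on very large files, so this suits modest inputs.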

From here, you can continue as in the original answer below, for example:

txtjson <- paste(sapply(json, jsonlite::toJSON, pretty = TRUE), collapse = "\n")
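The code above deliberately keeps a list (`simplifyDataFrame = FALSE`). If a data frame is what you eventually want, leaving simplification at its default should turn the repaired array into one row per record. In this sketch a literal string stands in for the repaired `wrongjson`; note also that once parsed, the `\r\n` escapes become real carriage-return/newline characters in the R strings.

```r
library(jsonlite)

# Stand-in for the repaired `wrongjson` from above (already wrapped in [ ])
wrongjson <- paste0(
  '[{"id": 123456, "name": "Try to parse this", ',
  '"description": "Thought reading a JSON was easy? \\r\\n Try parsing a newline within a string."},',
  '{"id": 987654, "name": "Have another go", ',
  '"description": "Another two line description... \\r\\n With 2 lines."}]'
)

# With the default simplifyDataFrame = TRUE, an array of objects sharing the
# same keys is simplified to a data frame with one row per object
df <- fromJSON(wrongjson)
str(df)
```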

(Below is the original answer, hoping/assuming that the format was somehow legitimate.)


Assuming your data is actually like this:

{"id":123456,"name":"Try to parse this","description":"Thought reading a JSON was easy? \r\n Try parsing a newline within a string."}
{"id": 987654,"name":"Have another go","description":"Another two line description... \r\n With 2 lines."}

then it is, as you suspect, ndjson, and you can read it with:

fn <- "~/StackOverflow/TomWagstaff.json"
json <- jsonlite::stream_in(file(fn), simplifyDataFrame = FALSE)
# opening file input connection.
#  Imported 2 records. Simplifying...
# closing file input connection.
str(json)
# List of 2
#  $ :List of 3
#   ..$ id         : int 123456
#   ..$ name       : chr "Try to parse this"
#   ..$ description: chr "Thought reading a JSON was easy? \r\n Try parsing a newline within a string."
#  $ :List of 3
#   ..$ id         : int 987654
#   ..$ name       : chr "Have another go"
#   ..$ description: chr "Another two line description... \r\n With 2 lines."

Notice that I have not simplified to a data frame. To print something close to your original block on the console, do

cat(sapply(json, jsonlite::toJSON, pretty = TRUE), sep = "\n")
# {
#   "id": [123456],
#   "name": ["Try to parse this"],
#   "description": ["Thought reading a JSON was easy? \r\n Try parsing a newline within a string."]
# }
# {
#   "id": [987654],
#   "name": ["Have another go"],
#   "description": ["Another two line description... \r\n With 2 lines."]
# }
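The square brackets around every value in that output come from `toJSON()` writing length-one vectors as JSON arrays by default; its `auto_unbox = TRUE` argument writes them as scalars instead, which is closer to the question's block. A minimal comparison on a single made-up record:

```r
library(jsonlite)

# One record shaped like the elements of `json` (named list of length-1 vectors)
rec <- list(id = 123456L, name = "Try to parse this")

# Default: each length-1 vector is written as a one-element array
boxed <- toJSON(rec)

# auto_unbox = TRUE writes length-1 vectors as plain scalars
unboxed <- toJSON(rec, auto_unbox = TRUE)

cat(boxed, "\n")
cat(unboxed, "\n")
```

Adding `auto_unbox = TRUE` next to `pretty = TRUE` in the `sapply()` call above should therefore print each record without the brackets.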

If you want to dump it to a file in that way (though nothing in jsonlite or similar will be able to read it back, since the file is neither legal ndjson nor legal JSON as a whole), then you can

txtjson <- paste(sapply(json, jsonlite::toJSON, pretty = TRUE), collapse = "\n")

and then save that with writeLines or similar.
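If the file needs to stay machine-readable, a sketch of the alternative is to write legal ndjson, one compact object per line, which `stream_in()` can read back. The two records and the temporary path are stand-ins for `json` and a real output file.

```r
library(jsonlite)

# Stand-in for the `json` list of records built above
json <- list(
  list(id = 123456L, name = "Try to parse this"),
  list(id = 987654L, name = "Have another go")
)

# One compact (non-pretty) object per line is valid ndjson
lines <- vapply(json,
                function(rec) as.character(toJSON(rec, auto_unbox = TRUE)),
                character(1))
out_file <- tempfile(fileext = ".ndjson")  # placeholder output path
writeLines(lines, out_file)

# Unlike the pretty-printed dump, this reads straight back in
back <- stream_in(file(out_file), simplifyDataFrame = FALSE, verbose = FALSE)
str(back)
```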

r2evans