
I have some data that is almost JSON, but not quite. I'm trying to parse it with jsonlite in R. Here is a sample of the data:

field
{'email': {'name': 'Bob Smith', 'address': 'bob_smith@blah.com'}}
{'email': {'name': "Sally O'Mally", 'address': 'sally_omally@blah.com'}}
{'email': {'name': 'Sam Daniels', 'address': '"some text"<sam_daniels@xyz.com>'}}
{'email': {'name': "Johnson', Alan", 'address': 'alan.johnson@abc.com'}}

What I want to do is strip out all of the quotation marks (both single and double) that appear inside the outer quotes delimiting each value. The data would then look like this:

field
{'email': {'name': 'Bob Smith', 'address': 'bob_smith@blah.com'}}
{'email': {'name': "Sally OMally", 'address': 'sally_omally@blah.com'}}
{'email': {'name': 'Sam Daniels', 'address': 'some text<sam_daniels@xyz.com>'}}
{'email': {'name': "Johnson, Alan", 'address': 'alan.johnson@abc.com'}}

After that, I can convert the remaining single quotes to double quotes using stringr and parse the result with fromJSON.
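For what it's worth, here is a minimal sketch of the stripping step in base R. It assumes every row has exactly the `email` / `name` / `address` shape shown above; the names `pattern`, `strip_quotes`, `cleaned`, and `field_json` are made up for illustration. It pulls the two values out by anchoring on the fixed keys, removes every quote inside them, and rebuilds double-quoted JSON text:

```r
# Sample rows copied from the question.
field <- c(
  "{'email': {'name': 'Bob Smith', 'address': 'bob_smith@blah.com'}}",
  "{'email': {'name': \"Sally O'Mally\", 'address': 'sally_omally@blah.com'}}",
  "{'email': {'name': 'Sam Daniels', 'address': '\"some text\"<sam_daniels@xyz.com>'}}",
  "{'email': {'name': \"Johnson', Alan\", 'address': 'alan.johnson@abc.com'}}"
)

# Assumption: every row is {'email': {'name': <value>, 'address': <value>}}.
# The fixed keys act as anchors, so for data like the sample, stray quotes
# inside a value do not change where that value starts and ends.
pattern <- "^\\{'email': \\{'name': ['\"](.*)['\"], 'address': ['\"](.*)['\"]\\}\\}$"
parts <- regmatches(field, regexec(pattern, field))

# Hypothetical helper: drop every single and double quote from a value.
strip_quotes <- function(x) gsub("['\"]", "", x)

# Rows that do not match the pattern come out as NA instead of raising an error.
cleaned <- data.frame(
  name    = strip_quotes(vapply(parts, `[`, character(1), 2)),
  address = strip_quotes(vapply(parts, `[`, character(1), 3)),
  stringsAsFactors = FALSE
)

# Rebuild double-quoted JSON text. This is only safe if the values contain no
# backslashes or control characters, which holds for the sample above.
field_json <- sprintf('{"email": {"name": "%s", "address": "%s"}}',
                      cleaned$name, cleaned$address)
```

Each element of `field_json` should then be acceptable to `jsonlite::fromJSON`. If the real data has other keys or deeper nesting, the pattern would need to change accordingly.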

Any suggestions?

This is the error I currently get when trying to convert the original data from JSON:

> json_test2 <-
+   json_test %>%
+   dplyr::mutate(
+     field2 = map(field, ~ fromJSON(.) %>% as.data.frame())
+   )
Error: lexical error: invalid char in json text.
                                      {'email': {'name': 'Bob S
                     (right here) ------^
Heikki
  • Why do you need/want to do this? I think most JSON parsers would be happy with either single or double quotes. – Tim Biegeleisen Apr 18 '19 at 04:23
  • I haven't been able to find a way to do that in R. I've edited my post to include the error I get when trying to just convert from JSON directly from the original data. – staplertape Apr 18 '19 at 04:34
  • @TimBiegeleisen, `jsonlite` fails with single-quotes, but `gsub("'",'"',...)` on this selection works without issue. See https://stackoverflow.com/a/14355724/3358272 – r2evans Apr 18 '19 at 04:35
  • If this R JSON library requires double quotes, then the best long term fix would be to go back to the source of the data. – Tim Biegeleisen Apr 18 '19 at 04:35
  • @r2evans Unfortunately it's not that simple, because there could be literal single quotes present as well. – Tim Biegeleisen Apr 18 '19 at 04:35
  • absolutely ... that's why I said *on this selection*. I was just about to write (to staplertape) that ... attempting a `gsub` regex replacement of single-quotes is so full of ... not-so-awesome possibilities. (https://xkcd.com/208/ and https://xkcd.com/1171/). But the point of my first comment was about your statement that most JSON parsers would be happy with single/double. – r2evans Apr 18 '19 at 04:36
  • As a side-note, you don't have to loop over each `json` line manually using `dplyr/purrr` etc - just use `jsonlite::stream_in` directly - e.g.: https://stackoverflow.com/a/31575437/496803 – thelatemail Apr 18 '19 at 04:41
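
To make the concern raised in the comments concrete, here is a small base R sketch (variable names are illustrative) of what a blanket `gsub` does to a row that contains a literal apostrophe. Counting double quotes is only a rough sanity check, not a JSON validator:

```r
# Two rows from the question: one clean, one with an apostrophe inside a value.
rows <- c(
  "{'email': {'name': 'Bob Smith', 'address': 'bob_smith@blah.com'}}",
  "{'email': {'name': \"Sally O'Mally\", 'address': 'sally_omally@blah.com'}}"
)

# Blanket replacement of every single quote with a double quote.
swapped <- gsub("'", "\"", rows)

# Count the double quotes in each row after the swap.
n_quotes <- lengths(regmatches(swapped, gregexpr("\"", swapped)))

# An odd count means at least one string is left unbalanced.
unbalanced <- n_quotes %% 2 == 1
```

On this sample the first row comes out balanced, while the second ends up with `"Sally O"Mally"`, which is why the replacement only works on rows without literal quotes inside the values.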
