I have a JSON file which contains 585 nested arrays. Each of the 585 arrays has the following structure:
["12345678", [["12345678912345678", "dummy tweet #hashtag", "2015-05-20 15:33:11", "en"], ["12345678123456781", "dummy tweet again", "2015-05-18 22:08:30", "en"]]]
Each array has a user id and a corresponding array of arrays holding that user's tweets, in which each inner array contains a tweet id, the tweet text, a date, and a language code.
I tried to stream the file with jsonlite's stream_in() function as follows:

test_json = stream_in(file("test.json"))

but I get the following error:
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 3189, 607, 219, 3120, 3091
closing file input connection.
I then read 10 lines, which worked for me:

test10_json = readLines("test.json", n = 10)
Now I wanted to parse this JSON, so I tried fromJSON(test10_json), but I get this error:
Error: parse error: trailing garbage
015-05-06 19:18:35", "en"]]] ["123456789", [["5101600420255
(right here) ------^
which means that when it tries to read the next array/record (the second row of the file, the second user's record), it fails. I suspect each line of the file is a standalone JSON array, so the file as a whole is not a single valid JSON document.
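One workaround I am considering is to concatenate the lines into one enclosing JSON array before parsing. A minimal sketch, using two made-up lines as stand-ins for the real file contents:

```r
library(jsonlite)

# Hypothetical sample lines standing in for readLines("test.json", n = 10)
test10_json <- c(
  '["12345678", [["111", "dummy tweet", "2015-05-20 15:33:11", "en"]]]',
  '["12345679", [["222", "dummy tweet again", "2015-05-18 22:08:30", "en"]]]'
)

# Wrap the standalone arrays in one enclosing array so fromJSON
# sees a single valid JSON document instead of "trailing garbage"
combined <- fromJSON(paste0("[", paste(test10_json, collapse = ","), "]"),
                     simplifyVector = FALSE)

length(combined)  # one element per user record
```

but I am not sure this is the idiomatic way to handle such a file.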
However, when I parse each user record separately, as in test10_row1 = fromJSON(test10_json[1]), it reads them individually and creates a List of 2:

$ : chr "19211550"
$ : chr [1:3189, 1:4]

where the 3189 × 4 character matrix holds one row per tweet, with the tweet id, tweet text, date, and language.
First of all, I would like to know how to read the file test.json in one go, without having to read it line by line, if that is possible.
Secondly, after completing step 1, how can I parse all the arrays into R together, rather than parsing them individually?
Third, instead of getting a list of lists (for 585 records: 585 lists of 2 elements, the second element itself containing as many entries as that user has tweets), I would like to get a data frame which I could flatten.
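For clarity, the kind of result I am after would look something like the following sketch, which parses each line separately and binds everything into one flat data frame (the sample lines and column names are made up for illustration):

```r
library(jsonlite)

# Hypothetical sample lines standing in for the 585 lines of test.json
lines <- c(
  '["12345678", [["111", "tweet one", "2015-05-20 15:33:11", "en"], ["112", "tweet two", "2015-05-18 22:08:30", "en"]]]',
  '["12345679", [["221", "tweet three", "2015-05-06 19:18:35", "en"]]]'
)

# Each record parses to a list of 2: a user id and a character matrix
records <- lapply(lines, fromJSON)

# Bind every user's tweets into one data frame, repeating the user id
tweets <- do.call(rbind, lapply(records, function(rec) {
  m <- rec[[2]]  # character matrix: tweet id, text, date, language
  data.frame(user_id  = rec[[1]],
             tweet_id = m[, 1],
             text     = m[, 2],
             date     = m[, 3],
             lang     = m[, 4],
             stringsAsFactors = FALSE)
}))
```

This per-line loop works on my small sample, but it is exactly the individual parsing I would like to avoid, so a more direct approach would be welcome.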