I have a JSON file which contains 585 nested arrays. Each of the 585 arrays has the following structure:
["12345678", [["12345678912345678", "dummy tweet #hashtag", "2015-05-20 15:33:11", "en"], ["12345678123456781", "dummy tweet again", "2015-05-18 22:08:30", "en"]]]
Each array has a user id and a corresponding array of arrays holding that user's tweets, in which each inner array contains a tweet id, the tweet text, a date, and a language code.
I tried to stream the file with jsonlite's stream_in() function as follows:

test_json = stream_in(file("test.json"))

but I get the following error:
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 3189, 607, 219, 3120, 3091
closing file input connection.
I then read 10 lines, which worked for me:

test10_json = readLines("test.json", n = 10)
Now I wanted to parse this JSON, so I tried fromJSON(test10_json), but I get this error:
Error: parse error: trailing garbage
015-05-06 19:18:35", "en"]]] ["123456789", [["5101600420255
(right here) ------^
which means that when it tries to read the next array/record (the second row of the file, the second user's record), it fails. I suspect each line of the file is a standalone JSON array, so the file as a whole is not a single valid JSON document.
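One workaround I am considering is to concatenate the lines into one enclosing JSON array before parsing. A minimal sketch, using two made-up lines as stand-ins for the real file contents:

```r
library(jsonlite)

# Hypothetical sample lines standing in for readLines("test.json", n = 10)
test10_json <- c(
  '["12345678", [["111", "dummy tweet", "2015-05-20 15:33:11", "en"]]]',
  '["12345679", [["222", "dummy tweet again", "2015-05-18 22:08:30", "en"]]]'
)

# Wrap the standalone arrays in one enclosing array so fromJSON
# sees a single valid JSON document instead of "trailing garbage"
combined <- fromJSON(paste0("[", paste(test10_json, collapse = ","), "]"),
                     simplifyVector = FALSE)

length(combined)  # one element per user record
```

but I am not sure this is the idiomatic way to handle such a file.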
However, when I parse each user record separately, as in test10_row1 = fromJSON(test10_json[1]), it reads them individually and creates a List of 2:

$ : chr "19211550"
$ : chr [1:3189, 1:4]

where the 3189 × 4 character matrix holds one row per tweet, with the tweet id, tweet text, date, and language.
First of all, I would like to know how to read the file test.json in one go, without having to read it line by line, if that is possible.
Secondly, after completing step 1, how can I parse all the arrays into R together, rather than parsing them individually?
Third, instead of getting a list of lists (for 585 records: 585 lists of 2 elements, the second element itself containing as many entries as that user has tweets), I would like to get a data frame which I could flatten.
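For clarity, the kind of result I am after would look something like the following sketch, which parses each line separately and binds everything into one flat data frame (the sample lines and column names are made up for illustration):

```r
library(jsonlite)

# Hypothetical sample lines standing in for the 585 lines of test.json
lines <- c(
  '["12345678", [["111", "tweet one", "2015-05-20 15:33:11", "en"], ["112", "tweet two", "2015-05-18 22:08:30", "en"]]]',
  '["12345679", [["221", "tweet three", "2015-05-06 19:18:35", "en"]]]'
)

# Each record parses to a list of 2: a user id and a character matrix
records <- lapply(lines, fromJSON)

# Bind every user's tweets into one data frame, repeating the user id
tweets <- do.call(rbind, lapply(records, function(rec) {
  m <- rec[[2]]  # character matrix: tweet id, text, date, language
  data.frame(user_id  = rec[[1]],
             tweet_id = m[, 1],
             text     = m[, 2],
             date     = m[, 3],
             lang     = m[, 4],
             stringsAsFactors = FALSE)
}))
```

This per-line loop works on my small sample, but it is exactly the individual parsing I would like to avoid, so a more direct approach would be welcome.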