I am very new to JSON files. I scraped a text file containing several million JSON objects, such as:

{
    "created_at":"Mon Oct 14 21:04:25 +0000 2013",
    "default_profile":true,
    "default_profile_image":true,
    "description":"...",
    "followers_count":5,
    "friends_count":560,
    "geo_enabled":true,
    "id":1961287134,
    "lang":"de",
    "name":"Peter Schmitz",
    "profile_background_color":"C0DEED",
    "profile_background_image_url":"http://abs.twimg.com/images/themes", 
    "utc_offset":-28800,
    ...
}
{
    "created_at":"Fri Oct 17 20:04:25 +0000 2015",
    ...
}

I want to extract the columns into a data frame in R:

Variable          Value
created_at          X     
default_profile     Y     

 …

In general, this is similar to what is done in Python in the question "multiple Json objects in one file extract by python". If anyone has an idea or a suggestion, help would be much appreciated. Thank you!

Konrad Rudolph
TSpinde
  • Here's possible solution: https://stackoverflow.com/questions/27023972/multiple-json-objects-in-fromjson – Biranjan Feb 23 '18 at 11:52

1 Answer

Here is an example of how you could approach it with two objects. I assume you were able to read the JSON from a file; otherwise see here.
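If the objects are still sitting in a file, one possible way to get them into a single string is sketched below. The file name is hypothetical (a temporary file stands in for the scraped text file), and `myjson_from_file` is just an illustrative variable name. For a file with several million objects, holding everything in memory at once may not be practical, so treat this as a starting point only.

```r
# Hypothetical input: a temporary file stands in for the scraped text file.
path <- tempfile(fileext = ".txt")
writeLines(c('{"id": 1, "lang": "de"}', '{"id": 2, "lang": "en"}'), path)

# readLines() returns one element per line; paste() joins the lines back
# into a single string so that objects spanning several lines stay intact.
myjson_from_file <- paste(readLines(path, warn = FALSE), collapse = "\n")
```

The resulting string has the same shape as the hand-written `myjson` used in the rest of this answer.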

myjson = '{"created_at": "Mon Oct 14 21:04:25 +0000 2013", "default_profile": true, 
  "default_profile_image": true, "description": "...", "followers_count": 
    5, "friends_count": 560, "geo_enabled": true, "id": 1961287134, "lang":  
    "de", "name": "Peter Schmitz", "profile_background_color": "C0DEED",  
  "profile_background_image_url": "http://abs.twimg.com/images/themes", "utc_offset": -28800}
{"created_at": "Mon Oct 15 21:04:25 +0000 2013", "default_profile": true, 
  "default_profile_image": true, "description": "...", "followers_count": 
    5, "friends_count": 560, "geo_enabled": true, "id": 1961287134, "lang":  
    "de", "name": "Peter Schmitz", "profile_background_color": "C0DEED",  
  "profile_background_image_url": "http://abs.twimg.com/images/themes", "utc_offset": -28800}
'

library("rjson")

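# Split the text into one string per JSON object by inserting a marker after every '}'.
# The marker '!x!x!' is arbitrary. This only works for flat objects: a '}' inside a
# nested object or inside a string value would split in the wrong place.
my_json_objects = strsplit(gsub('}', '}!x!x!', myjson, fixed = TRUE), '!x!x!', fixed = TRUE)[[1]]
# Drop whitespace-only leftovers, such as the trailing newline after the last object
my_json_objects = my_json_objects[nzchar(trimws(my_json_objects))]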
# read the text as JSON objects 
json_data <- lapply(my_json_objects, function(x) {fromJSON(x)})
# Transform to dataframes
json_data <- lapply(json_data, function(x) {data.frame(val=unlist(x))}) 

Output:

[[1]]
                                                            val
created_at                       Mon Oct 14 21:04:25 +0000 2013
default_profile                                            TRUE
default_profile_image                                      TRUE
description                                                 ...
followers_count                                               5
friends_count                                               560
geo_enabled                                                TRUE
id                                                   1961287134
lang                                                         de
name                                              Peter Schmitz
profile_background_color                                 C0DEED
profile_background_image_url http://abs.twimg.com/images/themes
utc_offset                                               -28800

[[2]]
                                                            val
created_at                       Mon Oct 15 21:04:25 +0000 2013
default_profile                                            TRUE
default_profile_image                                      TRUE
description                                                 ...
followers_count                                               5
friends_count                                               560
geo_enabled                                                TRUE
id                                                   1961287134
lang                                                         de
name                                              Peter Schmitz
profile_background_color                                 C0DEED
profile_background_image_url http://abs.twimg.com/images/themes
utc_offset                                               -28800
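
Splitting on every `}` breaks as soon as an object contains a nested object or a brace inside a string value. A minimal base R sketch of a more careful splitter follows; it tracks brace depth and skips over string contents. The function name `split_json_objects` and the example data are made up for illustration. It walks the text character by character, so it is likely to be slow on very large inputs.

```r
# Split a string of concatenated JSON objects at top-level closing braces.
# Braces inside nested objects or inside string values are not boundaries.
split_json_objects <- function(txt) {
  chars <- strsplit(txt, "", fixed = TRUE)[[1]]
  depth <- 0L           # current nesting level of { }
  in_string <- FALSE    # TRUE while inside a double-quoted key or value
  escaped <- FALSE      # TRUE right after a backslash inside a string
  start <- NA_integer_  # index where the current top-level object began
  objects <- character(0)
  for (i in seq_along(chars)) {
    ch <- chars[i]
    if (in_string) {
      if (escaped) {
        escaped <- FALSE
      } else if (ch == "\\") {
        escaped <- TRUE
      } else if (ch == "\"") {
        in_string <- FALSE
      }
    } else if (ch == "\"") {
      in_string <- TRUE
    } else if (ch == "{") {
      if (depth == 0L) start <- i
      depth <- depth + 1L
    } else if (ch == "}") {
      depth <- depth - 1L
      if (depth == 0L) {
        objects <- c(objects, paste(chars[start:i], collapse = ""))
      }
    }
  }
  objects
}

# Illustrative input with a nested object and a '}' inside a string value.
example <- '{"a": {"b": 1}, "note": "a } inside a string"}
{"a": {"b": 2}, "note": "plain"}'
parts <- split_json_objects(example)
```

Each element of `parts` is a complete JSON object and can be passed to `fromJSON()` in the same way as `my_json_objects` above.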

Hope this helps!

Florian
  • OP is *specifically* asking about files with multiple JSON objects in them (i.e. invalid JSON files). – Konrad Rudolph Feb 23 '18 at 11:42
  • @KonradRudolph thanks for the feedback. I am kind of new to JSON so I was not aware that that was the issue. I have updated my answer, do you think this better answers OP's question? – Florian Feb 23 '18 at 11:52
  • Thanks a lot, working perfectly! Just to add what might be helpful: If you wanna use this as a full table, use `json_data <- lapply(json_data,t); df <- plyr::ldply(json_data, rbind)` – TSpinde Feb 24 '18 at 19:18
  • @TSpinde I think you can even skip the transpose step if you replace the last line of my code with `json_data <- lapply(json_data, function(x) {data.frame(x)})`. Anyway, glad I could help and nice that you were able to make the modifications to make it work for you! – Florian Feb 24 '18 at 19:55
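
The full-table idea from the comments can also be sketched in base R without `plyr`. This assumes every object has the same flat fields and no `null` values; objects with differing or missing fields would need extra handling. The list below is a shortened stand-in for what `fromJSON()` returns, with made-up values in the second object.

```r
# Two parsed objects as named lists, shortened stand-ins for the real fields.
json_data <- list(
  list(created_at = "Mon Oct 14 21:04:25 +0000 2013", followers_count = 5, lang = "de"),
  list(created_at = "Mon Oct 15 21:04:25 +0000 2013", followers_count = 7, lang = "en")
)

# Build one single-row data frame per object, then stack the rows.
rows <- lapply(json_data, function(x) data.frame(x, stringsAsFactors = FALSE))
df <- do.call(rbind, rows)
```

Here `df` has one row per JSON object and one column per field, which is usually easier to work with than a list of two-column data frames.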