I imported some JSON data using the rjson library. The problem I'm facing is that some of the data appears to be misaligned; I suspect this is due to missing values.
How can I detect and re-align the data that has ended up in the wrong columns, and fill empty values with NULL? I cannot share the data, but I hope the image will be enough.
Code used to import the data:
library(rjson)
json_data <- do.call(rbind, lapply(readLines(training.file$filepaths[ind]), rjson::fromJSON))
json_data <- as.data.frame(json_data)
I have also tried using jsonlite::fromJSON instead of rjson::fromJSON, but I get the following error:
Error in feed_push_parser(readBin(con, raw(), n), reset = TRUE) :
parse error: trailing garbage
d_str": null, "place": null} {"truncated": false, "text": "R
(right here) ------^
JSON file format (the values have been altered, but all properties are present in this example):
{
"truncated": false, "text": "abc abc", "in_reply_to_status_id": null,
"id": 123, "favorite_count": 0, "retweeted": false, "entities": {
"symbols": [], "user_mentions": [], "hashtags": [], "urls": []
},
"in_reply_to_screen_name": null, "id_str": "123", "retweet_count": 0,
"in_reply_to_user_id": null, "screen_name_statistics": {
"has_underscore": true, "contains_swear": false, "has_digits": false,
"contains_condition": false, "has_chars": true
},
"user": {
"verified": false, "geo_enabled": false, "followers_count": 0,
"utc_offset": -14400, "statuses_count": 17600, "friends_count": 4425,
"lang": "en", "favourites_count": 1900, "screen_name": "1name1",
"url": null, "created_at": "Sat Jun 00 03:36:27 +0000 2012",
"time_zone": "Atlantic Time (Canada)", "listed_count": 2
},
"geo": null, "in_reply_to_user_id_str": null, "lang": "en",
"created_at": "Mon Nov 55 05:18:49 +0000 2013",
"in_reply_to_status_id_str": null, "place": null
}
Further information:
obj1 and obj2 contain a different number of properties: obj1 contains 19 properties while obj2 contains 20.
The misalignment occurs when the list is converted to a data frame using as.data.frame. A custom function that takes property names into account may be required.
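To illustrate the kind of name-aware function I have in mind, here is a minimal sketch in base R. It assumes every record is a flat named list of scalar values, so nested properties such as "entities" or "user" would have to be flattened or dropped first. The function name align_records and the two sample records are hypothetical stand-ins for the parsed lines, not my real data. Note that a data frame cannot hold NULL in a cell, so missing or null properties are filled with NA.

```r
# Hypothetical helper: build a data frame from a list of parsed records,
# matching values to columns by property name rather than by position.
align_records <- function(records) {
  # Union of all property names, in order of first appearance.
  all_names <- unique(unlist(lapply(records, names)))

  rows <- lapply(records, function(rec) {
    # Look each property up by name; absent or null properties become NA.
    vals <- lapply(all_names, function(nm) {
      v <- rec[[nm]]
      if (is.null(v)) NA else v
    })
    names(vals) <- all_names
    as.data.frame(vals, stringsAsFactors = FALSE)
  })

  # Every one-row data frame now has the same columns, so rbind lines up.
  do.call(rbind, rows)
}

# Made-up records standing in for two parsed lines; obj1 has no "place".
obj1 <- list(id = 123, text = "abc abc", geo = NULL, lang = "en")
obj2 <- list(id = 456, text = "def", geo = NULL, lang = "en", place = "somewhere")

df <- align_records(list(obj1, obj2))
print(df)
```

Records with fewer properties then get NA in the columns they lack, instead of having their values shifted or recycled into the wrong columns.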