I imported some JSON data using the rjson library. Some of the data appears to be misaligned, which I suspect is due to missing values. How can I detect and re-align the values that ended up in the wrong columns, and fill empty values with NULL? I cannot share the data. I hope the image will be enough.

Code used to import the data:

library(rjson)
json_data <- do.call(rbind, lapply(readLines(training.file$filepaths[ind]), rjson::fromJSON))
json_data <- as.data.frame(json_data)

I have also tried using jsonlite::fromJSON instead of rjson::fromJSON, but I get the following error:

 Error in feed_push_parser(readBin(con, raw(), n), reset = TRUE) : 
  parse error: trailing garbage
          d_str": null, "place": null} {"truncated": false, "text": "R
                     (right here) ------^

JSON file format (the values have been altered, but all properties are present in this example):

{
    "truncated": false, "text": "abc abc", "in_reply_to_status_id": null,
     "id": 123, "favorite_count": 0, "retweeted": false, "entities": {
        "symbols": [], "user_mentions": [], "hashtags": [], "urls": []
        }, 
    "in_reply_to_screen_name": null, "id_str": "123", "retweet_count": 0,
    "in_reply_to_user_id": null, "screen_name_statistics": {
         "has_underscore": true, "contains_swear": false, "has_digits": false, 
        "contains_condition": false, "has_chars": true
    }, 
    "user": {
        "verified": false, "geo_enabled": false, "followers_count": 0,
        "utc_offset": -14400, "statuses_count": 17600, "friends_count": 4425, 
        "lang": "en", "favourites_count": 1900, "screen_name": "1name1",
        "url": null, "created_at": "Sat Jun 00 03:36:27 +0000 2012",
        "time_zone": "Atlantic Time (Canada)", "listed_count": 2
    }, 
    "geo": null, "in_reply_to_user_id_str": null, "lang": "en",
    "created_at": "Mon Nov 55 05:18:49 +0000 2013",
    "in_reply_to_status_id_str": null, "place": null
}

Further information:

obj1 and obj2 contain a different number of properties: obj1 contains 19 properties, while obj2 contains 20.

The misalignment occurs when the list is converted to a data frame using as.data.frame. A custom function may be required to take property names into consideration.
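One way to detect which records are affected is to compare each record's property names against the union of all names. The snippet below is only a sketch: `records` is a hypothetical stand-in for the list returned by `lapply(readLines(...), rjson::fromJSON)`, and the values are made up.

```r
# Toy stand-ins for parsed JSON records (hypothetical values)
records <- list(
  list(id = 123, text = "abc abc", lang = "en"),
  list(id = 456, text = "def", retweeted_status = "x", lang = "fr")
)

# Union of all property names, in order of first appearance
all_names <- unique(unlist(lapply(records, names)))

# Number of top-level properties per record
n_props <- lengths(records)

# For each record, the properties it lacks relative to the union
missing_props <- lapply(records, function(rec) setdiff(all_names, names(rec)))

# Indices of records that are missing at least one property
incomplete <- which(lengths(missing_props) > 0)
```

If `n_props` is not constant across records, a positional conversion such as `do.call(rbind, ...)` cannot be relied on to line values up with their column names.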

zunman
  • please post your JSON – HubertL Nov 18 '16 at 21:34
  • @HubertL added code snippet in question. This reads in all data from one file. File contains one flattened json object per line. – zunman Nov 19 '16 at 00:54
  • It's impossible to answer your question without seeing your JSON. However, I would recommend using `library(jsonlite)` instead. See [this answer](http://stackoverflow.com/a/37739735/5977215) for an example. – SymbolixAU Nov 19 '16 at 08:10
  • @SymbolixAU when I use jsonlite::fromJSON , I get the following error "Error in feed_push_parser(readBin(con, raw(), n), reset = TRUE) : parse error: trailing garbage d_str": null, "place": null} {"truncated": false, "text": "R (right here) ------^" – zunman Nov 19 '16 at 18:37
  • looks like you're reading in line by line, but there are missing commas between the lines. in which case, take a [look at this answer](http://stackoverflow.com/a/40451075/5977215) – SymbolixAU Nov 19 '16 at 20:17
  • the error you're getting from `jsonlite::fromJSON` is telling you there is something wrong with the bit: `"place": null} {"truncated": false,`. In JSON, different objects (i.e., those surrounded with `{ }`) need to be separated by commas. So it should read `"place": null}, {"truncated": false,`. – SymbolixAU Nov 20 '16 at 00:09
  • @SymbolixAU Thank you. The jsonlite error is resolved using the example you referred to. However, I'm having trouble converting the list to a data frame, as the list contains vectors of different lengths. Once I can do this, I will be able to verify whether jsonlite::fromJSON aligns my data correctly – zunman Nov 20 '16 at 01:59
  • Possible duplicate of [Error while trying to parse json into R](http://stackoverflow.com/questions/40448368/error-while-trying-to-parse-json-into-r) – SymbolixAU Nov 20 '16 at 02:32
  • Upon further investigation, I've come to realize the problem is not with the fromJSON functions. While the data is in list form, it appears to be correct. It gets misaligned when the list is converted to a matrix or data.frame. E.g. the 1st JSON object, in the first row of the file (given above as an example), is missing the property "retweeted_status", which is present in the 2nd JSON object. This causes the misalignment. I should maybe write a custom mapping function from list to data frame – zunman Nov 20 '16 at 03:09
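
For a file with one JSON object per line, as described in the comments, `jsonlite::stream_in` may be a simpler route than patching in commas, since it reads line-delimited JSON directly and matches fields by name. This is a sketch with made-up records; `textConnection` stands in for `file("path/to/file")`, and the field values are assumptions.

```r
library(jsonlite)

# Two toy line-delimited records; the second has an extra property
lines <- c(
  '{"id": 1, "text": "abc", "place": null}',
  '{"id": 2, "text": "def", "place": null, "retweeted_status": "x"}'
)

# stream_in() parses one JSON object per line and binds them by field name
con <- textConnection(lines)
tweets <- stream_in(con, verbose = FALSE)
close(con)

# Records lacking a property get NA in that column
tweets$retweeted_status
```

Nested objects such as `user` come back as nested data frame columns; `jsonlite::flatten()` can expand them into ordinary columns if that is what the analysis needs.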

1 Answer

I used the rjson::fromJSON function to import the data. It was imported as a list, which I then converted to a data frame for further analysis using as.data.frame.

At first I did not notice that the JSON objects had different numbers of properties. Because the conversion did not match values to column names, this caused the data in the data frame to be misaligned.

To fix this, I wrote a custom mapping function that looks up individual values in the list and maps them into a pre-defined data frame.
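
The idea can be sketched roughly as follows. This is not the actual "importJSON" function, just a minimal illustration: `map_records`, `columns`, and the sample records are hypothetical names, and only top-level scalar values are handled.

```r
# Hypothetical helper: map parsed records into a fixed set of columns by name
map_records <- function(records, columns) {
  cols <- lapply(columns, function(nm) {
    vals <- lapply(records, function(rec) {
      v <- rec[[nm]]  # NULL when the property is absent or null
      # Copy only scalar values; absent, NULL, and nested values become NA
      if (length(v) == 1 && !is.list(v)) v else NA
    })
    unlist(vals)
  })
  names(cols) <- columns
  as.data.frame(cols, stringsAsFactors = FALSE)
}

# Toy records; the first one lacks "retweeted_status"
records <- list(
  list(id = 123, text = "abc"),
  list(id = 456, retweeted_status = "x", text = "def")
)
df <- map_records(records, c("id", "text", "retweeted_status"))
```

Because each value is looked up by property name rather than by position, a missing property leaves an NA in its own column instead of shifting the rest of the row. Nested objects such as `user` or `entities` would need to be flattened separately.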

Code specific to my example is available here. Specifically, the "importJSON" function handles the mapping from list to data frame.

zunman
  • Glad you got it working. Just FYI, to get good answers from StackOverflow you really need to [make a reproducible example](http://stackoverflow.com/q/5963269/5977215), including example data that other people can work with. – SymbolixAU Nov 20 '16 at 08:12