
I have a JSON file with events, and logs nested inside those events. An example looks like this:

{
  "sessionEvents": [
    {
      "u": "BC0F6A3A2840B6F48386BABC5F34B480BA4F9929",
      "v": "0.1.0",
      "dv": "Unidentified",
      "t": 1462924115818,
      "uid": "",
      "len": 148012,
      "by": 0,
      "g": "U",
      "cy": "PH",
      "cr": "Unknown",
      "dm": "O+ Xfinit",
      "lat": 0.0,
      "lon": 0.0,
      "l": [
        {
          "e": "100_SESSION_START",
          "o": 24,
          "d": 147988,
          "p": {
            "User_Timezone": "-08:00",
            "Session_nb": "0",
            "Energy_Balance": "89",
            "Global_Playtime": "0",
            "Device_id": "75e64b654c01949",
            "Game_Language": "en",
            "Connection_Type": "WIFI",
            "User_Country": "US",
            "Push_Impact": "None"
          }
        },
        {
          "e": "008_TUTORIAL_STEP_OTHER",
          "o": 7561,
          "d": 0,
          "p": {
            "Screen_id": "scene_screen",
            "Misclicks": "0",
            "Tutorial_Step": "19",
            "Average_Time_Per_Frame": "0",
            "Total_Time": "0"
          }
        }
      ]
    },
    {
      "u": "C950FC733D883E11E36E15A705E05A3CC7748C3A",
      "v": "0.1.0",
      "dv": "OPPO Mirror 5",
      "t": 1462908916463,
      "uid": "",
      "len": 5368,
      "by": 0,
      "g": "U",
      "cy": "PH",
      "cr": "Unknown",
      "dm": "A51w",
      "lat": 0.0,
      "lon": 0.0,
      "l": [
        {
          "e": "100_SESSION_START",
          "o": 169,
          "d": 5199,
          "p": {
            "User_Timezone": "-08:00",
            "Session_nb": "0",
            "Energy_Balance": "0",
            "Global_Playtime": "0",
            "Device_id": "d0de71513e48fba",
            "Game_Language": "en",
            "Connection_Type": "WIFI",
            "User_Country": "US",
            "Push_Impact": "None"
          }
        }
      ]
    }
  ]
}

As you can see, there is a second-level object "l" holding the event's logs and a third-level object "p" holding each log's parameters, and this nesting is giving me trouble. I'm trying to convert the file to a data frame, but I only need the values of the "100_SESSION_START" logs in the table (the parameter names in "l" and "p" are always the same for that log type). I also need to add all the parameters from the higher-level event object ('u', 'v', 'dv', 't', ...). Does anyone have any idea how to do this using R?

Update: as a result, it would be nice to have a table like this one (linked image).

andrew-zmeul
  • Can you give an example of your expected output? – Psidom May 13 '16 at 13:00
  • Possible duplicate of [Parse JSON with R](http://stackoverflow.com/questions/2061897/parse-json-with-r) – ArunK May 13 '16 at 13:40
  • @Psidom updated the post – andrew-zmeul May 13 '16 at 13:59
  • @Arun I know how to parse plain JSON, but JSON with several levels of nesting is new to me. Maybe there are functions or tricks for joining lower-level data to higher-level data (not manually). Sorry if my explanation is somewhat vague. – andrew-zmeul May 13 '16 at 14:09

2 Answers

Assuming you have loaded the JSON file into a variable named data (the access pattern below relies on fromJSON from the jsonlite package, which turns arrays of objects into data frames):

library(jsonlite)
data <- fromJSON("/home/joel/tmp/input.json")

You can then iterate over each event, and over each log within an event, as needed:

events <- data$sessionEvents          # data frame, one row per event
for (i in seq_len(nrow(events))) {    # Iterate over events
  print(events$u[i])
  print(events$v[i])
  print(events$dv[i])
  print(events$t[i])
  logs <- events$l[[i]]               # logs of event i
  for (j in seq_along(logs$e)) {      # Iterate over logs; skipped if there are none
    print(logs$e[j])
  }
}

Hope it helps.
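
To go from printing to the table the question asks for, a minimal sketch along the same lines could look like the following. It assumes the jsonlite package and uses a shortened inline sample instead of the real file; the names json, top_cols, keep, and session_starts are made up for illustration, and only two of the "p" parameters are kept to keep it short.

```r
library(jsonlite)

# Shortened inline sample with the same shape as the file in the question.
json <- '{"sessionEvents":[
  {"u":"A1","v":"0.1.0","dv":"Unidentified","t":1462924115818,
   "l":[{"e":"100_SESSION_START","o":24,"d":147988,
         "p":{"User_Timezone":"-08:00","Session_nb":"0"}},
        {"e":"008_TUTORIAL_STEP_OTHER","o":7561,"d":0,
         "p":{"Screen_id":"scene_screen"}}]},
  {"u":"B2","v":"0.1.0","dv":"OPPO Mirror 5","t":1462908916463,
   "l":[{"e":"100_SESSION_START","o":169,"d":5199,
         "p":{"User_Timezone":"-08:00","Session_nb":"0"}}]}
]}'

events   <- fromJSON(json)$sessionEvents     # data frame, one row per event
top_cols <- setdiff(names(events), "l")      # event-level columns: u, v, dv, t
# Log-level columns to keep; jsonlite::flatten() prefixes nested "p" fields with "p."
keep <- c("e", "o", "d", "p.User_Timezone", "p.Session_nb")

rows <- lapply(seq_len(nrow(events)), function(i) {
  logs <- events$l[[i]]
  if (!is.data.frame(logs)) return(NULL)     # event without logs
  logs <- jsonlite::flatten(logs)            # nested "p" data frame -> plain columns
  logs <- logs[logs$e == "100_SESSION_START", keep, drop = FALSE]
  if (nrow(logs) == 0) return(NULL)          # no session-start log in this event
  # Repeat the event-level values once per matching log and bind them side by side.
  cbind(events[rep(i, nrow(logs)), top_cols, drop = FALSE], logs, row.names = NULL)
})

session_starts <- do.call(rbind, Filter(Negate(is.null), rows))
print(session_starts)
```

Selecting the columns explicitly keeps every per-event piece the same shape, which rbind needs; for the real file, keep would have to list all the "p." columns of interest.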

joel314

You could do something like this using lapply, where li is the nested list returned by fromJSON from the rjson package:

topLevel <- c("u", "v", "dv", "t")
midLevel <- c("e", "o", "d")
botLevel <- c("User_Timezone", "Session_nb", "Energy_Balance", "Global_Playtime")

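# li[[1]] is the sessionEvents list; x is one event and y is one of its logs
do.call(rbind, lapply(li[[1]], function(x) {
    do.call(rbind, lapply(x$l, function(y) {
        # Keep only session-start logs; other logs yield NULL, which rbind ignores
        if (y$e == "100_SESSION_START") {
            c(y[midLevel], y$p[botLevel], x[topLevel])
        }
    }))
}))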

     e                   o   d      User_Timezone Session_nb Energy_Balance Global_Playtime
[1,] "100_SESSION_START" 24  147988 "-08:00"      "0"        "89"           "0"            
[2,] "100_SESSION_START" 169 5199   "-08:00"      "0"        "0"            "0"            
     u                                          v       dv              t           
[1,] "BC0F6A3A2840B6F48386BABC5F34B480BA4F9929" "0.1.0" "Unidentified"  1.462924e+12
[2,] "C950FC733D883E11E36E15A705E05A3CC7748C3A" "0.1.0" "OPPO Mirror 5" 1.462909e+12
Psidom
  • Thanks for your answer! I tried to run it but got the error "Error in y$e : $ operator is invalid for atomic vectors". Why might that happen? I can easily get the value with a command like 'doc$sessionEvents$l[i][[1]]$p$User_Timezone[[1]]' – andrew-zmeul May 17 '16 at 17:27
  • I am not exactly sure. Literally, it means `y` which should be a list is a vector here. And it is possible that your actual data has some elements where the `l` element is empty. – Psidom May 17 '16 at 22:40
  • Yep. I did use `fromJSON` from `rjson` package. – Psidom May 19 '16 at 12:43
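
The cause of that error is not confirmed in this thread, so the following is only a defensive sketch in base R. It assumes the parsed file is a plain nested list, which is the shape rjson's fromJSON is expected to give, and builds that list by hand so the example is self-contained. It skips any log that is not a list, tolerates events whose "l" is empty, and returns a data frame rather than a matrix. The sample values and the names rows and result are hypothetical.

```r
# Hand-built stand-in for the parsed file: one nested list per event.
li <- list(sessionEvents = list(
  list(u = "A1", v = "0.1.0", dv = "Unidentified", t = 1462924115818,
       l = list(
         list(e = "100_SESSION_START", o = 24, d = 147988,
              p = list(User_Timezone = "-08:00", Session_nb = "0")),
         list(e = "008_TUTORIAL_STEP_OTHER", o = 7561, d = 0,
              p = list(Screen_id = "scene_screen")))),
  list(u = "B2", v = "0.1.0", dv = "OPPO Mirror 5", t = 1462908916463,
       l = list(
         list(e = "100_SESSION_START", o = 169, d = 5199,
              p = list(User_Timezone = "-08:00", Session_nb = "0")))),
  list(u = "C3", v = "0.1.0", dv = "Unidentified", t = 1462908916999,
       l = list())                            # event with no logs at all
))

topLevel <- c("u", "v", "dv", "t")
midLevel <- c("e", "o", "d")
botLevel <- c("User_Timezone", "Session_nb")

rows <- list()
for (x in li$sessionEvents) {                 # x is one event
  for (y in x$l) {                            # y is one log; no iterations if l is empty
    # Guard: only treat y as a log if it really is a list with the wanted event name.
    if (is.list(y) && identical(y$e, "100_SESSION_START")) {
      rows[[length(rows) + 1]] <- as.data.frame(
        c(x[topLevel], y[midLevel], y$p[botLevel]),
        stringsAsFactors = FALSE)
    }
  }
}

result <- do.call(rbind, rows)                # one row per session-start log
print(result)
```

If the guard silently drops everything on the real data, printing str() of a single element of li[[1]] should show whether the parser returned lists or already-simplified vectors.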