
I have JSON like the below, which I got from a URL:

{
   "info" : {
      "1484121600" : [
         212953175.053333,212953175.053333,null
      ],
      "1484125200" : [
         236203014.133333,236203014.133333,236203014.133333
      ],
      "1484128800" : [
         211414832.968889,null,211414832.968889
      ],
      "1484132400" : [
         208604573.791111,208604573.791111,208604573.791111
      ],
      "1484136000" : [
         231358374.288889,231358374.288889,231358374.288889
      ],
      "1484139600" : [
         210529301.097778,210529301.097778,210529301.097778
      ],
      "1484143200" : [
         212009682.04,null,212009682.04
      ],
      "1484146800" : [
         232364759.566667,232364759.566667,232364759.566667
      ],
      "1484150400" : [
         218138788.524444,218138788.524444,218138788.524444
      ],
      "1484154000" : [
         218883301.282222,218883301.282222,null
      ],
      "1484157600" : [
         237874583.771111,237874583.771111,237874583.771111
      ],
      "1484161200" : [
         216227081.924444,null,216227081.924444
      ],
      "1484164800" : [
         227102054.082222,227102054.082222,null
      ]
   },
"summary" : "data",
"end" : 1484164800,
"start": 1484121600
}

I'm fetching this JSON from that URL using the jsonlite package in R, like below:

library(jsonlite)

input_data <- fromJSON(url)

timeseries <- input_data[['info']]       # up to here the code is fine

abc <- data.frame(ds = names(timeseries[[1]]), 
                  y = unlist(timeseries[[1]]), stringsAsFactors = FALSE)

(something is wrong in the above line)

I need to convert the data in the timeseries variable into a data frame whose index column is the epoch time, with one column per value in the array. All arrays are guaranteed to have the same number of values, but that number is not fixed; it can be 1, 2, or more. In the example above the array size is 3 for all.

For example, the data frame should look like:

index        y1                 y2                 y3
1484121600   212953175.053333   212953175.053333   null
1484125200   236203014.133333   236203014.133333   236203014.133333

Please suggest how I can do this in R; I'm new to it.

JSON with only one item per array:

{
   "info": {
      "1484121600": [
         212953175.053333
      ],
      "1484125200": [
         236203014.133333
      ],
      "1484128800": [
         211414832.968889
      ],
      "1484132400": [
         208604573.791111
      ],
      "1484136000": [
         231358374.288889
      ],
      "1484139600": [
         210529301.097778
      ],
      "1484143200": [
         212009682.04
      ],
      "1484146800": [
         232364759.566667
      ],
      "1484150400": [
         218138788.524444
      ],
      "1484154000": [
         218883301.282222
      ],
      "1484157600": [
         237874583.771111
      ],
      "1484161200": [
         216227081.924444
      ],
      "1484164800": [
         227102054.082222
      ]
   },
   "summary": "data",
   "end": 1484164800,
   "start": 1484121600
}

1 Answer


Consider binding the list of JSON values to a matrix with `sapply()`, then transposing columns to rows with `t()`, and finally converting to a data frame with `data.frame()`:

# bind the list values into a matrix, then transpose so the timestamps become rows
abc <- data.frame(t(sapply(timeseries, c)))
# rename the default X1, X2, X3 columns to y1, y2, y3
colnames(abc) <- gsub("X", "y", colnames(abc))

abc
#                   y1        y2        y3
# 1484121600 212953175 212953175        NA
# 1484125200 236203014 236203014 236203014
# 1484128800 211414833        NA 211414833
# 1484132400 208604574 208604574 208604574
# 1484136000 231358374 231358374 231358374
# 1484139600 210529301 210529301 210529301
# 1484143200 212009682        NA 212009682
# 1484146800 232364760 232364760 232364760
# 1484150400 218138789 218138789 218138789
# 1484154000 218883301 218883301        NA
# 1484157600 237874584 237874584 237874584
# 1484161200 216227082        NA 216227082
# 1484164800 227102054 227102054        NA
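
For the one-value JSON added later in the question, a minimal sketch that branches on the length of the first element; it assumes `timeseries` is the parsed `info` list and that every array shares the same length, as the question states:

ncols <- length(timeseries[[1]])   # values per timestamp

if (ncols == 1) {
  # one value per timestamp: sapply() already returns a named numeric vector
  abc <- data.frame(y1 = sapply(timeseries, c))
} else {
  # several values per timestamp: bind to a matrix, transpose so timestamps become rows
  abc <- data.frame(t(sapply(timeseries, c)))
  colnames(abc) <- paste0("y", seq_len(ncols))   # y1, y2, ... for any column count
}

Either branch leaves the epoch timestamps as the row names of `abc`.
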
  • Thanks for the response. But I don't want to ignore values after decimal. – Ashag Jul 10 '17 at 19:30
  • Very interesting! I hadn't noticed that. That's a JSON parse issue. Even more interesting is that R removes it; possibly the number of digits on both sides of the decimal is too many: `as.numeric("212953175.053333")` yields: `[1] 212953175`. – Parfait Jul 10 '17 at 19:48
  • alright, can we take log of these values and then store into dataframe? – Ashag Jul 10 '17 at 19:57
  • but log should be taken of complete number ("212953175.053333") . Is that possible? – Ashag Jul 10 '17 at 19:58
  • Also, your code doesn't seem to work if the array has only 1 value. For example, I have added a JSON to the question whose arrays have only one value. – Ashag Jul 10 '17 at 19:58
  • Actually turns out the decimals are just not showing. See the check on first row and first column of *abc* dataframe: `abc$y1[1] == 212953175.053333` yields: `[1] TRUE` and the whole number fails `abc$y1[1] == 212953175` yields `[1] FALSE`. – Parfait Jul 10 '17 at 20:03
  • ohh yeah, correct! But, this code doesn't seem to work when array has only 1 value – Ashag Jul 10 '17 at 20:06
  • Use the simpler for that one-value case: `abc <- data.frame(y1=sapply(timeseries, c))` – Parfait Jul 10 '17 at 20:08
  • yes that is the way but I have to check for the length first and then form the dataframe. – Ashag Jul 10 '17 at 20:10
  • This was a very important item to have included at the beginning. Does all of the JSON include only one-value arrays, or a mix? You can use `if` to check the length: `if (length(timeseries[[1]]) == 1) { ... one-value parse } else { ... multi-value parse }` – Parfait Jul 10 '17 at 20:23
  • Actually I have one parameter in json which I can use to determine length. I have put if condition based upon that. – Ashag Jul 10 '17 at 20:27
  • Just want to know one more thing: which line in the code makes the epoch time the index column? And if I don't want it as an index column, how do I make the index a sequence of 1, 2, 3, 4, etc.? – Ashag Jul 10 '17 at 20:29
  • There's no index in R. That's Python pandas. Those are *rownames(df)* derived from the JSON names. To assign those values to a column, simply `df$newcol <- rownames(df)`, and to reset them, `rownames(df) <- NULL`. But do the first before the latter. – Parfait Jul 10 '17 at 20:42
  • Excellent. Thanks very much. I'm new to R and kind of think everything in Pandas perspective. Will learn soon. – Ashag Jul 10 '17 at 20:46
  • Please help me with this one small thing. I'm trying to convert the epoch-time row names to datetime by doing something like `row.names(df) <- as.POSIXct(row.names(df), format="%Y-%m-%d %H:%M:%S")`, but it converts them to NA. Can you say something on this? When I checked the type of row.names, it is character. So how do I convert character epoch time to datetime? – Ashag Jul 10 '17 at 21:17
  • Your *rownames* are stored in seconds from UNIX epoch (01/01/1970). See this [post](https://stackoverflow.com/questions/13456241/convert-unix-epoch-to-date-object-in-r). Try this conversion: `row.names(abc) <- as.POSIXct(as.integer(row.names(abc)), origin="1970-01-01")` – Parfait Jul 10 '17 at 21:49
  • okay, so I converted row names to datetime using row.names(df) <- as.POSIXct(as.integer(row.names(df)), origin="1970-01-01") , but when I do print(typeof(row.names(df))) it still says character, it should be datetime right? – Ashag Jul 10 '17 at 23:37
  • Rownames can only be character. Per docs, `rownames()` and `colnames()` coerce with `as.character()`. Still thinking in pandas! Do note, R has the time series object. See `?ts()` or look into [zoo](https://www.rdocumentation.org/packages/zoo/versions/1.8-0/topics/zoo) package or its extension, [xts](https://www.rdocumentation.org/packages/xts/versions/0.9-7). – Parfait Jul 11 '17 at 01:46
  • Just clear up one more doubt for me. I now have my dataframe like below: `row.names(df) <- anytime(as.numeric(row.names(df)))`; `df$y <- log(df$y)`. Now, to convert it into a time-series dataframe (as I'm working on ARIMA), I use `df_ts <- ts(df)`. When I print this dataframe, the row.names of df_ts are all numbers like 1, 2, 3 and the y column has all the values. There is no timestamp anywhere. Just confirm: after applying ts(), should the dataframe look like what I'm getting, or should there also be time in some column? – Ashag Jul 11 '17 at 01:56
  • Look into the other packages. `ts()` requires sequential, equal length time with *start* and *end* by a *delta* while the others allow irregular time intervals with timestamp index. – Parfait Jul 11 '17 at 02:40
  • can you please check this : https://stackoverflow.com/questions/45052353/convert-unix-epoch-time-to-datetime-as-taken-by-timeseries-ts-method-in-r – Ashag Jul 12 '17 at 08:32
  • See this answer on relevant question: https://stackoverflow.com/a/33714818/1422451 – Parfait Jul 12 '17 at 12:36
  • I have seen it. But my timeseries is regular; I have no problem with irregularity. Can you look at my question once again if possible? Thanks – Ashag Jul 12 '17 at 17:18
  • But *zoo* objects maintain an index attribute for datetime which *ts* objects do not. – Parfait Jul 12 '17 at 18:02
  • I think I understood what you meant to say: I should use xts or zoo. Also, I found a link https://stats.stackexchange.com/questions/64383/what-is-the-best-forecasting-model-for-time-series-data where the data is in the form of epoch time and they converted it into a ts object after applying xts. In my case, my dataframe has row.names as epoch time and values in one column. So I think I need to convert it into an xts object (where it will convert epoch to datetime) and then into ts() as understood by arima. Can you tell me how to convert my dataframe using xts or zoo? – Ashag Jul 12 '17 at 19:54
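
Pulling the comment thread together: a minimal sketch of the row-name handling discussed above, assuming `abc` is the data frame built in the answer. The decimals are stored but hidden by R's default of 7 printed digits; the epoch row names can be moved into a POSIXct column (the `timestamp` name here is just an example) and the row names reset afterwards.

print(abc$y1[1], digits = 16)   # shows the full 212953175.053333

# move the epoch row names into a real datetime column, then reset them to 1, 2, 3, ...
abc$timestamp <- as.POSIXct(as.numeric(rownames(abc)), origin = "1970-01-01", tz = "UTC")
rownames(abc) <- NULL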
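
And for the last comments about ARIMA, a sketch (assuming the xts package is installed) that builds a time-indexed series from that column, with `log()` applied as in the comment above; unlike `ts()`, an xts object keeps the timestamps as its index:

library(xts)

# index the (logged) values by the POSIXct timestamps
series <- xts(log(abc$y1), order.by = abc$timestamp)
head(series)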