Extracting values of a certain date from JSON in R Dataframe

Question

I have a below mentioned dataframe:

ID      Rank    Name       Json_Data
IR-122  RE      AFG        {as below sample}
IR-122  UI      SSw        {as below sample}
IR-123  RF      HEr        {as below sample}
IR-123  RO      djf        {as below sample}
IR-124  RE      der        {as below sample}
IR-125  RF      fet        {as below sample}

Sample Json Data:

{"Jan-2018":{"10":50000.0,"11":50000.0,"12":15202.0,"13":10089.0,"14":10089.0,"15":9589.0,"16":9589.0,"17":18941.0,"18":15246.75,"19":5053.75,"20":44092.75,"21":36630.75,"22":9334.75,"23":5254.75,"24":4357.25,"25":3357.25,"26":44626.25,"27":49292.25,"28":48292.25,"29":43371.8,"30":38675.8,"31":37988.12},"Mar-2018":{"1":30799.02,"2":20775.42,"3":20657.42,"4":20657.42,"5":12657.42,"6":11110.22,"7":11110.22,"8":11110.22,"9":11111.22,"10":30272.22,"11":30272.22,"12":25316.22,"13":25316.22,"14":25316.22,"15":25316.22,"16":25316.22,"17":25316.22,"18":25316.22,"19":25316.22,"20":25316.22,"21":15316.22,"22":15316.22,"23":15316.22,"24":15316.22,"25":15204.12,"26":14791.12,"27":14791.12,"28":14791.12,"29":14791.12,"30":14791.12,"31":14791.12},"Feb-2018":{"1":36749.12,"2":36483.37,"3":35254.87,"4":27254.87,"5":15880.87,"6":14173.87,"7":7934.87,"8":7091.87,"9":5797.87,"10":5797.87,"11":5797.87,"12":283841.87,"13":283418.87,"14":283418.87,"15":253426.37,"16":242226.37,"17":227226.37,"18":197226.37,"19":147226.37,"20":111799.02,"21":111799.02,"22":66799.02,"23":64799.02,"24":64799.02,"25":63799.02,"26":53799.02,"27":36799.02,"28":36799.02},"Apr-2018":{"1":14791.12,"2":14791.12,"3":14791.12,"4":14791.12,"5":10791.12,"6":10791.12,"7":10791.12,"8":10791.12,"9":10755.72,"10":5799.72,"11":5799.72,"12":5799.72,"13":5799.72,"14":5799.72,"15":5799.72,"16":5799.72,"17":5799.72,"18":5799.72,"19":5728.92,"26":728.92,"27":728.92,"28":728.92,"29":728.92,"30":728.92}}

My pattern json carries Month and studying of each date of that month, by using the above mentioned dateframe i want to extract :

Maximum month (in MMM-YY) format in one column
Values of 1st, 7th, 14th, 21st and 28th date

A_1 value in below mentioned sample output would be value of Apr-2018 on 1st : 14791.12

A_2 value of would be for Apr-2018 on 7th : 10791.12 so on and so forth.

I need this from 4 month from the maximum month (excluding maximum month).

Below A_1 is actual previous month of most month and A_2 is precise preceding month of A_1 and so on, and I have supply solely from A_1 to A_28 the identical columns will repeat for the B_2, C_3 and C_4 month.

Where in A_1 the analyzing would be first date of A_1 month, in A_7 reading of 7th day of A_1 month and so on for the other three months as well. And the values need to be group_by ID and Rank.

In my sample Json data, there are only 4 month and the Maximum month is Apr-2018, so in that case A_1 would be Mar-2018, B_2 would be Feb-2018, C_3 would be Jan-2018 and D_4 would be Dec-2017 (where D_1,D_7_D_14,D_21 and D_28 would be NA).

In below pattern I have expect that the Max month is May-18.

Sample Output:

    ID     Rank     Name    Max_Month  A_1      B_2      C_3      D_4      A_1       A_7       A_14       A_21      A_28     B_1      B_7     B_14      B_21   B_28        C_1   C_7   C_14   C_21     C_28      D_1   D_7   D_14   D_21   D_28
    IR-122 RE       AFG     Apr-2018   Mar-2018 Feb-2018 Jan-2018 Dec-2017 30799.02  11110.22  25316.22   16316.22  14791.12 36749.12 7934.87 283418.87 111799.02 36799.02  NA   NA    10089  36630.75 48292.25   NA   NA     NA      NA   NA

Sample dput dataframe:

structure(list(ID = "IR-122", Rank = "RE", Name = "AFG", Json_Data = "{\"Jan-2018\":{\"10\":50000.0,\"11\":50000.0,\"12\":15202.0,\"13\":10089.0,\"14\":10089.0,\"15\":9589.0,\"16\":9589.0,\"17\":18941.0,\"18\":15246.75,\"19\":5053.75,\"20\":44092.75,\"21\":36630.75,\"22\":9334.75,\"23\":5254.75,\"24\":4357.25,\"25\":3357.25,\"26\":44626.25,\"27\":49292.25,\"28\":48292.25,\"29\":43371.8,\"30\":38675.8,\"31\":37988.12},\"Mar-2018\":{\"1\":30799.02,\"2\":20775.42,\"3\":20657.42,\"4\":20657.42,\"5\":12657.42,\"6\":11110.22,\"7\":11110.22,\"8\":11110.22,\"9\":11111.22,\"10\":30272.22,\"11\":30272.22,\"12\":25316.22,\"13\":25316.22,\"14\":25316.22,\"15\":25316.22,\"16\":25316.22,\"17\":25316.22,\"18\":25316.22,\"19\":25316.22,\"20\":25316.22,\"21\":15316.22,\"22\":15316.22,\"23\":15316.22,\"24\":15316.22,\"25\":15204.12,\"26\":14791.12,\"27\":14791.12,\"28\":14791.12,\"29\":14791.12,\"30\":14791.12,\"31\":14791.12},\"Feb-2018\":{\"1\":36749.12,\"2\":36483.37,\"3\":35254.87,\"4\":27254.87,\"5\":15880.87,\"6\":14173.87,\"7\":7934.87,\"8\":7091.87,\"9\":5797.87,\"10\":5797.87,\"11\":5797.87,\"12\":283841.87,\"13\":283418.87,\"14\":283418.87,\"15\":253426.37,\"16\":242226.37,\"17\":227226.37,\"18\":197226.37,\"19\":147226.37,\"20\":111799.02,\"21\":111799.02,\"22\":66799.02,\"23\":64799.02,\"24\":64799.02,\"25\":63799.02,\"26\":53799.02,\"27\":36799.02,\"28\":36799.02},\"Apr-2018\":{\"1\":14791.12,\"2\":14791.12,\"3\":14791.12,\"4\":14791.12,\"5\":10791.12,\"6\":10791.12,\"7\":10791.12,\"8\":10791.12,\"9\":10755.72,\"10\":5799.72,\"11\":5799.72,\"12\":5799.72,\"13\":5799.72,\"14\":5799.72,\"15\":5799.72,\"16\":5799.72,\"17\":5799.72,\"18\":5799.72,\"19\":5728.92,\"26\":728.92,\"27\":728.92,\"28\":728.92,\"29\":728.92,\"30\":728.92}}"), class = "data.frame", row.names = c(NA, 
-1L))

@Moody_Mudskipper ok thanks, please let me know in case of if anything confusing. — Vector JX, Jul 08 '18 at 15:13
well, it's all very confusing to be honest :). you talk about an A1 value of 14791.12 that I see in the data, but i don't see it in the output, and in the output you have several columns named the same, is it normal ? — moodymudskipper, Jul 08 '18 at 15:14
@Moody_Mudskipper The output is just for the idea all the values are random there, well we can change `A_1` to `D_4` as `Month_1` to `Month_4` to handle duplication. — Vector JX, Jul 08 '18 at 15:17
The problem is that the idea isn't very clear, I encourage you to edit your expected output or/and input so that they are 100% consistent, we can often get from the example what we miss from the explanation. — moodymudskipper, Jul 08 '18 at 15:19
Do you always have 4 months, or sometimes you have more and you just want the 4 last months ? — moodymudskipper, Jul 08 '18 at 15:23
In your example, the month of May is not mentionned, it goes up to April, so what is max_month ? April ? then we have only 3 additional months. Try to sort all these things out and ping me when you have a clear reproducible example. — moodymudskipper, Jul 08 '18 at 15:27
@Moody_Mudskipper : Changed the required output, I just want to first get the Maximum month from the Json_Data for each `ID` and than 1,7,14,21 and 28 date value of Last 4 month, If any month is missing in json (let suppose we have only 4 month in json and we exclude maximum month and left with only 3 month than last) than last month value (1,7,14,21 and 28) would be `NA`. — Vector JX, Jul 08 '18 at 15:27

moodymudskipper · Accepted Answer · 2018-07-08T18:46:59.620

3

json_to_df <- function(data){
  json_as_list <- jsonlite::fromJSON(data)
  months       <- names(json_as_list)
  last4months  <- tail(months[order(lubridate::myd(paste0(months,"-01")))],4)
  max_month    <- tail(last4months,1)
  other_months <- head(last4months,-1)
  other_months_suffixes <- paste0(LETTERS[seq_along(other_months)],"_")
  last_month   <- tail(other_months,1)
  days         <- c('1','7','14','21','28')
  get_month_list <- function(x) json_as_list[[x]][days]
  list_subset <- Map(function(x,y) setNames(get_month_list(x),paste0(y,days)),
                     rev(other_months), other_months_suffixes)
  list_subset <- unlist(list_subset, recursive = FALSE)
  names(list_subset) <- gsub("^.*?\\.","",names(list_subset))
  list_subset <- map_if(list_subset, is.null,~NA)

  only_nas <- setNames(replicate(20,NA,F),paste(sep="_",rep(LETTERS[1:4],each=5),rep(days,4)))
  missing <- names(only_nas)[! names(only_nas) %in% names(list_subset)]
  list_subset <- c(list_subset, only_nas[missing])


  list_months <- setNames(as.list(other_months),paste0(other_months_suffixes,0))
  only_nas2 <- setNames(replicate(4,NA,F),paste(sep="_",LETTERS[1:4],0))
  missing2 <- names(only_nas2)[! names(only_nas2) %in% names(list_months)]
  list_months <- c(list_months, only_nas2[missing2])

  output_list <- c(
    Max_Month = max_month,
    list_months,
    list_subset)
  data.frame(output_list)
}

library(jsonlite)
library(lubridate)
library(tidyverse)
df %>%
  mutate(Json_Data = map(Json_Data,json_to_df)) %>%
  unnest
#       ID Rank Name Max_Month      A_0      B_0      C_0 D_0      A_1      A_7     A_14     A_21     A_28      B_1
# 1 IR-122   RE  AFG  Apr-2018 Jan-2018 Feb-2018 Mar-2018  NA 30799.02 11110.22 25316.22 15316.22 14791.12 36749.12
#       B_7     B_14   B_21     B_28 C_1 C_7  C_14     C_21     C_28 D_1 D_7 D_14 D_21 D_28
# 1 7934.87 283418.9 111799 36799.02  NA  NA 10089 36630.75 48292.25  NA  NA   NA   NA   NA

edited Jul 08 '18 at 18:46

answered Jul 08 '18 at 16:01

moodymudskipper

46,417
11
121
167

`D_0` is missing and `B_1`,`B_7`,`B_14`,`B_21` and `B_28` same for month `C_0` and `D_0`. – Vector JX Jul 08 '18 at 16:04
There's no December in your input, and there's no B_1 B_7 etc in your expected output, that's why I specifically asked you to edit your post... I'll have to leave you with this, you should be able to work it out. But next time really make your example reproducible it's much more pleasant for answerers and you'll get higher quality answers. – moodymudskipper Jul 08 '18 at 16:08
Getting error `Error in eval(substitute(expr), envir, enclos) : arguments imply differing number of rows: 1, 0` – Vector JX Jul 08 '18 at 16:08
For the sake of simplicity i didn't mentioned B_,C_ and D_ values in expected sample output. please help me to resolve this. – Vector JX Jul 08 '18 at 16:10
If you read this and rework your post, many people will be happy to help: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – moodymudskipper Jul 08 '18 at 16:14
I am completely agree with you, and I'll change the expected output. – Vector JX Jul 08 '18 at 16:16
Made all the changes in the expected output...Please help me out here. – Vector JX Jul 08 '18 at 16:30
Json_Data length could be vary for different Ids. – Vector JX Jul 08 '18 at 16:37
Showing missing month would be tricky (i.e Dec-2017 in this case), we can keep value for any missing month as `NA` as well. – Vector JX Jul 08 '18 at 16:48
Removed the garbage json from the dataframe and now its working fine, Please suggest for `D_0` and other components. – Vector JX Jul 08 '18 at 18:02
1

Got the `D_0` vaule :) Please suggest me where to change in function for `B_`,`C_` and `D_` Values, Please. – Vector JX Jul 08 '18 at 18:24
All `D_0` are showing `NA` where as in my original dataframe i have `D_0` values. changed 4 to 5 in this line of code: `tail(months[order(lubridate::myd(paste0(months,"-01")))],5)` – Vector JX Jul 08 '18 at 18:52
can you update your dput by giving me 2 or 3 rows, including some that are problematic ? – moodymudskipper Jul 08 '18 at 18:54
you shouldn't change 4 to 5, we need 4 months maximum – moodymudskipper Jul 08 '18 at 18:55
Just want to understand how can i get c('1','7','14','21','28') value for max_month as well, by changing which line. – Vector JX Jul 10 '18 at 11:18
I think the easiest would be to remove all which is about max_month and use `last4months` instead of `other_months` + some tweaks on number of reps. If you set set `data <- df$Json_Data[1]` and explore the function line by line you should be able to see how each step works. – moodymudskipper Jul 10 '18 at 11:30
You're right about the replacement, sorry. I made a mistake, you wanted 4 "other months", so the vector should have been called `last5months` and be defined with `5` instead of `4` as you did. – moodymudskipper Jul 10 '18 at 11:31
Thanks but I didn't get your previous point, how would I get max_months values (1,7,14,21&28), I am trying but couldn't do it..:( – Vector JX Jul 10 '18 at 11:33

Extracting values of a certain date from JSON in R Dataframe

1 Answers1